Files
langchain/docs/docs/how_to/query_multiple_queries.ipynb
2024-09-11 03:31:25 +00:00

408 lines
11 KiB
Plaintext

{
"cells": [
{
"cell_type": "raw",
"id": "df7d42b9-58a6-434c-a2d7-0b61142f6d3e",
"metadata": {},
"source": [
"---\n",
"sidebar_position: 4\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "f2195672-0cab-4967-ba8a-c6544635547d",
"metadata": {},
"source": [
"# How to handle multiple queries when doing query analysis\n",
"\n",
"Sometimes, a query analysis technique may allow for multiple queries to be generated. In these cases, we need to remember to run all queries and then to combine the results. We will show a simple example (using mock data) of how to do that."
]
},
{
"cell_type": "markdown",
"id": "a4079b57-4369-49c9-b2ad-c809b5408d7e",
"metadata": {},
"source": [
"## Setup\n",
"#### Install dependencies"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "e168ef5c-e54e-49a6-8552-5502854a6f01",
"metadata": {
"execution": {
"iopub.execute_input": "2024-09-11T02:41:53.160868Z",
"iopub.status.busy": "2024-09-11T02:41:53.160512Z",
"iopub.status.idle": "2024-09-11T02:41:57.605370Z",
"shell.execute_reply": "2024-09-11T02:41:57.604888Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"%pip install -qU langchain langchain-community langchain-openai langchain-chroma"
]
},
{
"cell_type": "markdown",
"id": "79d66a45-a05c-4d22-b011-b1cdbdfc8f9c",
"metadata": {},
"source": [
"#### Set environment variables\n",
"\n",
"We'll use OpenAI in this example:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "40e2979e-a818-4b96-ac25-039336f94319",
"metadata": {
"execution": {
"iopub.execute_input": "2024-09-11T02:41:57.607874Z",
"iopub.status.busy": "2024-09-11T02:41:57.607697Z",
"iopub.status.idle": "2024-09-11T02:41:57.610422Z",
"shell.execute_reply": "2024-09-11T02:41:57.610012Z"
}
},
"outputs": [],
"source": [
"import getpass\n",
"import os\n",
"\n",
"if \"OPENAI_API_KEY\" not in os.environ:\n",
" os.environ[\"OPENAI_API_KEY\"] = getpass.getpass()\n",
"\n",
"# Optional, uncomment to trace runs with LangSmith. Sign up here: https://smith.langchain.com.\n",
"# os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n",
"# os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass()"
]
},
{
"cell_type": "markdown",
"id": "c20b48b8-16d7-4089-bc17-f2d240b3935a",
"metadata": {},
"source": [
"### Create Index\n",
"\n",
"We will create a vectorstore over fake information."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "1f621694",
"metadata": {
"execution": {
"iopub.execute_input": "2024-09-11T02:41:57.612276Z",
"iopub.status.busy": "2024-09-11T02:41:57.612146Z",
"iopub.status.idle": "2024-09-11T02:41:59.074590Z",
"shell.execute_reply": "2024-09-11T02:41:59.074052Z"
}
},
"outputs": [],
"source": [
"from langchain_chroma import Chroma\n",
"from langchain_openai import OpenAIEmbeddings\n",
"from langchain_text_splitters import RecursiveCharacterTextSplitter\n",
"\n",
"texts = [\"Harrison worked at Kensho\", \"Ankush worked at Facebook\"]\n",
"embeddings = OpenAIEmbeddings(model=\"text-embedding-3-small\")\n",
"vectorstore = Chroma.from_texts(\n",
" texts,\n",
" embeddings,\n",
")\n",
"retriever = vectorstore.as_retriever(search_kwargs={\"k\": 1})"
]
},
{
"cell_type": "markdown",
"id": "57396e23-c192-4d97-846b-5eacea4d6b8d",
"metadata": {},
"source": [
"## Query analysis\n",
"\n",
"We will use function calling to structure the output. We will let it return multiple queries."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "0b51dd76-820d-41a4-98c8-893f6fe0d1ea",
"metadata": {
"execution": {
"iopub.execute_input": "2024-09-11T02:41:59.077712Z",
"iopub.status.busy": "2024-09-11T02:41:59.077514Z",
"iopub.status.idle": "2024-09-11T02:41:59.081509Z",
"shell.execute_reply": "2024-09-11T02:41:59.081112Z"
}
},
"outputs": [],
"source": [
"from typing import List, Optional\n",
"\n",
"from pydantic import BaseModel, Field\n",
"\n",
"\n",
"class Search(BaseModel):\n",
" \"\"\"Search over a database of job records.\"\"\"\n",
"\n",
" queries: List[str] = Field(\n",
" ...,\n",
" description=\"Distinct queries to search for\",\n",
" )"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "783c03c3-8c72-4f88-9cf4-5829ce6745d6",
"metadata": {
"execution": {
"iopub.execute_input": "2024-09-11T02:41:59.083613Z",
"iopub.status.busy": "2024-09-11T02:41:59.083492Z",
"iopub.status.idle": "2024-09-11T02:41:59.204636Z",
"shell.execute_reply": "2024-09-11T02:41:59.204377Z"
}
},
"outputs": [],
"source": [
"from langchain_core.output_parsers.openai_tools import PydanticToolsParser\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_core.runnables import RunnablePassthrough\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"output_parser = PydanticToolsParser(tools=[Search])\n",
"\n",
"system = \"\"\"You have the ability to issue search queries to get information to help answer user information.\n",
"\n",
"If you need to look up two distinct pieces of information, you are allowed to do that!\"\"\"\n",
"prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\"system\", system),\n",
" (\"human\", \"{question}\"),\n",
" ]\n",
")\n",
"llm = ChatOpenAI(model=\"gpt-4o-mini\", temperature=0)\n",
"structured_llm = llm.with_structured_output(Search)\n",
"query_analyzer = {\"question\": RunnablePassthrough()} | prompt | structured_llm"
]
},
{
"cell_type": "markdown",
"id": "b9564078",
"metadata": {},
"source": [
"We can see that this allows for creating multiple queries"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "bc1d3863",
"metadata": {
"execution": {
"iopub.execute_input": "2024-09-11T02:41:59.206178Z",
"iopub.status.busy": "2024-09-11T02:41:59.206101Z",
"iopub.status.idle": "2024-09-11T02:41:59.817758Z",
"shell.execute_reply": "2024-09-11T02:41:59.817310Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"Search(queries=['Harrison Work', 'Harrison employment history'])"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"query_analyzer.invoke(\"where did Harrison Work\")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "af62af17-4f90-4dbd-a8b4-dfff51f1db95",
"metadata": {
"execution": {
"iopub.execute_input": "2024-09-11T02:41:59.820168Z",
"iopub.status.busy": "2024-09-11T02:41:59.819990Z",
"iopub.status.idle": "2024-09-11T02:42:00.309034Z",
"shell.execute_reply": "2024-09-11T02:42:00.308578Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"Search(queries=['Harrison work history', 'Ankush work history'])"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"query_analyzer.invoke(\"where did Harrison and ankush Work\")"
]
},
{
"cell_type": "markdown",
"id": "c7c65b2f-7881-45fc-a47b-a4eaaf48245f",
"metadata": {},
"source": [
"## Retrieval with query analysis\n",
"\n",
"So how would we include this in a chain? One thing that will make this a lot easier is if we call our retriever asyncronously - this will let us loop over the queries and not get blocked on the response time."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "1e047d87",
"metadata": {
"execution": {
"iopub.execute_input": "2024-09-11T02:42:00.311131Z",
"iopub.status.busy": "2024-09-11T02:42:00.310972Z",
"iopub.status.idle": "2024-09-11T02:42:00.313365Z",
"shell.execute_reply": "2024-09-11T02:42:00.313025Z"
}
},
"outputs": [],
"source": [
"from langchain_core.runnables import chain"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "8dac7866",
"metadata": {
"execution": {
"iopub.execute_input": "2024-09-11T02:42:00.315138Z",
"iopub.status.busy": "2024-09-11T02:42:00.315016Z",
"iopub.status.idle": "2024-09-11T02:42:00.317427Z",
"shell.execute_reply": "2024-09-11T02:42:00.317088Z"
}
},
"outputs": [],
"source": [
"@chain\n",
"async def custom_chain(question):\n",
" response = await query_analyzer.ainvoke(question)\n",
" docs = []\n",
" for query in response.queries:\n",
" new_docs = await retriever.ainvoke(query)\n",
" docs.extend(new_docs)\n",
" # You probably want to think about reranking or deduplicating documents here\n",
" # But that is a separate topic\n",
" return docs"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "232ad8a7-7990-4066-9228-d35a555f7293",
"metadata": {
"execution": {
"iopub.execute_input": "2024-09-11T02:42:00.318951Z",
"iopub.status.busy": "2024-09-11T02:42:00.318829Z",
"iopub.status.idle": "2024-09-11T02:42:01.512855Z",
"shell.execute_reply": "2024-09-11T02:42:01.512321Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='Harrison worked at Kensho'),\n",
" Document(page_content='Harrison worked at Kensho')]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"await custom_chain.ainvoke(\"where did Harrison Work\")"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "28e14ba5",
"metadata": {
"execution": {
"iopub.execute_input": "2024-09-11T02:42:01.515743Z",
"iopub.status.busy": "2024-09-11T02:42:01.515400Z",
"iopub.status.idle": "2024-09-11T02:42:02.349930Z",
"shell.execute_reply": "2024-09-11T02:42:02.349382Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='Harrison worked at Kensho'),\n",
" Document(page_content='Ankush worked at Facebook')]"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"await custom_chain.ainvoke(\"where did Harrison and ankush Work\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "88de5a36",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}