langchain/docs/docs/how_to/query_no_queries.ipynb
Erick Friis c2a3021bb0
multiple: pydantic 2 compatibility, v0.3 (#26443)
Signed-off-by: ChengZi <chen.zhang@zilliz.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Dan O'Donovan <dan.odonovan@gmail.com>
Co-authored-by: Tom Daniel Grande <tomdgrande@gmail.com>
Co-authored-by: Grande <Tom.Daniel.Grande@statsbygg.no>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Tomaz Bratanic <bratanic.tomaz@gmail.com>
Co-authored-by: ZhangShenao <15201440436@163.com>
Co-authored-by: Friso H. Kingma <fhkingma@gmail.com>
Co-authored-by: ChengZi <chen.zhang@zilliz.com>
Co-authored-by: Nuno Campos <nuno@langchain.dev>
Co-authored-by: Morgante Pell <morgantep@google.com>
2024-09-13 14:38:45 -07:00


{
"cells": [
{
"cell_type": "raw",
"id": "df7d42b9-58a6-434c-a2d7-0b61142f6d3e",
"metadata": {},
"source": [
"---\n",
"sidebar_position: 3\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "f2195672-0cab-4967-ba8a-c6544635547d",
"metadata": {},
"source": [
"# How to handle cases where no queries are generated\n",
"\n",
"Sometimes, a query analysis technique may allow any number of queries to be generated, including none at all. In that case, our overall chain needs to inspect the result of the query analysis before deciding whether to call the retriever.\n",
"\n",
"We will use mock data for this example."
]
},
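{
"cell_type": "markdown",
"id": "no-queries-routing-sketch-md",
"metadata": {},
"source": [
"Before bringing in an LLM, the control flow can be sketched in plain Python. This is a minimal illustration under stated assumptions, not LangChain's API: `route_queries` and `fake_retrieve` are hypothetical helpers, and the \"retriever\" is a case-insensitive substring match over an in-memory list. Retrieval runs once per generated query and is skipped entirely when no queries were generated."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "no-queries-routing-sketch",
"metadata": {},
"outputs": [],
"source": [
"from typing import Callable, List\n",
"\n",
"\n",
"def route_queries(\n",
"    queries: List[str],\n",
"    retrieve: Callable[[str], List[str]],\n",
") -> List[str]:\n",
"    \"\"\"Run `retrieve` for each generated query; skip retrieval when there are none.\"\"\"\n",
"    if not queries:\n",
"        # No queries were generated, so there is nothing to look up.\n",
"        return []\n",
"    docs: List[str] = []\n",
"    for query in queries:\n",
"        docs.extend(retrieve(query))\n",
"    return docs\n",
"\n",
"\n",
"def fake_retrieve(query: str) -> List[str]:\n",
"    \"\"\"Hypothetical stand-in for a retriever: case-insensitive substring match.\"\"\"\n",
"    corpus = [\"Harrison worked at Kensho\"]\n",
"    return [doc for doc in corpus if query.lower() in doc.lower()]\n",
"\n",
"\n",
"print(route_queries([\"Harrison\"], fake_retrieve))\n",
"print(route_queries([], fake_retrieve))"
]
},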
{
"cell_type": "markdown",
"id": "a4079b57-4369-49c9-b2ad-c809b5408d7e",
"metadata": {},
"source": [
"## Setup\n",
"#### Install dependencies"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "e168ef5c-e54e-49a6-8552-5502854a6f01",
"metadata": {
"execution": {
"iopub.execute_input": "2024-09-11T02:42:33.121714Z",
"iopub.status.busy": "2024-09-11T02:42:33.121392Z",
"iopub.status.idle": "2024-09-11T02:42:36.998607Z",
"shell.execute_reply": "2024-09-11T02:42:36.998126Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"%pip install -qU langchain langchain-community langchain-openai langchain-chroma"
]
},
{
"cell_type": "markdown",
"id": "79d66a45-a05c-4d22-b011-b1cdbdfc8f9c",
"metadata": {},
"source": [
"#### Set environment variables\n",
"\n",
"We'll use OpenAI in this example:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "40e2979e-a818-4b96-ac25-039336f94319",
"metadata": {
"execution": {
"iopub.execute_input": "2024-09-11T02:42:37.001017Z",
"iopub.status.busy": "2024-09-11T02:42:37.000859Z",
"iopub.status.idle": "2024-09-11T02:42:37.003704Z",
"shell.execute_reply": "2024-09-11T02:42:37.003335Z"
}
},
"outputs": [],
"source": [
"import getpass\n",
"import os\n",
"\n",
"if \"OPENAI_API_KEY\" not in os.environ:\n",
"    os.environ[\"OPENAI_API_KEY\"] = getpass.getpass()\n",
"\n",
"# Optional, uncomment to trace runs with LangSmith. Sign up here: https://smith.langchain.com.\n",
"# os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n",
"# os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass()"
]
},
{
"cell_type": "markdown",
"id": "c20b48b8-16d7-4089-bc17-f2d240b3935a",
"metadata": {},
"source": [
"### Create Index\n",
"\n",
"We will create a vectorstore over fake information."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "1f621694",
"metadata": {
"execution": {
"iopub.execute_input": "2024-09-11T02:42:37.005644Z",
"iopub.status.busy": "2024-09-11T02:42:37.005493Z",
"iopub.status.idle": "2024-09-11T02:42:38.288481Z",
"shell.execute_reply": "2024-09-11T02:42:38.287904Z"
}
},
"outputs": [],
"source": [
"from langchain_chroma import Chroma\n",
"from langchain_openai import OpenAIEmbeddings\n",
"\n",
"texts = [\"Harrison worked at Kensho\"]\n",
"embeddings = OpenAIEmbeddings(model=\"text-embedding-3-small\")\n",
"vectorstore = Chroma.from_texts(\n",
"    texts,\n",
"    embeddings,\n",
")\n",
"retriever = vectorstore.as_retriever()"
]
},
{
"cell_type": "markdown",
"id": "57396e23-c192-4d97-846b-5eacea4d6b8d",
"metadata": {},
"source": [
"## Query analysis\n",
"\n",
"We will use function calling to structure the output. However, we will bind the search function as a tool without forcing the LLM to call it, so the model doesn't *need* to issue a search query if it decides not to. We will also use a system prompt for query analysis that tells the model it may answer directly instead of searching."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "0b51dd76-820d-41a4-98c8-893f6fe0d1ea",
"metadata": {
"execution": {
"iopub.execute_input": "2024-09-11T02:42:38.291700Z",
"iopub.status.busy": "2024-09-11T02:42:38.291468Z",
"iopub.status.idle": "2024-09-11T02:42:38.295796Z",
"shell.execute_reply": "2024-09-11T02:42:38.295205Z"
}
},
"outputs": [],
"source": [
"from typing import Optional\n",
"\n",
"from pydantic import BaseModel, Field\n",
"\n",
"\n",
"class Search(BaseModel):\n",
"    \"\"\"Search over a database of job records.\"\"\"\n",
"\n",
"    query: str = Field(\n",
"        ...,\n",
"        description=\"Similarity search query applied to job record.\",\n",
"    )"
]
},
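{
"cell_type": "markdown",
"id": "search-tool-schema-sketch-md",
"metadata": {},
"source": [
"To see what the model receives when `Search` is bound as a tool, the class can be converted into an OpenAI-style tool definition. This sketch assumes `convert_to_openai_tool` from `langchain_core.utils.function_calling`; `ChatOpenAI.bind_tools` performs a similar conversion internally, though the exact code path may vary between versions. The class is redefined so the cell runs on its own without an API key. The class docstring becomes the tool description and the field's `description` becomes the parameter description, so both can influence whether the model decides to search."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "search-tool-schema-sketch",
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"from langchain_core.utils.function_calling import convert_to_openai_tool\n",
"from pydantic import BaseModel, Field\n",
"\n",
"\n",
"class Search(BaseModel):\n",
"    \"\"\"Search over a database of job records.\"\"\"\n",
"\n",
"    query: str = Field(\n",
"        ...,\n",
"        description=\"Similarity search query applied to job record.\",\n",
"    )\n",
"\n",
"\n",
"# Convert the Pydantic model into the JSON tool schema sent to the model.\n",
"search_tool = convert_to_openai_tool(Search)\n",
"print(json.dumps(search_tool, indent=2))"
]
},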
{
"cell_type": "code",
"execution_count": 5,
"id": "783c03c3-8c72-4f88-9cf4-5829ce6745d6",
"metadata": {
"execution": {
"iopub.execute_input": "2024-09-11T02:42:38.297840Z",
"iopub.status.busy": "2024-09-11T02:42:38.297712Z",
"iopub.status.idle": "2024-09-11T02:42:38.420456Z",
"shell.execute_reply": "2024-09-11T02:42:38.420140Z"
}
},
"outputs": [],
"source": [
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_core.runnables import RunnablePassthrough\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"system = \"\"\"You have the ability to issue search queries to get information to help answer user questions.\n",
"\n",
"You do not NEED to look things up. If you don't need to, then just respond normally.\"\"\"\n",
"prompt = ChatPromptTemplate.from_messages(\n",
"    [\n",
"        (\"system\", system),\n",
"        (\"human\", \"{question}\"),\n",
"    ]\n",
")\n",
"llm = ChatOpenAI(model=\"gpt-4o-mini\", temperature=0)\n",
"structured_llm = llm.bind_tools([Search])\n",
"query_analyzer = {\"question\": RunnablePassthrough()} | prompt | structured_llm"
]
},
{
"cell_type": "markdown",
"id": "b9564078",
"metadata": {},
"source": [
"Invoking the query analyzer returns a message that sometimes, but not always, contains a tool call: a question about Harrison triggers a `Search` call, while a simple greeting is answered directly."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "bc1d3863",
"metadata": {
"execution": {
"iopub.execute_input": "2024-09-11T02:42:38.421934Z",
"iopub.status.busy": "2024-09-11T02:42:38.421831Z",
"iopub.status.idle": "2024-09-11T02:42:39.048915Z",
"shell.execute_reply": "2024-09-11T02:42:39.048519Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_korLZrh08PTRL94f4L7rFqdj', 'function': {'arguments': '{\"query\":\"Harrison\"}', 'name': 'Search'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 14, 'prompt_tokens': 95, 'total_tokens': 109}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_483d39d857', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-ea94d376-37bf-4f80-abe6-e3b42b767ea0-0', tool_calls=[{'name': 'Search', 'args': {'query': 'Harrison'}, 'id': 'call_korLZrh08PTRL94f4L7rFqdj', 'type': 'tool_call'}], usage_metadata={'input_tokens': 95, 'output_tokens': 14, 'total_tokens': 109})"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"query_analyzer.invoke(\"where did Harrison Work\")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "af62af17-4f90-4dbd-a8b4-dfff51f1db95",
"metadata": {
"execution": {
"iopub.execute_input": "2024-09-11T02:42:39.050923Z",
"iopub.status.busy": "2024-09-11T02:42:39.050785Z",
"iopub.status.idle": "2024-09-11T02:42:40.090421Z",
"shell.execute_reply": "2024-09-11T02:42:40.089454Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='Hello! How can I assist you today?', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 10, 'prompt_tokens': 93, 'total_tokens': 103}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_483d39d857', 'finish_reason': 'stop', 'logprobs': None}, id='run-ebdfc44a-455a-4ca6-be85-84559886b1e1-0', usage_metadata={'input_tokens': 93, 'output_tokens': 10, 'total_tokens': 103})"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"query_analyzer.invoke(\"hi!\")"
]
},
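{
"cell_type": "markdown",
"id": "inspect-tool-calls-sketch-md",
"metadata": {},
"source": [
"The difference between the two responses is whether the message carries any tool calls. The cell below builds both shapes by hand with `langchain_core.messages.AIMessage`, so it runs without an API key; the message contents are illustrative, modeled on the outputs above rather than produced by the model. On a message with no tool calls, `tool_calls` is an empty list, so a plain truthiness check is enough to branch."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "inspect-tool-calls-sketch",
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.messages import AIMessage\n",
"\n",
"# Hand-built stand-ins for the two responses above (illustrative only).\n",
"with_search = AIMessage(\n",
"    content=\"\",\n",
"    tool_calls=[{\"name\": \"Search\", \"args\": {\"query\": \"Harrison\"}, \"id\": \"call_1\"}],\n",
")\n",
"without_search = AIMessage(content=\"Hello! How can I assist you today?\")\n",
"\n",
"for message in (with_search, without_search):\n",
"    if message.tool_calls:\n",
"        # The model asked to search: collect the generated query strings.\n",
"        print(\"search queries:\", [tc[\"args\"][\"query\"] for tc in message.tool_calls])\n",
"    else:\n",
"        # No tool call: the model answered directly.\n",
"        print(\"direct answer:\", message.content)"
]
},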
{
"cell_type": "markdown",
"id": "c7c65b2f-7881-45fc-a47b-a4eaaf48245f",
"metadata": {},
"source": [
"## Retrieval with query analysis\n",
"\n",
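"So how would we include this in a chain? In the example below, a custom chain inspects the query analyzer's response: if it contains a tool call, we parse out the search query and pass it to the retriever; otherwise, we return the model's response as-is."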
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "1e047d87",
"metadata": {
"execution": {
"iopub.execute_input": "2024-09-11T02:42:40.093716Z",
"iopub.status.busy": "2024-09-11T02:42:40.093472Z",
"iopub.status.idle": "2024-09-11T02:42:40.097732Z",
"shell.execute_reply": "2024-09-11T02:42:40.097274Z"
}
},
"outputs": [],
"source": [
"from langchain_core.output_parsers.openai_tools import PydanticToolsParser\n",
"from langchain_core.runnables import chain\n",
"\n",
"output_parser = PydanticToolsParser(tools=[Search])"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "8dac7866",
"metadata": {
"execution": {
"iopub.execute_input": "2024-09-11T02:42:40.100028Z",
"iopub.status.busy": "2024-09-11T02:42:40.099882Z",
"iopub.status.idle": "2024-09-11T02:42:40.103105Z",
"shell.execute_reply": "2024-09-11T02:42:40.102734Z"
}
},
"outputs": [],
"source": [
"@chain\n",
"def custom_chain(question):\n",
"    response = query_analyzer.invoke(question)\n",
"    if \"tool_calls\" in response.additional_kwargs:\n",
"        query = output_parser.invoke(response)\n",
"        docs = retriever.invoke(query[0].query)\n",
"        # Could add more logic (like another LLM call) here\n",
"        return docs\n",
"    else:\n",
"        return response"
]
},
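{
"cell_type": "markdown",
"id": "multi-query-chain-sketch-md",
"metadata": {},
"source": [
"The chain above retrieves only for the first parsed query. Since query analysis may produce several `Search` calls, one variation (not part of the original example) is to loop over all of them and to branch on the parser's output: `PydanticToolsParser` returns an empty list when the message has no tool calls. This sketch runs offline under stated assumptions: `fake_analyzer` and `fake_retrieve` are hypothetical stand-ins for the real `query_analyzer` and `retriever`, and `multi_query_chain` is an illustrative name, not part of LangChain."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "multi-query-chain-sketch",
"metadata": {},
"outputs": [],
"source": [
"from typing import List\n",
"\n",
"from langchain_core.documents import Document\n",
"from langchain_core.messages import AIMessage\n",
"from langchain_core.output_parsers.openai_tools import PydanticToolsParser\n",
"from langchain_core.runnables import chain\n",
"from pydantic import BaseModel, Field\n",
"\n",
"\n",
"class Search(BaseModel):\n",
"    \"\"\"Search over a database of job records.\"\"\"\n",
"\n",
"    query: str = Field(..., description=\"Similarity search query applied to job record.\")\n",
"\n",
"\n",
"def fake_analyzer(question: str) -> AIMessage:\n",
"    \"\"\"Hypothetical stand-in for `query_analyzer`: searches only if Harrison is mentioned.\"\"\"\n",
"    if \"harrison\" in question.lower():\n",
"        return AIMessage(\n",
"            content=\"\",\n",
"            tool_calls=[{\"name\": \"Search\", \"args\": {\"query\": \"Harrison\"}, \"id\": \"call_1\"}],\n",
"        )\n",
"    return AIMessage(content=\"Hello! How can I assist you today?\")\n",
"\n",
"\n",
"def fake_retrieve(query: str) -> List[Document]:\n",
"    \"\"\"Hypothetical stand-in for `retriever.invoke`: substring match over one document.\"\"\"\n",
"    corpus = [Document(page_content=\"Harrison worked at Kensho\")]\n",
"    return [doc for doc in corpus if query.lower() in doc.page_content.lower()]\n",
"\n",
"\n",
"parser = PydanticToolsParser(tools=[Search])\n",
"\n",
"\n",
"@chain\n",
"def multi_query_chain(question: str):\n",
"    response = fake_analyzer(question)\n",
"    # The parser returns an empty list when the message has no tool calls.\n",
"    searches = parser.invoke(response)\n",
"    if not searches:\n",
"        return response\n",
"    docs = []\n",
"    # Run every generated query, not just the first one.\n",
"    for search in searches:\n",
"        docs.extend(fake_retrieve(search.query))\n",
"    return docs\n",
"\n",
"\n",
"print(multi_query_chain.invoke(\"where did Harrison work\"))\n",
"print(multi_query_chain.invoke(\"hi!\"))"
]
},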
{
"cell_type": "code",
"execution_count": 10,
"id": "232ad8a7-7990-4066-9228-d35a555f7293",
"metadata": {
"execution": {
"iopub.execute_input": "2024-09-11T02:42:40.105092Z",
"iopub.status.busy": "2024-09-11T02:42:40.104917Z",
"iopub.status.idle": "2024-09-11T02:42:41.341967Z",
"shell.execute_reply": "2024-09-11T02:42:41.341455Z"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Number of requested results 4 is greater than number of elements in index 1, updating n_results = 1\n"
]
},
{
"data": {
"text/plain": [
"[Document(page_content='Harrison worked at Kensho')]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"custom_chain.invoke(\"where did Harrison Work\")"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "28e14ba5",
"metadata": {
"execution": {
"iopub.execute_input": "2024-09-11T02:42:41.344639Z",
"iopub.status.busy": "2024-09-11T02:42:41.344411Z",
"iopub.status.idle": "2024-09-11T02:42:41.798332Z",
"shell.execute_reply": "2024-09-11T02:42:41.798054Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='Hello! How can I assist you today?', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 10, 'prompt_tokens': 93, 'total_tokens': 103}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_483d39d857', 'finish_reason': 'stop', 'logprobs': None}, id='run-e87f058d-30c0-4075-8a89-a01b982d557e-0', usage_metadata={'input_tokens': 93, 'output_tokens': 10, 'total_tokens': 103})"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"custom_chain.invoke(\"hi!\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}