mirror of
https://github.com/hwchase17/langchain.git
synced 2025-06-09 00:16:21 +00:00
{
"cells": [
{
"cell_type": "raw",
"id": "df7d42b9-58a6-434c-a2d7-0b61142f6d3e",
"metadata": {},
"source": [
"---\n",
"sidebar_position: 3\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "f2195672-0cab-4967-ba8a-c6544635547d",
"metadata": {},
"source": [
"# How to handle cases where no queries are generated\n",
"\n",
"Sometimes, a query analysis technique may allow for any number of queries to be generated - including no queries! In this case, our overall chain will need to inspect the result of the query analysis before deciding whether to call the retriever or not.\n",
"\n",
"We will use mock data for this example."
]
},
{
"cell_type": "markdown",
"id": "a4079b57-4369-49c9-b2ad-c809b5408d7e",
"metadata": {},
"source": [
"## Setup\n",
"#### Install dependencies"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "e168ef5c-e54e-49a6-8552-5502854a6f01",
"metadata": {
"execution": {
"iopub.execute_input": "2024-09-11T02:42:33.121714Z",
"iopub.status.busy": "2024-09-11T02:42:33.121392Z",
"iopub.status.idle": "2024-09-11T02:42:36.998607Z",
"shell.execute_reply": "2024-09-11T02:42:36.998126Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"%pip install -qU langchain langchain-community langchain-openai langchain-chroma"
]
},
{
"cell_type": "markdown",
"id": "79d66a45-a05c-4d22-b011-b1cdbdfc8f9c",
"metadata": {},
"source": [
"#### Set environment variables\n",
"\n",
"We'll use OpenAI in this example:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "40e2979e-a818-4b96-ac25-039336f94319",
"metadata": {
"execution": {
"iopub.execute_input": "2024-09-11T02:42:37.001017Z",
"iopub.status.busy": "2024-09-11T02:42:37.000859Z",
"iopub.status.idle": "2024-09-11T02:42:37.003704Z",
"shell.execute_reply": "2024-09-11T02:42:37.003335Z"
}
},
"outputs": [],
"source": [
"import getpass\n",
"import os\n",
"\n",
"if \"OPENAI_API_KEY\" not in os.environ:\n",
"    os.environ[\"OPENAI_API_KEY\"] = getpass.getpass()\n",
"\n",
"# Optional, uncomment to trace runs with LangSmith. Sign up here: https://smith.langchain.com.\n",
"# os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n",
"# os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass()"
]
},
{
"cell_type": "markdown",
"id": "c20b48b8-16d7-4089-bc17-f2d240b3935a",
"metadata": {},
"source": [
"### Create Index\n",
"\n",
"We will create a vectorstore over fake information."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "1f621694",
"metadata": {
"execution": {
"iopub.execute_input": "2024-09-11T02:42:37.005644Z",
"iopub.status.busy": "2024-09-11T02:42:37.005493Z",
"iopub.status.idle": "2024-09-11T02:42:38.288481Z",
"shell.execute_reply": "2024-09-11T02:42:38.287904Z"
}
},
"outputs": [],
"source": [
"from langchain_chroma import Chroma\n",
"from langchain_openai import OpenAIEmbeddings\n",
"from langchain_text_splitters import RecursiveCharacterTextSplitter\n",
"\n",
"texts = [\"Harrison worked at Kensho\"]\n",
"embeddings = OpenAIEmbeddings(model=\"text-embedding-3-small\")\n",
"vectorstore = Chroma.from_texts(\n",
"    texts,\n",
"    embeddings,\n",
")\n",
"retriever = vectorstore.as_retriever()"
]
},
{
"cell_type": "markdown",
"id": "57396e23-c192-4d97-846b-5eacea4d6b8d",
"metadata": {},
"source": [
"## Query analysis\n",
"\n",
"We will use function calling to structure the output. However, we will configure the LLM so that it doesn't NEED to call the function representing a search query (should it decide not to). We will also use a prompt to do query analysis that explicitly lays out when it should and shouldn't make a search."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "0b51dd76-820d-41a4-98c8-893f6fe0d1ea",
"metadata": {
"execution": {
"iopub.execute_input": "2024-09-11T02:42:38.291700Z",
"iopub.status.busy": "2024-09-11T02:42:38.291468Z",
"iopub.status.idle": "2024-09-11T02:42:38.295796Z",
"shell.execute_reply": "2024-09-11T02:42:38.295205Z"
}
},
"outputs": [],
"source": [
"from typing import Optional\n",
"\n",
"from pydantic import BaseModel, Field\n",
"\n",
"\n",
"class Search(BaseModel):\n",
"    \"\"\"Search over a database of job records.\"\"\"\n",
"\n",
"    query: str = Field(\n",
"        ...,\n",
"        description=\"Similarity search query applied to job record.\",\n",
"    )"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "783c03c3-8c72-4f88-9cf4-5829ce6745d6",
"metadata": {
"execution": {
"iopub.execute_input": "2024-09-11T02:42:38.297840Z",
"iopub.status.busy": "2024-09-11T02:42:38.297712Z",
"iopub.status.idle": "2024-09-11T02:42:38.420456Z",
"shell.execute_reply": "2024-09-11T02:42:38.420140Z"
}
},
"outputs": [],
"source": [
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_core.runnables import RunnablePassthrough\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"system = \"\"\"You have the ability to issue search queries to get information to help answer user input.\n",
"\n",
"You do not NEED to look things up. If you don't need to, then just respond normally.\"\"\"\n",
"prompt = ChatPromptTemplate.from_messages(\n",
"    [\n",
"        (\"system\", system),\n",
"        (\"human\", \"{question}\"),\n",
"    ]\n",
")\n",
"llm = ChatOpenAI(model=\"gpt-4o-mini\", temperature=0)\n",
"structured_llm = llm.bind_tools([Search])\n",
"query_analyzer = {\"question\": RunnablePassthrough()} | prompt | structured_llm"
]
},
{
"cell_type": "markdown",
"id": "b9564078",
"metadata": {},
"source": [
"We can see that by invoking this we get a message that sometimes - but not always - contains a tool call."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "bc1d3863",
"metadata": {
"execution": {
"iopub.execute_input": "2024-09-11T02:42:38.421934Z",
"iopub.status.busy": "2024-09-11T02:42:38.421831Z",
"iopub.status.idle": "2024-09-11T02:42:39.048915Z",
"shell.execute_reply": "2024-09-11T02:42:39.048519Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_korLZrh08PTRL94f4L7rFqdj', 'function': {'arguments': '{\"query\":\"Harrison\"}', 'name': 'Search'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 14, 'prompt_tokens': 95, 'total_tokens': 109}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_483d39d857', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-ea94d376-37bf-4f80-abe6-e3b42b767ea0-0', tool_calls=[{'name': 'Search', 'args': {'query': 'Harrison'}, 'id': 'call_korLZrh08PTRL94f4L7rFqdj', 'type': 'tool_call'}], usage_metadata={'input_tokens': 95, 'output_tokens': 14, 'total_tokens': 109})"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"query_analyzer.invoke(\"where did Harrison Work\")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "af62af17-4f90-4dbd-a8b4-dfff51f1db95",
"metadata": {
"execution": {
"iopub.execute_input": "2024-09-11T02:42:39.050923Z",
"iopub.status.busy": "2024-09-11T02:42:39.050785Z",
"iopub.status.idle": "2024-09-11T02:42:40.090421Z",
"shell.execute_reply": "2024-09-11T02:42:40.089454Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='Hello! How can I assist you today?', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 10, 'prompt_tokens': 93, 'total_tokens': 103}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_483d39d857', 'finish_reason': 'stop', 'logprobs': None}, id='run-ebdfc44a-455a-4ca6-be85-84559886b1e1-0', usage_metadata={'input_tokens': 93, 'output_tokens': 10, 'total_tokens': 103})"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"query_analyzer.invoke(\"hi!\")"
]
},
{
"cell_type": "markdown",
"id": "c7c65b2f-7881-45fc-a47b-a4eaaf48245f",
"metadata": {},
"source": [
"## Retrieval with query analysis\n",
"\n",
"So how would we include this in a chain? Let's look at an example below."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "1e047d87",
"metadata": {
"execution": {
"iopub.execute_input": "2024-09-11T02:42:40.093716Z",
"iopub.status.busy": "2024-09-11T02:42:40.093472Z",
"iopub.status.idle": "2024-09-11T02:42:40.097732Z",
"shell.execute_reply": "2024-09-11T02:42:40.097274Z"
}
},
"outputs": [],
"source": [
"from langchain_core.output_parsers.openai_tools import PydanticToolsParser\n",
"from langchain_core.runnables import chain\n",
"\n",
"output_parser = PydanticToolsParser(tools=[Search])"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "8dac7866",
"metadata": {
"execution": {
"iopub.execute_input": "2024-09-11T02:42:40.100028Z",
"iopub.status.busy": "2024-09-11T02:42:40.099882Z",
"iopub.status.idle": "2024-09-11T02:42:40.103105Z",
"shell.execute_reply": "2024-09-11T02:42:40.102734Z"
}
},
"outputs": [],
"source": [
"@chain\n",
"def custom_chain(question):\n",
"    response = query_analyzer.invoke(question)\n",
"    if \"tool_calls\" in response.additional_kwargs:\n",
"        query = output_parser.invoke(response)\n",
"        docs = retriever.invoke(query[0].query)\n",
"        # Could add more logic - like another LLM call - here\n",
"        return docs\n",
"    else:\n",
"        return response"
]
},
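{
"cell_type": "markdown",
"id": "9f3c1d2a",
"metadata": {},
"source": [
"The `# Could add more logic` comment above is where you could, for example, make another LLM call that answers the question from the retrieved documents rather than returning them raw. The sketch below is one way to do that, under the assumption that a simple stuff-the-context prompt is enough here; the prompt text and the `custom_chain_with_answer` name are illustrative, not part of the original chain."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7e5a4b1c",
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.output_parsers import StrOutputParser\n",
"\n",
"# Illustrative extension (an assumption, not the guide's chain): answer the\n",
"# question from the retrieved documents with a second LLM call.\n",
"answer_prompt = ChatPromptTemplate.from_template(\n",
"    \"Answer the question using only this context:\\n\\n{context}\\n\\nQuestion: {question}\"\n",
")\n",
"\n",
"\n",
"@chain\n",
"def custom_chain_with_answer(question):\n",
"    response = query_analyzer.invoke(question)\n",
"    if \"tool_calls\" in response.additional_kwargs:\n",
"        query = output_parser.invoke(response)\n",
"        docs = retriever.invoke(query[0].query)\n",
"        # Second LLM call over the retrieved context\n",
"        answer_chain = answer_prompt | llm | StrOutputParser()\n",
"        return answer_chain.invoke({\"context\": docs, \"question\": question})\n",
"    else:\n",
"        return response.content"
]
},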
{
"cell_type": "code",
"execution_count": 10,
"id": "232ad8a7-7990-4066-9228-d35a555f7293",
"metadata": {
"execution": {
"iopub.execute_input": "2024-09-11T02:42:40.105092Z",
"iopub.status.busy": "2024-09-11T02:42:40.104917Z",
"iopub.status.idle": "2024-09-11T02:42:41.341967Z",
"shell.execute_reply": "2024-09-11T02:42:41.341455Z"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Number of requested results 4 is greater than number of elements in index 1, updating n_results = 1\n"
]
},
{
"data": {
"text/plain": [
"[Document(page_content='Harrison worked at Kensho')]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"custom_chain.invoke(\"where did Harrison Work\")"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "28e14ba5",
"metadata": {
"execution": {
"iopub.execute_input": "2024-09-11T02:42:41.344639Z",
"iopub.status.busy": "2024-09-11T02:42:41.344411Z",
"iopub.status.idle": "2024-09-11T02:42:41.798332Z",
"shell.execute_reply": "2024-09-11T02:42:41.798054Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='Hello! How can I assist you today?', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 10, 'prompt_tokens': 93, 'total_tokens': 103}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_483d39d857', 'finish_reason': 'stop', 'logprobs': None}, id='run-e87f058d-30c0-4075-8a89-a01b982d557e-0', usage_metadata={'input_tokens': 93, 'output_tokens': 10, 'total_tokens': 103})"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"custom_chain.invoke(\"hi!\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "33338d4f",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}