Mirror of https://github.com/hwchase17/langchain.git (synced 2026-02-18 04:25:22 +00:00)

Compare commits: 1 commit, langchain-... -> eugene/fix (4bba9d3e31)
@@ -52,7 +52,7 @@ Now:

`from langchain_experimental.sql import SQLDatabaseChain`

Alternatively, if you are just interested in using the query generation part of the SQL chain, you can check out this [`SQL question-answering tutorial`](https://python.langchain.com/v0.2/docs/tutorials/sql_qa/#convert-question-to-sql-query)
Alternatively, if you are just interested in using the query generation part of the SQL chain, you can check out [`create_sql_query_chain`](https://github.com/langchain-ai/langchain/blob/master/docs/extras/use_cases/tabular/sql_query.ipynb)

`from langchain.chains import create_sql_query_chain`
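For reference, a minimal sketch of how the query-generation chain mentioned above is typically wired up; the SQLite URI and model name are placeholder assumptions, not part of this change:

```python
# Hedged sketch: turn a natural-language question into a SQL query string.
# The Chinook SQLite database and the OpenAI model are assumed placeholders.
from langchain.chains import create_sql_query_chain
from langchain_community.utilities import SQLDatabase
from langchain_openai import ChatOpenAI

db = SQLDatabase.from_uri("sqlite:///Chinook.db")
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

query_chain = create_sql_query_chain(llm, db)
sql = query_chain.invoke({"question": "How many employees are there?"})
print(sql)  # the generated SQL text; it is not executed automatically
```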
@@ -39,7 +39,7 @@
"metadata": {},
"outputs": [],
"source": [
"! pip install langchain langchain-chroma \"unstructured[all-docs]\" pydantic lxml langchainhub"
"! pip install langchain langchain-chroma unstructured[all-docs] pydantic lxml langchainhub"
]
},
{

@@ -59,7 +59,7 @@
"metadata": {},
"outputs": [],
"source": [
"! pip install langchain langchain-chroma \"unstructured[all-docs]\" pydantic lxml"
"! pip install langchain langchain-chroma unstructured[all-docs] pydantic lxml"
]
},
{
@@ -59,7 +59,7 @@
"metadata": {},
"outputs": [],
"source": [
"! pip install langchain langchain-chroma \"unstructured[all-docs]\" pydantic lxml"
"! pip install langchain langchain-chroma unstructured[all-docs] pydantic lxml"
]
},
{
@@ -166,7 +166,7 @@
"source": [
"### SQL Database Agent example\n",
"\n",
"This example demonstrates the use of the [SQL Database Agent](/docs/integrations/tools/sql_database) for answering questions over a Databricks database."
"This example demonstrates the use of the [SQL Database Agent](/docs/integrations/toolkits/sql_database.html) for answering questions over a Databricks database."
]
},
{
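As context for the agent linked above, a hedged sketch of the SQL Database Agent; a local SQLite URI stands in for the Databricks connection, which is an assumption for illustration only:

```python
# Hedged sketch: the SQL Database Agent inspects the schema, writes SQL,
# runs it, and answers in natural language. Connection details are placeholders.
from langchain_community.agent_toolkits import create_sql_agent
from langchain_community.utilities import SQLDatabase
from langchain_openai import ChatOpenAI

db = SQLDatabase.from_uri("sqlite:///Chinook.db")  # stand-in for a Databricks URI
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

agent = create_sql_agent(llm, db=db, agent_type="openai-tools", verbose=True)
agent.invoke({"input": "Which customers placed the most orders?"})
```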
@@ -13,12 +13,7 @@ OUTPUT_NEW_DOCS_DIR = $(OUTPUT_NEW_DIR)/docs

PYTHON = .venv/bin/python

PARTNER_DEPS_LIST := $(shell find ../libs/partners -mindepth 1 -maxdepth 1 -type d -exec sh -c ' \
for dir; do \
if find "$$dir" -maxdepth 1 -type f \( -name "pyproject.toml" -o -name "setup.py" \) | grep -q .; then \
echo "$$dir"; \
fi \
done' sh {} + | grep -vE "airbyte|ibm|couchbase" | tr '\n' ' ')
PARTNER_DEPS_LIST := $(shell find ../libs/partners -mindepth 1 -maxdepth 1 -type d -exec test -e "{}/pyproject.toml" \; -print | grep -vE "airbyte|ibm|couchbase" | tr '\n' ' ')

PORT ?= 3001
@@ -28,7 +28,7 @@
"\n",
"You can use arbitrary functions as [Runnables](https://api.python.langchain.com/en/latest/runnables/langchain_core.runnables.base.Runnable.html#langchain_core.runnables.base.Runnable). This is useful for formatting or when you need functionality not provided by other LangChain components, and custom functions used as Runnables are called [`RunnableLambdas`](https://api.python.langchain.com/en/latest/runnables/langchain_core.runnables.base.RunnableLambda.html).\n",
"\n",
"Note that all inputs to these functions need to be a SINGLE argument. If you have a function that accepts multiple arguments, you should write a wrapper that accepts a single dict input and unpacks it into multiple arguments.\n",
"Note that all inputs to these functions need to be a SINGLE argument. If you have a function that accepts multiple arguments, you should write a wrapper that accepts a single dict input and unpacks it into multiple argument.\n",
"\n",
"This guide will cover:\n",
"\n",
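To illustrate the single-argument rule described in this hunk, a small sketch of wrapping a two-argument function so it accepts one dict (the names are illustrative):

```python
# Sketch: Runnables take a single input, so a multi-argument function is
# wrapped in a lambda that unpacks one dict.
from langchain_core.runnables import RunnableLambda


def multiply(a: int, b: int) -> int:
    return a * b


multiply_runnable = RunnableLambda(lambda d: multiply(d["a"], d["b"]))

print(multiply_runnable.invoke({"a": 6, "b": 7}))  # 42
```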
@@ -13,14 +13,9 @@ the v1 namespace of Pydantic 2.
Because Pydantic does not support mixing .v1 and .v2 objects, users should be aware of a number of issues
when using LangChain with Pydantic.

:::caution
While LangChain supports Pydantic V2 objects in some APIs (listed below), it's suggested that users keep using Pydantic V1 objects until LangChain 0.3 is released.
:::

## 1. Passing Pydantic objects to LangChain APIs

Most LangChain APIs for *tool usage* (see list below) have been updated to accept either Pydantic v1 or v2 objects.
Most LangChain APIs that accept Pydantic objects have been updated to accept both Pydantic v1 and v2 objects.

* Pydantic v1 objects correspond to subclasses of `pydantic.BaseModel` if `pydantic 1` is installed or subclasses of `pydantic.v1.BaseModel` if `pydantic 2` is installed.
* Pydantic v2 objects correspond to subclasses of `pydantic.BaseModel` if `pydantic 2` is installed.
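A brief sketch of the tool-usage case described above: a Pydantic model passed to `bind_tools`. Whether the import comes from `pydantic` (v2) or `pydantic.v1` depends on the LangChain version, per the caution note; the model name is a placeholder.

```python
# Hedged sketch: pass a Pydantic schema to a tool-calling API.
from pydantic import BaseModel, Field  # or pydantic.v1, depending on version
from langchain_openai import ChatOpenAI


class GetWeather(BaseModel):
    """Get the current weather in a given location."""

    location: str = Field(..., description="City and state, e.g. San Francisco, CA")


llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
llm_with_tools = llm.bind_tools([GetWeather])
ai_msg = llm_with_tools.invoke("What is the weather in Paris?")
print(ai_msg.tool_calls)  # parsed tool calls with arguments matching the schema fields
```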
@@ -721,9 +721,9 @@
"metadata": {},
"outputs": [],
"source": [
"from langgraph.checkpoint.memory import MemorySaver\n",
"from langgraph.checkpoint.sqlite import SqliteSaver\n",
"\n",
"memory = MemorySaver()\n",
"memory = SqliteSaver.from_conn_string(\":memory:\")\n",
"\n",
"agent_executor = create_react_agent(llm, tools, checkpointer=memory)"
]
@@ -890,9 +890,9 @@
"from langchain_community.document_loaders import WebBaseLoader\n",
"from langchain_openai import ChatOpenAI, OpenAIEmbeddings\n",
"from langchain_text_splitters import RecursiveCharacterTextSplitter\n",
"from langgraph.checkpoint.memory import MemorySaver\n",
"from langgraph.checkpoint.sqlite import SqliteSaver\n",
"\n",
"memory = MemorySaver()\n",
"memory = SqliteSaver.from_conn_string(\":memory:\")\n",
"llm = ChatOpenAI(model=\"gpt-3.5-turbo\", temperature=0)\n",
"\n",
"\n",
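For context on how the checkpointer swap above is exercised, a hedged sketch of invoking the agent with a thread id so state persists across turns; the empty tool list and the model are placeholders:

```python
# Hedged sketch: MemorySaver keeps conversation state per thread_id.
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
from langgraph.checkpoint.memory import MemorySaver
from langgraph.prebuilt import create_react_agent

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
tools = []  # e.g. a search tool in the original tutorial

memory = MemorySaver()
agent_executor = create_react_agent(llm, tools, checkpointer=memory)

config = {"configurable": {"thread_id": "abc123"}}
agent_executor.invoke({"messages": [HumanMessage(content="Hi, I'm Bob!")]}, config)
agent_executor.invoke({"messages": [HumanMessage(content="What's my name?")]}, config)
```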
@@ -14,9 +14,7 @@
"We will cover two approaches:\n",
"\n",
"1. Using the built-in [create_retrieval_chain](https://api.python.langchain.com/en/latest/chains/langchain.chains.retrieval.create_retrieval_chain.html), which returns sources by default;\n",
"2. Using a simple [LCEL](/docs/concepts#langchain-expression-language-lcel) implementation, to show the operating principle.\n",
"\n",
"We will also show how to structure sources into the model response, such that a model can report what specific sources it used in generating its answer."
"2. Using a simple [LCEL](/docs/concepts#langchain-expression-language-lcel) implementation, to show the operating principle."
]
},
{
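A rough sketch of approach (1) above; the `retriever` is assumed to come from the tutorial's vector store, and the prompt wording is illustrative:

```python
# Hedged sketch: create_retrieval_chain returns retrieved documents under
# "context" alongside the generated "answer".
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "Answer the question using the following context:\n\n{context}"),
        ("human", "{input}"),
    ]
)

combine_docs_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, combine_docs_chain)  # retriever assumed

response = rag_chain.invoke({"input": "What is Task Decomposition?"})
response["answer"], response["context"]  # answer plus the source documents
```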
@@ -132,8 +130,8 @@
},
{
"cell_type": "code",
"execution_count": null,
"id": "24a69b8c-024e-4e34-b827-9c9de46512a3",
"execution_count": 3,
"id": "820244ae-74b4-4593-b392-822979dd91b8",
"metadata": {},
"outputs": [],
"source": [
@@ -213,11 +211,11 @@
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'input': 'What is Task Decomposition?',\n",
|
||||
" 'context': [Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\\nComponent One: Planning#\\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\\nTask Decomposition#\\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.'),\n",
|
||||
" Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\\nTask decomposition can be done (1) by LLM with simple prompting like \"Steps for XYZ.\\\\n1.\", \"What are the subgoals for achieving XYZ?\", (2) by using task-specific instructions; e.g. \"Write a story outline.\" for writing a novel, or (3) with human inputs.'),\n",
|
||||
" Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Resources:\\n1. Internet access for searches and information gathering.\\n2. Long Term memory management.\\n3. GPT-3.5 powered Agents for delegation of simple tasks.\\n4. File output.\\n\\nPerformance Evaluation:\\n1. Continuously review and analyze your actions to ensure you are performing to the best of your abilities.\\n2. Constructively self-criticize your big-picture behavior constantly.\\n3. Reflect on past decisions and strategies to refine your approach.\\n4. Every command has a cost, so be smart and efficient. Aim to complete tasks in the least number of steps.'),\n",
|
||||
" Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content=\"(3) Task execution: Expert models execute on the specific tasks and log results.\\nInstruction:\\n\\nWith the input and the inference results, the AI assistant needs to describe the process and results. The previous stages can be formed as - User Input: {{ User Input }}, Task Planning: {{ Tasks }}, Model Selection: {{ Model Assignment }}, Task Execution: {{ Predictions }}. You must first answer the user's request in a straightforward manner. Then describe the task process and show your analysis and model inference results to the user in the first person. If inference results contain a file path, must tell the user the complete file path.\")],\n",
|
||||
" 'answer': 'Task decomposition involves breaking down a complex task into smaller and more manageable steps. This process helps agents or models tackle difficult tasks by dividing them into simpler subtasks or components. Task decomposition can be achieved through techniques like Chain of Thought or Tree of Thoughts, which guide the agent in breaking down tasks into sequential or branching steps.'}"
|
||||
" 'context': [Document(page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\\nComponent One: Planning#\\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\\nTask Decomposition#\\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}),\n",
|
||||
" Document(page_content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\\nTask decomposition can be done (1) by LLM with simple prompting like \"Steps for XYZ.\\\\n1.\", \"What are the subgoals for achieving XYZ?\", (2) by using task-specific instructions; e.g. \"Write a story outline.\" for writing a novel, or (3) with human inputs.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}),\n",
|
||||
" Document(page_content='Resources:\\n1. Internet access for searches and information gathering.\\n2. Long Term memory management.\\n3. GPT-3.5 powered Agents for delegation of simple tasks.\\n4. File output.\\n\\nPerformance Evaluation:\\n1. Continuously review and analyze your actions to ensure you are performing to the best of your abilities.\\n2. Constructively self-criticize your big-picture behavior constantly.\\n3. Reflect on past decisions and strategies to refine your approach.\\n4. Every command has a cost, so be smart and efficient. Aim to complete tasks in the least number of steps.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}),\n",
|
||||
" Document(page_content=\"(3) Task execution: Expert models execute on the specific tasks and log results.\\nInstruction:\\n\\nWith the input and the inference results, the AI assistant needs to describe the process and results. The previous stages can be formed as - User Input: {{ User Input }}, Task Planning: {{ Tasks }}, Model Selection: {{ Model Assignment }}, Task Execution: {{ Predictions }}. You must first answer the user's request in a straightforward manner. Then describe the task process and show your analysis and model inference results to the user in the first person. If inference results contain a file path, must tell the user the complete file path.\", metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'})],\n",
|
||||
" 'answer': 'Task decomposition involves breaking down a complex task into smaller and simpler steps. This process helps agents or models handle challenging tasks by dividing them into more manageable subtasks. Techniques like Chain of Thought and Tree of Thoughts are used to decompose tasks into multiple steps for better problem-solving.'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 5,
|
||||
@@ -253,18 +251,18 @@
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "1950953a-e6f1-439d-b7b9-c3bd456e388d",
|
||||
"id": "22ea137c-1a7a-44dd-ac73-281213979957",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'input': 'What is Task Decomposition',\n",
|
||||
" 'context': [Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\\nComponent One: Planning#\\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\\nTask Decomposition#\\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.'),\n",
|
||||
" Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\\nTask decomposition can be done (1) by LLM with simple prompting like \"Steps for XYZ.\\\\n1.\", \"What are the subgoals for achieving XYZ?\", (2) by using task-specific instructions; e.g. \"Write a story outline.\" for writing a novel, or (3) with human inputs.'),\n",
|
||||
" Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='The AI assistant can parse user input to several tasks: [{\"task\": task, \"id\", task_id, \"dep\": dependency_task_ids, \"args\": {\"text\": text, \"image\": URL, \"audio\": URL, \"video\": URL}}]. The \"dep\" field denotes the id of the previous task which generates a new resource that the current task relies on. A special tag \"-task_id\" refers to the generated text image, audio and video in the dependency task with id as task_id. The task MUST be selected from the following options: {{ Available Task List }}. There is a logical relationship between tasks, please note their order. If the user input can\\'t be parsed, you need to reply empty JSON. Here are several cases for your reference: {{ Demonstrations }}. The chat history is recorded as {{ Chat History }}. From this chat history, you can find the path of the user-mentioned resources for your task planning.'),\n",
|
||||
" Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Fig. 11. Illustration of how HuggingGPT works. (Image source: Shen et al. 2023)\\nThe system comprises of 4 stages:\\n(1) Task planning: LLM works as the brain and parses the user requests into multiple tasks. There are four attributes associated with each task: task type, ID, dependencies, and arguments. They use few-shot examples to guide LLM to do task parsing and planning.\\nInstruction:')],\n",
|
||||
" 'answer': 'Task decomposition is a technique used in artificial intelligence to break down complex tasks into smaller and more manageable subtasks. This approach helps agents or models to tackle difficult problems by dividing them into simpler steps, improving performance and interpretability. Different methods like Chain of Thought and Tree of Thoughts have been developed to enhance task decomposition in AI systems.'}"
|
||||
" 'context': [Document(page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\\nComponent One: Planning#\\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\\nTask Decomposition#\\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}),\n",
|
||||
" Document(page_content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\\nTask decomposition can be done (1) by LLM with simple prompting like \"Steps for XYZ.\\\\n1.\", \"What are the subgoals for achieving XYZ?\", (2) by using task-specific instructions; e.g. \"Write a story outline.\" for writing a novel, or (3) with human inputs.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}),\n",
|
||||
" Document(page_content='The AI assistant can parse user input to several tasks: [{\"task\": task, \"id\", task_id, \"dep\": dependency_task_ids, \"args\": {\"text\": text, \"image\": URL, \"audio\": URL, \"video\": URL}}]. The \"dep\" field denotes the id of the previous task which generates a new resource that the current task relies on. A special tag \"-task_id\" refers to the generated text image, audio and video in the dependency task with id as task_id. The task MUST be selected from the following options: {{ Available Task List }}. There is a logical relationship between tasks, please note their order. If the user input can\\'t be parsed, you need to reply empty JSON. Here are several cases for your reference: {{ Demonstrations }}. The chat history is recorded as {{ Chat History }}. From this chat history, you can find the path of the user-mentioned resources for your task planning.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}),\n",
|
||||
" Document(page_content='Fig. 11. Illustration of how HuggingGPT works. (Image source: Shen et al. 2023)\\nThe system comprises of 4 stages:\\n(1) Task planning: LLM works as the brain and parses the user requests into multiple tasks. There are four attributes associated with each task: task type, ID, dependencies, and arguments. They use few-shot examples to guide LLM to do task parsing and planning.\\nInstruction:', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'})],\n",
|
||||
" 'answer': 'Task decomposition involves breaking down complex tasks into smaller and simpler steps to make them more manageable for autonomous agents or models. This process can be achieved by techniques like Chain of Thought (CoT) or Tree of Thoughts, which guide the model to think step by step or explore multiple reasoning possibilities at each step. Task decomposition can be done through simple prompting with language models, task-specific instructions, or human inputs.'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
@@ -281,25 +279,15 @@
" return \"\\n\\n\".join(doc.page_content for doc in docs)\n",
"\n",
"\n",
"# This Runnable takes a dict with keys 'input' and 'context',\n",
"# formats them into a prompt, and generates a response.\n",
"rag_chain_from_docs = (\n",
" {\n",
" \"input\": lambda x: x[\"input\"], # input query\n",
" \"context\": lambda x: format_docs(x[\"context\"]), # context\n",
" }\n",
" | prompt # format query and context into prompt\n",
" | llm # generate response\n",
" | StrOutputParser() # coerce to string\n",
" RunnablePassthrough.assign(context=(lambda x: format_docs(x[\"context\"])))\n",
" | prompt\n",
" | llm\n",
" | StrOutputParser()\n",
")\n",
"\n",
"# Pass input query to retriever\n",
"retrieve_docs = (lambda x: x[\"input\"]) | retriever\n",
"\n",
"# Below, we chain `.assign` calls. This takes a dict and successively\n",
"# adds keys-- \"context\" and \"answer\"-- where the value for each key\n",
"# is determined by a Runnable. The Runnable operates on all existing\n",
"# keys in the dict.\n",
"chain = RunnablePassthrough.assign(context=retrieve_docs).assign(\n",
" answer=rag_chain_from_docs\n",
")\n",
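A short usage note for the LCEL chain assembled in this hunk (matching the notebook's variable names): the returned dict carries the input, the retrieved context, and the answer.

```python
# Usage sketch for the `chain` defined above.
response = chain.invoke({"input": "What is Task Decomposition?"})

response["answer"]  # generated answer string
[doc.metadata["source"] for doc in response["context"]]  # retrieved sources
```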
@@ -314,105 +302,7 @@
"source": [
":::{.callout-tip}\n",
"\n",
"Check out the [LangSmith trace](https://smith.langchain.com/public/1c055a3b-0236-4670-a3fb-023d418ba796/r)\n",
"\n",
":::"
]
},
{
"cell_type": "markdown",
"id": "c1c17797-d965-4fd2-b8d4-d386f25dd352",
"metadata": {},
"source": [
"## Structure sources in model response\n",
"\n",
"Up to this point, we've simply propagated the documents returned from the retrieval step through to the final response. But this may not illustrate what subset of information the model relied on when generating its answer. Below, we show how to structure sources into the model response, allowing the model to report what specific context it relied on for its answer.\n",
"\n",
"Because the above LCEL implementation is composed of [Runnable](/docs/concepts/#runnable-interface) primitives, it is straightforward to extend. Below, we make a simple change:\n",
"\n",
"- We use the model's tool-calling features to generate [structured output](/docs/how_to/structured_output/), consisting of an answer and list of sources. The schema for the response is represented in the `AnswerWithSources` TypedDict, below.\n",
"- We remove the `StrOutputParser()`, as we expect `dict` output in this scenario."
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "8f916b14-1b0a-4975-a62f-52f1353bde15",
"metadata": {},
"outputs": [],
"source": [
"from typing import List\n",
"\n",
"from langchain_core.runnables import RunnablePassthrough\n",
"from typing_extensions import Annotated, TypedDict\n",
"\n",
"\n",
"# Desired schema for response\n",
"class AnswerWithSources(TypedDict):\n",
" \"\"\"An answer to the question, with sources.\"\"\"\n",
"\n",
" answer: str\n",
" sources: Annotated[\n",
" List[str],\n",
" ...,\n",
" \"List of sources (author + year) used to answer the question\",\n",
" ]\n",
"\n",
"\n",
"# Our rag_chain_from_docs has the following changes:\n",
"# - add `.with_structured_output` to the LLM;\n",
"# - remove the output parser\n",
"rag_chain_from_docs = (\n",
" {\n",
" \"input\": lambda x: x[\"input\"],\n",
" \"context\": lambda x: format_docs(x[\"context\"]),\n",
" }\n",
" | prompt\n",
" | llm.with_structured_output(AnswerWithSources)\n",
")\n",
"\n",
"retrieve_docs = (lambda x: x[\"input\"]) | retriever\n",
"\n",
"chain = RunnablePassthrough.assign(context=retrieve_docs).assign(\n",
" answer=rag_chain_from_docs\n",
")\n",
"\n",
"response = chain.invoke({\"input\": \"What is Chain of Thought?\"})"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "7a8fc0c5-afb3-4012-a467-3951996a6850",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\n",
" \"answer\": \"Chain of Thought (CoT) is a prompting technique that enhances model performance on complex tasks by instructing the model to \\\"think step by step\\\" to decompose hard tasks into smaller and simpler steps. It transforms big tasks into multiple manageable tasks and sheds light on the interpretation of the model's thinking process.\",\n",
" \"sources\": [\n",
" \"Wei et al. 2022\"\n",
" ]\n",
"}\n"
]
}
],
"source": [
"import json\n",
"\n",
"print(json.dumps(response[\"answer\"], indent=2))"
]
},
{
"cell_type": "markdown",
"id": "7440f785-29c5-4c6b-9656-0d9d5efbac05",
"metadata": {},
"source": [
":::{.callout-tip}\n",
"\n",
"View [LangSmith trace](https://smith.langchain.com/public/0eeddf06-3a7b-4f27-974c-310ca8160f60/r)\n",
"Check out the [LangSmith trace](https://smith.langchain.com/public/0cb42685-e29e-4280-a503-bef2014d7ba2/r)\n",
"\n",
":::"
]
@@ -761,7 +761,7 @@
"* [SQL tutorial](/docs/tutorials/sql_qa): Many of the challenges of working with SQL db's and CSV's are generic to any structured data type, so it's useful to read the SQL techniques even if you're using Pandas for CSV data analysis.\n",
"* [Tool use](/docs/how_to/tool_calling): Guides on general best practices when working with chains and agents that invoke tools\n",
"* [Agents](/docs/tutorials/agents): Understand the fundamentals of building LLM agents.\n",
"* Integrations: Sandboxed envs like [E2B](/docs/integrations/tools/e2b_data_analysis) and [Bearly](/docs/integrations/tools/bearly), utilities like [SQLDatabase](https://api.python.langchain.com/en/latest/utilities/langchain_community.utilities.sql_database.SQLDatabase.html#langchain_community.utilities.sql_database.SQLDatabase), related agents like [Spark DataFrame agent](/docs/integrations/tools/spark_sql)."
"* Integrations: Sandboxed envs like [E2B](/docs/integrations/tools/e2b_data_analysis) and [Bearly](/docs/integrations/tools/bearly), utilities like [SQLDatabase](https://api.python.langchain.com/en/latest/utilities/langchain_community.utilities.sql_database.SQLDatabase.html#langchain_community.utilities.sql_database.SQLDatabase), related agents like [Spark DataFrame agent](/docs/integrations/toolkits/spark)."
]
}
],
@@ -5,6 +5,7 @@ sidebar_position: 3


Toolkits are collections of tools that are designed to be used together for specific tasks. They have convenient loading methods.
For a complete list of available ready-made toolkits, visit [Integrations](/docs/integrations/toolkits/).

All Toolkits expose a `get_tools` method which returns a list of tools.
You can therefore do:
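To make the `get_tools` pattern concrete, a hedged sketch using the SQL toolkit as an example (any toolkit from the integrations list behaves the same way); the database URI is a placeholder:

```python
# Hedged sketch: every toolkit exposes get_tools(), which returns the list of
# tools to hand to an agent.
from langchain_community.agent_toolkits import SQLDatabaseToolkit
from langchain_community.utilities import SQLDatabase
from langchain_openai import ChatOpenAI

db = SQLDatabase.from_uri("sqlite:///Chinook.db")
toolkit = SQLDatabaseToolkit(db=db, llm=ChatOpenAI(temperature=0))

tools = toolkit.get_tools()
print([tool.name for tool in tools])
```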
@@ -196,6 +196,8 @@
"\n",
"Toolkits are collections of tools that are designed to be used together for specific tasks. They have convenient loading methods.\n",
"\n",
"For a complete list of available ready-made toolkits, visit [Integrations](/docs/integrations/toolkits/).\n",
"\n",
"All Toolkits expose a `get_tools` method which returns a list of tools.\n",
"\n",
"You're usually meant to use them this way:\n",
@@ -13,7 +13,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Kinetica Language To SQL Chat Model\n",
"# Kinetica SqlAssist LLM Demo\n",
"\n",
"This notebook demonstrates how to use Kinetica to transform natural language into SQL\n",
"and simplify the process of data retrieval. This demo is intended to show the mechanics\n",
@@ -56,16 +56,23 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 2,
"id": "e817fe2e-4f1d-4533-b19e-2400b1cf6ce8",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdin",
"output_type": "stream",
"text": [
"Enter your OpenAI API key: ········\n"
]
}
],
"source": [
"import getpass\n",
"import os\n",
"\n",
"if not os.environ.get(\"OPENAI_API_KEY\"):\n",
" os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"Enter your OpenAI API key: \")"
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"Enter your OpenAI API key: \")"
]
},
{
@@ -119,7 +126,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 1,
"id": "522686de",
"metadata": {
"tags": []
@@ -274,12 +281,12 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 9,
"id": "b7ea7690-ec7a-4337-b392-e87d1f39a6ec",
"metadata": {},
"outputs": [],
"source": [
"from pydantic import BaseModel, Field\n",
"from langchain_core.pydantic_v1 import BaseModel, Field\n",
"\n",
"\n",
"class GetWeather(BaseModel):\n",
@@ -315,47 +322,6 @@
"ai_msg"
]
},
{
"cell_type": "markdown",
"id": "67b0f63d-15e6-45e0-9e86-2852ddcff54f",
"metadata": {},
"source": [
"### ``strict=True``\n",
"\n",
":::info Requires ``langchain-openai>=0.1.21rc1``\n",
"\n",
":::\n",
"\n",
"As of Aug 6, 2024, OpenAI supports a `strict` argument when calling tools that will enforce that the tool argument schema is respected by the model. See more here: https://platform.openai.com/docs/guides/function-calling\n",
"\n",
"**Note**: If ``strict=True`` the tool definition will also be validated, and a subset of JSON schema are accepted. Crucially, schema cannot have optional args (those with default values). Read the full docs on what types of schema are supported here: https://platform.openai.com/docs/guides/structured-outputs/supported-schemas. "
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "dc8ac4f1-4039-4392-90c1-2d8331cd6910",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_VYEfpPDh3npMQ95J9EWmWvSn', 'function': {'arguments': '{\"location\":\"San Francisco, CA\"}', 'name': 'GetWeather'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 17, 'prompt_tokens': 68, 'total_tokens': 85}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_3aa7262c27', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-a4c6749b-adbb-45c7-8b17-8d6835d5c443-0', tool_calls=[{'name': 'GetWeather', 'args': {'location': 'San Francisco, CA'}, 'id': 'call_VYEfpPDh3npMQ95J9EWmWvSn', 'type': 'tool_call'}], usage_metadata={'input_tokens': 68, 'output_tokens': 17, 'total_tokens': 85})"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"llm_with_tools = llm.bind_tools([GetWeather], strict=True)\n",
"ai_msg = llm_with_tools.invoke(\n",
" \"what is the weather like in San Francisco\",\n",
")\n",
"ai_msg"
]
},
{
"cell_type": "markdown",
"id": "768d1ae4-4b1a-48eb-a329-c8d5051067a3",
@@ -446,9 +412,9 @@
],
"metadata": {
"kernelspec": {
"display_name": "poetry-venv-311",
"display_name": "poetry-venv-2",
"language": "python",
"name": "poetry-venv-311"
"name": "poetry-venv-2"
},
"language_info": {
"codemirror_mode": {
@@ -13,7 +13,7 @@
"\n",
"## Prerequisites\n",
"\n",
"You need to have an existing dataset on the Apify platform. This example shows how to load a dataset produced by the [Website Content Crawler](https://apify.com/apify/website-content-crawler)."
"You need to have an existing dataset on the Apify platform. If you don't have one, please first check out [this notebook](/docs/integrations/tools/apify) on how to use Apify to extract content from documentation, knowledge bases, help centers, or blogs. This example shows how to load a dataset produced by the [Website Content Crawler](https://apify.com/apify/website-content-crawler)."
]
},
{
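For context on the prerequisite above, a hedged sketch of loading an existing Apify dataset; the dataset id is a placeholder, the field names in the mapping function depend on the actor's output, and an `APIFY_API_TOKEN` is assumed to be set:

```python
# Hedged sketch: map each dataset item to a LangChain Document.
from langchain_community.document_loaders import ApifyDatasetLoader
from langchain_core.documents import Document

loader = ApifyDatasetLoader(
    dataset_id="your-dataset-id",  # placeholder
    dataset_mapping_function=lambda item: Document(
        page_content=item["text"] or "", metadata={"source": item["url"]}
    ),
)
docs = loader.load()
```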
File diff suppressed because one or more lines are too long
@@ -1,182 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# PyPDFLoader\n",
|
||||
"\n",
|
||||
"This notebook provides a quick overview for getting started with `PyPDF` [document loader](https://python.langchain.com/v0.2/docs/concepts/#document-loaders). For detailed documentation of all __ModuleName__Loader features and configurations head to the [API reference](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.pdf.PyPDFLoader.html).\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"## Overview\n",
|
||||
"### Integration details\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"| Class | Package | Local | Serializable | JS support|\n",
|
||||
"| :--- | :--- | :---: | :---: | :---: |\n",
|
||||
"| [PyPDFLoader](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.pdf.PyPDFLoader.html) | [langchain_community](https://api.python.langchain.com/en/latest/community_api_reference.html) | ✅ | ❌ | ❌ | \n",
|
||||
"### Loader features\n",
|
||||
"| Source | Document Lazy Loading | Native Async Support\n",
|
||||
"| :---: | :---: | :---: | \n",
|
||||
"| PyPDFLoader | ✅ | ❌ | \n",
|
||||
"\n",
|
||||
"## Setup\n",
|
||||
"\n",
|
||||
"### Credentials\n",
|
||||
"\n",
|
||||
"No credentials are required to use `PyPDFLoader`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Installation\n",
|
||||
"\n",
|
||||
"To use `PyPDFLoader` you need to have the `langchain-community` python package downloaded:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%pip install -qU langchain_community"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Initialization\n",
|
||||
"\n",
|
||||
"Now we can instantiate our model object and load documents:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.document_loaders import PyPDFLoader\n",
|
||||
"\n",
|
||||
"loader = PyPDFLoader(\n",
|
||||
" \"./example_data/layout-parser-paper.pdf\",\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Load"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"Document(metadata={'source': './example_data/layout-parser-paper.pdf', 'page': 0}, page_content='LayoutParser : A Unified Toolkit for Deep\\nLearning Based Document Image Analysis\\nZejiang Shen1( \\x00), Ruochen Zhang2, Melissa Dell3, Benjamin Charles Germain\\nLee4, Jacob Carlson3, and Weining Li5\\n1Allen Institute for AI\\nshannons@allenai.org\\n2Brown University\\nruochen zhang@brown.edu\\n3Harvard University\\n{melissadell,jacob carlson }@fas.harvard.edu\\n4University of Washington\\nbcgl@cs.washington.edu\\n5University of Waterloo\\nw422li@uwaterloo.ca\\nAbstract. Recent advances in document image analysis (DIA) have been\\nprimarily driven by the application of neural networks. Ideally, research\\noutcomes could be easily deployed in production and extended for further\\ninvestigation. However, various factors like loosely organized codebases\\nand sophisticated model configurations complicate the easy reuse of im-\\nportant innovations by a wide audience. Though there have been on-going\\nefforts to improve reusability and simplify deep learning (DL) model\\ndevelopment in disciplines like natural language processing and computer\\nvision, none of them are optimized for challenges in the domain of DIA.\\nThis represents a major gap in the existing toolkit, as DIA is central to\\nacademic research across a wide range of disciplines in the social sciences\\nand humanities. This paper introduces LayoutParser , an open-source\\nlibrary for streamlining the usage of DL in DIA research and applica-\\ntions. The core LayoutParser library comes with a set of simple and\\nintuitive interfaces for applying and customizing DL models for layout de-\\ntection, character recognition, and many other document processing tasks.\\nTo promote extensibility, LayoutParser also incorporates a community\\nplatform for sharing both pre-trained models and full document digiti-\\nzation pipelines. We demonstrate that LayoutParser is helpful for both\\nlightweight and large-scale digitization pipelines in real-word use cases.\\nThe library is publicly available at https://layout-parser.github.io .\\nKeywords: Document Image Analysis ·Deep Learning ·Layout Analysis\\n·Character Recognition ·Open Source library ·Toolkit.\\n1 Introduction\\nDeep Learning(DL)-based approaches are the state-of-the-art for a wide range of\\ndocument image analysis (DIA) tasks including document image classification [ 11,arXiv:2103.15348v2 [cs.CV] 21 Jun 2021')"
|
||||
]
|
||||
},
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"docs = loader.load()\n",
|
||||
"docs[0]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"{'source': './example_data/layout-parser-paper.pdf', 'page': 0}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(docs[0].metadata)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Lazy Load\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"6"
|
||||
]
|
||||
},
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"page = []\n",
|
||||
"for doc in loader.lazy_load():\n",
|
||||
" page.append(doc)\n",
|
||||
" if len(page) >= 10:\n",
|
||||
" # do some paged operation, e.g.\n",
|
||||
" # index.upsert(page)\n",
|
||||
"\n",
|
||||
" page = []\n",
|
||||
"len(page)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## API reference\n",
|
||||
"\n",
|
||||
"For detailed documentation of all `PyPDFLoader` features and configurations head to the API reference: https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.pdf.PyPDFLoader.html"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -7,18 +7,7 @@
"source": [
"# Recursive URL\n",
"\n",
"The `RecursiveUrlLoader` lets you recursively scrape all child links from a root URL and parse them into Documents.\n",
"\n",
"## Overview\n",
"### Integration details\n",
"\n",
"| Class | Package | Local | Serializable | [JS support](https://js.langchain.com/v0.2/docs/integrations/document_loaders/web_loaders/recursive_url_loader/)|\n",
"| :--- | :--- | :---: | :---: | :---: |\n",
"| [RecursiveUrlLoader](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.recursive_url_loader.RecursiveUrlLoader.html) | [langchain_community](https://api.python.langchain.com/en/latest/community_api_reference.html) | ✅ | ❌ | ✅ | \n",
"### Loader features\n",
"| Source | Document Lazy Loading | Native Async Support\n",
"| :---: | :---: | :---: | \n",
"| RecursiveUrlLoader | ✅ | ❌ | \n"
"The `RecursiveUrlLoader` lets you recursively scrape all child links from a root URL and parse them into Documents."
]
},
{
@@ -28,12 +17,6 @@
"source": [
"## Setup\n",
"\n",
"### Credentials\n",
"\n",
"No credentials are required to use the `RecursiveUrlLoader`.\n",
"\n",
"### Installation\n",
"\n",
"The `RecursiveUrlLoader` lives in the `langchain-community` package. There are no other required packages, though you will get richer default Document metadata if you have `beautifulsoup4` installed as well."
]
},
@@ -203,50 +186,6 @@
|
||||
"That certainly looks like HTML that comes from the url https://docs.python.org/3.9/, which is what we expected. Let's now look at some variations we can make to our basic example that can be helpful in different situations. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b17b7202",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Lazy loading\n",
|
||||
"\n",
|
||||
"If we're loading a large number of Documents and our downstream operations can be done over subsets of all loaded Documents, we can lazily load our Documents one at a time to minimize our memory footprint:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "4b13e4d1",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"/var/folders/4j/2rz3865x6qg07tx43146py8h0000gn/T/ipykernel_73962/2110507528.py:6: XMLParsedAsHTMLWarning: It looks like you're parsing an XML document using an HTML parser. If this really is an HTML document (maybe it's XHTML?), you can ignore or filter this warning. If it's XML, you should know that using an XML parser will be more reliable. To parse this document as XML, make sure you have the lxml package installed, and pass the keyword argument `features=\"xml\"` into the BeautifulSoup constructor.\n",
|
||||
" soup = BeautifulSoup(html, \"lxml\")\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"page = []\n",
|
||||
"for doc in loader.lazy_load():\n",
|
||||
" page.append(doc)\n",
|
||||
" if len(page) >= 10:\n",
|
||||
" # do some paged operation, e.g.\n",
|
||||
" # index.upsert(page)\n",
|
||||
"\n",
|
||||
" page = []"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "fb039682",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In this example we never have more than 10 Documents loaded into memory at a time."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "8f41cc89",
|
||||
@@ -317,6 +256,50 @@
|
||||
"You can similarly pass in a `metadata_extractor` to customize how Document metadata is extracted from the HTTP response. See the [API reference](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.recursive_url_loader.RecursiveUrlLoader.html) for more on this."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1dddbc94",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Lazy loading\n",
|
||||
"\n",
|
||||
"If we're loading a large number of Documents and our downstream operations can be done over subsets of all loaded Documents, we can lazily load our Documents one at a time to minimize our memory footprint:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"id": "7d0114fc",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"/var/folders/4j/2rz3865x6qg07tx43146py8h0000gn/T/ipykernel_73962/2110507528.py:6: XMLParsedAsHTMLWarning: It looks like you're parsing an XML document using an HTML parser. If this really is an HTML document (maybe it's XHTML?), you can ignore or filter this warning. If it's XML, you should know that using an XML parser will be more reliable. To parse this document as XML, make sure you have the lxml package installed, and pass the keyword argument `features=\"xml\"` into the BeautifulSoup constructor.\n",
|
||||
" soup = BeautifulSoup(html, \"lxml\")\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"page = []\n",
|
||||
"for doc in loader.lazy_load():\n",
|
||||
" page.append(doc)\n",
|
||||
" if len(page) >= 10:\n",
|
||||
" # do some paged operation, e.g.\n",
|
||||
" # index.upsert(page)\n",
|
||||
"\n",
|
||||
" page = []"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f88a7c2f-35df-4c3a-b238-f91be2674b96",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In this example we never have more than 10 Documents loaded into memory at a time."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "3e4d1c8f",
|
||||
|
||||
@@ -7,41 +7,20 @@
|
||||
"source": [
|
||||
"# Unstructured\n",
|
||||
"\n",
|
||||
"This notebook covers how to use `Unstructured` [document loader](https://python.langchain.com/v0.2/docs/concepts/#document-loaders) to load files of many types. `Unstructured` currently supports loading of text files, powerpoints, html, pdfs, images, and more.\n",
|
||||
"This notebook covers how to use `Unstructured` package to load files of many types. `Unstructured` currently supports loading of text files, powerpoints, html, pdfs, images, and more.\n",
|
||||
"\n",
|
||||
"Please see [this guide](../../integrations/providers/unstructured.mdx) for more instructions on setting up Unstructured locally, including setting up required system dependencies.\n",
|
||||
"\n",
|
||||
"## Overview\n",
|
||||
"### Integration details\n",
|
||||
"\n",
|
||||
"| Class | Package | Local | Serializable | [JS support](https://js.langchain.com/v0.2/docs/integrations/document_loaders/file_loaders/unstructured/)|\n",
|
||||
"| :--- | :--- | :---: | :---: | :---: |\n",
|
||||
"| [UnstructuredLoader](https://api.python.langchain.com/en/latest/document_loaders/langchain_unstructured.document_loaders.UnstructuredLoader.html) | [langchain_community](https://api.python.langchain.com/en/latest/unstructured_api_reference.html) | ✅ | ❌ | ✅ | \n",
|
||||
"### Loader features\n",
|
||||
"| Source | Document Lazy Loading | Native Async Support\n",
|
||||
"| :---: | :---: | :---: | \n",
|
||||
"| UnstructuredLoader | ✅ | ❌ | \n",
|
||||
"\n",
|
||||
"## Setup\n",
|
||||
"\n",
|
||||
"### Credentials\n",
|
||||
"\n",
|
||||
"By default, `langchain-unstructured` installs a smaller footprint that requires offloading of the partitioning logic to the Unstructured API, which requires an API key. If you use the local installation, you do not need an API key. To get your API key, head over to [this site](https://unstructured.io) and get an API key, and then set it in the cell below:"
|
||||
"Please see [this guide](/docs/integrations/providers/unstructured/) for more instructions on setting up Unstructured locally, including setting up required system dependencies."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"execution_count": null,
|
||||
"id": "2886982e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import getpass\n",
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.environ[\"UNSTRUCTURED_API_KEY\"] = getpass.getpass(\n",
|
||||
" \"Enter your Unstructured API key: \"\n",
|
||||
")"
|
||||
"# Install package, compatible with API partitioning\n",
|
||||
"%pip install --upgrade --quiet \"langchain-unstructured\""
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -49,32 +28,15 @@
|
||||
"id": "e75e2a6d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Installation\n",
|
||||
"### Local Partitioning (Optional)\n",
|
||||
"\n",
|
||||
"#### Normal Installation\n",
|
||||
"By default, `langchain-unstructured` installs a smaller footprint that requires\n",
|
||||
"offloading of the partitioning logic to the Unstructured API, which requires an `api_key`. For\n",
|
||||
"partitioning using the API, refer to the Unstructured API section below.\n",
|
||||
"\n",
|
||||
"The following packages are required to run the rest of this notebook."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "d9de83b3",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Install package, compatible with API partitioning\n",
|
||||
"%pip install --upgrade --quiet langchain-unstructured unstructured-client unstructured \"unstructured[pdf]\" python-magic"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "637eda35",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Installation for Local\n",
|
||||
"\n",
|
||||
"If you would like to run the partitioning logic locally, you will need to install a combination of system dependencies, as outlined in the [Unstructured documentation here](https://docs.unstructured.io/open-source/installation/full-installation).\n",
|
||||
"If you would like to run the partitioning logic locally, you will need to install\n",
|
||||
"a combination of system dependencies, as outlined in the \n",
|
||||
"[Unstructured documentation here](https://docs.unstructured.io/open-source/installation/full-installation).\n",
|
||||
"\n",
|
||||
"For example, on Macs you can install the required dependencies with:\n",
|
||||
"\n",
|
||||
@@ -86,7 +48,7 @@
|
||||
"brew install libxml2 libxslt\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"You can install the required `pip` dependencies needed for local with:\n",
|
||||
"You can install the required `pip` dependencies with:\n",
|
||||
"\n",
|
||||
"```bash\n",
|
||||
"pip install \"langchain-unstructured[local]\"\n",
|
||||
@@ -98,117 +60,120 @@
|
||||
"id": "a9c1c775",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Initialization\n",
|
||||
"### Quickstart\n",
|
||||
"\n",
|
||||
"The `UnstructuredLoader` allows loading from a variety of different file types. To read all about the `unstructured` package please refer to their [documentation](https://docs.unstructured.io/open-source/introduction/overview)/. In this example, we show loading from both a text file and a PDF file."
|
||||
"To simply load a file as a document, you can use the LangChain `DocumentLoader.load` \n",
|
||||
"interface:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"execution_count": null,
|
||||
"id": "79d3e549",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_unstructured import UnstructuredLoader\n",
|
||||
"\n",
|
||||
"file_paths = [\n",
|
||||
" \"./example_data/layout-parser-paper.pdf\",\n",
|
||||
" \"./example_data/state_of_the_union.txt\",\n",
|
||||
"]\n",
|
||||
"loader = UnstructuredLoader(\"./example_data/state_of_the_union.txt\")\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"loader = UnstructuredLoader(file_paths)"
|
||||
"docs = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "8b68dcab",
|
||||
"id": "b4ab0a79",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Load"
|
||||
"### Load list of files"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "8da59ef8",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"INFO: NumExpr defaulting to 12 threads.\n",
|
||||
"INFO: pikepdf C++ to Python logger bridge initialized\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"Document(metadata={'source': './example_data/layout-parser-paper.pdf', 'coordinates': {'points': ((16.34, 213.36), (16.34, 253.36), (36.34, 253.36), (36.34, 213.36)), 'system': 'PixelSpace', 'layout_width': 612, 'layout_height': 792}, 'file_directory': './example_data', 'filename': 'layout-parser-paper.pdf', 'languages': ['eng'], 'last_modified': '2024-07-25T21:28:58', 'page_number': 1, 'filetype': 'application/pdf', 'category': 'UncategorizedText', 'element_id': 'd3ce55f220dfb75891b4394a18bcb973'}, page_content='1 2 0 2')"
|
||||
]
|
||||
},
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"docs = loader.load()\n",
|
||||
"\n",
|
||||
"docs[0]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "97f7aa1f",
|
||||
"execution_count": 5,
|
||||
"id": "092d9a0b",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"{'source': './example_data/layout-parser-paper.pdf', 'coordinates': {'points': ((16.34, 213.36), (16.34, 253.36), (36.34, 253.36), (36.34, 213.36)), 'system': 'PixelSpace', 'layout_width': 612, 'layout_height': 792}, 'file_directory': './example_data', 'filename': 'layout-parser-paper.pdf', 'languages': ['eng'], 'last_modified': '2024-07-25T21:28:58', 'page_number': 1, 'filetype': 'application/pdf', 'category': 'UncategorizedText', 'element_id': 'd3ce55f220dfb75891b4394a18bcb973'}\n"
|
||||
"whatsapp_chat.txt : 1/22/23, 6:30 PM - User 1: Hi! Im interested in your bag. Im offering $50. Let me know if you are in\n",
|
||||
"state_of_the_union.txt : May God bless you all. May God protect our troops.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(docs[0].metadata)"
|
||||
"file_paths = [\n",
|
||||
" \"./example_data/whatsapp_chat.txt\",\n",
|
||||
" \"./example_data/state_of_the_union.txt\",\n",
|
||||
"]\n",
|
||||
"\n",
|
||||
"loader = UnstructuredLoader(file_paths)\n",
|
||||
"\n",
|
||||
"docs = loader.load()\n",
|
||||
"\n",
|
||||
"print(docs[0].metadata.get(\"filename\"), \": \", docs[0].page_content[:100])\n",
|
||||
"print(docs[-1].metadata.get(\"filename\"), \": \", docs[-1].page_content[:100])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0d7f991b",
|
||||
"id": "8de9ef16",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Lazy Load"
|
||||
"## PDF Example\n",
|
||||
"\n",
|
||||
"Processing PDF documents works exactly the same way. Unstructured detects the file type and extracts the same types of elements."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "672733fd",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Define a Partitioning Strategy\n",
|
||||
"\n",
|
||||
"Unstructured document loader allow users to pass in a `strategy` parameter that lets Unstructured\n",
|
||||
"know how to partition pdf and other OCR'd documents. Currently supported strategies are `\"auto\"`,\n",
|
||||
"`\"hi_res\"`, `\"ocr_only\"`, and `\"fast\"`. Learn more about the different strategies\n",
|
||||
"[here](https://docs.unstructured.io/open-source/core-functionality/partitioning#partition-pdf). \n",
|
||||
"\n",
|
||||
"Not all document types have separate hi res and fast partitioning strategies. For those document types, the `strategy` kwarg is\n",
|
||||
"ignored. In some cases, the high res strategy will fallback to fast if there is a dependency missing\n",
|
||||
"(i.e. a model for document partitioning). You can see how to apply a strategy to an\n",
|
||||
"`UnstructuredLoader` below."
|
||||
]
|
||||
},
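A minimal sketch of applying a strategy, mirroring the loader call in the cell below (the file path is the example file used throughout this notebook):

```python
from langchain_unstructured import UnstructuredLoader

# "fast" skips the layout-detection model and relies on directly extractable text;
# swapping in "hi_res", "ocr_only", or "auto" works the same way.
loader = UnstructuredLoader("./example_data/layout-parser-paper.pdf", strategy="fast")
docs = loader.load()
```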
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "b05604d2",
|
||||
"execution_count": 6,
|
||||
"id": "60685353",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"Document(metadata={'source': './example_data/layout-parser-paper.pdf', 'coordinates': {'points': ((16.34, 213.36), (16.34, 253.36), (36.34, 253.36), (36.34, 213.36)), 'system': 'PixelSpace', 'layout_width': 612, 'layout_height': 792}, 'file_directory': './example_data', 'filename': 'layout-parser-paper.pdf', 'languages': ['eng'], 'last_modified': '2024-07-25T21:28:58', 'page_number': 1, 'filetype': 'application/pdf', 'category': 'UncategorizedText', 'element_id': 'd3ce55f220dfb75891b4394a18bcb973'}, page_content='1 2 0 2')"
|
||||
"[Document(metadata={'source': './example_data/layout-parser-paper.pdf', 'coordinates': {'points': ((16.34, 393.9), (16.34, 560.0), (36.34, 560.0), (36.34, 393.9)), 'system': 'PixelSpace', 'layout_width': 612, 'layout_height': 792}, 'file_directory': './example_data', 'filename': 'layout-parser-paper.pdf', 'languages': ['eng'], 'last_modified': '2024-02-27T15:49:27', 'page_number': 1, 'parent_id': '89565df026a24279aaea20dc08cedbec', 'filetype': 'application/pdf', 'category': 'UncategorizedText', 'element_id': 'e9fa370aef7ee5c05744eb7bb7d9981b'}, page_content='2 v 8 4 3 5 1 . 3 0 1 2 : v i X r a'),\n",
|
||||
" Document(metadata={'source': './example_data/layout-parser-paper.pdf', 'coordinates': {'points': ((157.62199999999999, 114.23496279999995), (157.62199999999999, 146.5141628), (457.7358962799999, 146.5141628), (457.7358962799999, 114.23496279999995)), 'system': 'PixelSpace', 'layout_width': 612, 'layout_height': 792}, 'file_directory': './example_data', 'filename': 'layout-parser-paper.pdf', 'languages': ['eng'], 'last_modified': '2024-02-27T15:49:27', 'page_number': 1, 'filetype': 'application/pdf', 'category': 'Title', 'element_id': 'bde0b230a1aa488e3ce837d33015181b'}, page_content='LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis'),\n",
|
||||
" Document(metadata={'source': './example_data/layout-parser-paper.pdf', 'coordinates': {'points': ((134.809, 168.64029940800003), (134.809, 192.2517444), (480.5464199080001, 192.2517444), (480.5464199080001, 168.64029940800003)), 'system': 'PixelSpace', 'layout_width': 612, 'layout_height': 792}, 'file_directory': './example_data', 'filename': 'layout-parser-paper.pdf', 'languages': ['eng'], 'last_modified': '2024-02-27T15:49:27', 'page_number': 1, 'parent_id': 'bde0b230a1aa488e3ce837d33015181b', 'filetype': 'application/pdf', 'category': 'UncategorizedText', 'element_id': '54700f902899f0c8c90488fa8d825bce'}, page_content='Zejiang Shen1 ((cid:0)), Ruochen Zhang2, Melissa Dell3, Benjamin Charles Germain Lee4, Jacob Carlson3, and Weining Li5'),\n",
|
||||
" Document(metadata={'source': './example_data/layout-parser-paper.pdf', 'coordinates': {'points': ((207.23000000000002, 202.57205439999996), (207.23000000000002, 311.8195408), (408.12676, 311.8195408), (408.12676, 202.57205439999996)), 'system': 'PixelSpace', 'layout_width': 612, 'layout_height': 792}, 'file_directory': './example_data', 'filename': 'layout-parser-paper.pdf', 'languages': ['eng'], 'last_modified': '2024-02-27T15:49:27', 'page_number': 1, 'parent_id': 'bde0b230a1aa488e3ce837d33015181b', 'filetype': 'application/pdf', 'category': 'UncategorizedText', 'element_id': 'b650f5867bad9bb4e30384282c79bcfe'}, page_content='1 Allen Institute for AI shannons@allenai.org 2 Brown University ruochen zhang@brown.edu 3 Harvard University {melissadell,jacob carlson}@fas.harvard.edu 4 University of Washington bcgl@cs.washington.edu 5 University of Waterloo w422li@uwaterloo.ca'),\n",
|
||||
" Document(metadata={'source': './example_data/layout-parser-paper.pdf', 'coordinates': {'points': ((162.779, 338.45008160000003), (162.779, 566.8455408), (454.0372021523199, 566.8455408), (454.0372021523199, 338.45008160000003)), 'system': 'PixelSpace', 'layout_width': 612, 'layout_height': 792}, 'file_directory': './example_data', 'filename': 'layout-parser-paper.pdf', 'languages': ['eng'], 'last_modified': '2024-02-27T15:49:27', 'links': [{'text': ':// layout - parser . github . io', 'url': 'https://layout-parser.github.io', 'start_index': 1477}], 'page_number': 1, 'parent_id': 'bde0b230a1aa488e3ce837d33015181b', 'filetype': 'application/pdf', 'category': 'NarrativeText', 'element_id': 'cfc957c94fe63c8fd7c7f4bcb56e75a7'}, page_content='Abstract. Recent advances in document image analysis (DIA) have been primarily driven by the application of neural networks. Ideally, research outcomes could be easily deployed in production and extended for further investigation. However, various factors like loosely organized codebases and sophisticated model configurations complicate the easy reuse of im- portant innovations by a wide audience. Though there have been on-going efforts to improve reusability and simplify deep learning (DL) model development in disciplines like natural language processing and computer vision, none of them are optimized for challenges in the domain of DIA. This represents a major gap in the existing toolkit, as DIA is central to academic research across a wide range of disciplines in the social sciences and humanities. This paper introduces LayoutParser, an open-source library for streamlining the usage of DL in DIA research and applica- tions. The core LayoutParser library comes with a set of simple and intuitive interfaces for applying and customizing DL models for layout de- tection, character recognition, and many other document processing tasks. To promote extensibility, LayoutParser also incorporates a community platform for sharing both pre-trained models and full document digiti- zation pipelines. We demonstrate that LayoutParser is helpful for both lightweight and large-scale digitization pipelines in real-word use cases. The library is publicly available at https://layout-parser.github.io.')]"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"pages = []\n",
|
||||
"for doc in loader.lazy_load():\n",
|
||||
" pages.append(doc)\n",
|
||||
"from langchain_unstructured import UnstructuredLoader\n",
|
||||
"\n",
|
||||
"pages[0]"
|
||||
"loader = UnstructuredLoader(\"./example_data/layout-parser-paper.pdf\", strategy=\"fast\")\n",
|
||||
"\n",
|
||||
"docs = loader.load()\n",
|
||||
"\n",
|
||||
"docs[5:10]"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -277,6 +242,23 @@
|
||||
"if you’d like to self-host the Unstructured API or run it locally."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "6e5fde16",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Install package\n",
|
||||
"%pip install \"langchain-unstructured\"\n",
|
||||
"%pip install \"unstructured-client\"\n",
|
||||
"\n",
|
||||
"# Set API key\n",
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.environ[\"UNSTRUCTURED_API_KEY\"] = \"FAKE_API_KEY\""
|
||||
]
|
||||
},
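A rough sketch of how the key might then be used to route partitioning through the hosted Unstructured API. This assumes the loader's `partition_via_api` and `api_key` parameters; check the API reference linked at the end of this notebook if the signature differs.

```python
import os

from langchain_unstructured import UnstructuredLoader

# Assumption: this loader version supports `partition_via_api` / `api_key`.
loader = UnstructuredLoader(
    "./example_data/layout-parser-paper.pdf",
    partition_via_api=True,
    api_key=os.environ["UNSTRUCTURED_API_KEY"],
)
docs = loader.load()
```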
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
@@ -514,16 +496,6 @@
|
||||
"print(\"Number of LangChain documents:\", len(docs))\n",
|
||||
"print(\"Length of text in the document:\", len(docs[0].page_content))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ce01aa40",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## API reference\n",
|
||||
"\n",
|
||||
"For detailed documentation of all `UnstructuredLoader` features and configurations head to the API reference: https://api.python.langchain.com/en/latest/document_loaders/langchain_unstructured.document_loaders.UnstructuredLoader.html"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
@@ -542,7 +514,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.9"
|
||||
"version": "3.10.13"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
File diff suppressed because one or more lines are too long
@@ -334,121 +334,6 @@
|
||||
"llm.invoke(\"Tell me a joke\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b29dd776",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Semantic Cache\n",
|
||||
"Use [Upstash Vector](https://upstash.com/docs/vector/overall/whatisvector) to do a semantic similarity search and cache the most similar response in the database. The vectorization is automatically done by the selected embedding model while creating Upstash Vector database. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "b37fb3c9",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%pip install upstash-semantic-cache"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "8470eedc",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.globals import set_llm_cache\n",
|
||||
"from upstash_semantic_cache import SemanticCache"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"id": "16b9fb03",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"UPSTASH_VECTOR_REST_URL = \"<UPSTASH_VECTOR_REST_URL>\"\n",
|
||||
"UPSTASH_VECTOR_REST_TOKEN = \"<UPSTASH_VECTOR_REST_TOKEN>\"\n",
|
||||
"\n",
|
||||
"cache = SemanticCache(\n",
|
||||
" url=UPSTASH_VECTOR_REST_URL, token=UPSTASH_VECTOR_REST_TOKEN, min_proximity=0.7\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"id": "8d37104b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"set_llm_cache(cache)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"id": "926a08b3",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"CPU times: user 28.4 ms, sys: 3.93 ms, total: 32.3 ms\n",
|
||||
"Wall time: 1.89 s\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'\\n\\nNew York City is the most crowded city in the USA.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 16,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%time\n",
|
||||
"llm.invoke(\"Which city is the most crowded city in the USA?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"id": "0ce37d57",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"CPU times: user 3.22 ms, sys: 940 μs, total: 4.16 ms\n",
|
||||
"Wall time: 97.7 ms\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'\\n\\nNew York City is the most crowded city in the USA.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 17,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%time\n",
|
||||
"llm.invoke(\"Which city has the highest population in the USA?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "278ad7ae",
|
||||
@@ -2799,7 +2684,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.12.3"
|
||||
"version": "3.10.5"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -787,7 +787,7 @@ We need to install `langchain-google-community` with required dependencies:
|
||||
pip install langchain-google-community[gmail]
|
||||
```
|
||||
|
||||
See a [usage example and authorization instructions](/docs/integrations/tools/gmail).
|
||||
See a [usage example and authorization instructions](/docs/integrations/toolkits/gmail).
|
||||
|
||||
```python
|
||||
from langchain_google_community import GmailToolkit
|
||||
|
||||
@@ -370,7 +370,7 @@ We need to install several python packages.
|
||||
pip install azure-ai-formrecognizer azure-cognitiveservices-speech azure-ai-vision-imageanalysis
|
||||
```
|
||||
|
||||
See a [usage example](/docs/integrations/tools/azure_ai_services).
|
||||
See a [usage example](/docs/integrations/toolkits/azure_ai_services).
|
||||
|
||||
```python
|
||||
from langchain_community.agent_toolkits import azure_ai_services
|
||||
@@ -385,7 +385,7 @@ pip install O365
|
||||
```
|
||||
|
||||
|
||||
See a [usage example](/docs/integrations/tools/office365).
|
||||
See a [usage example](/docs/integrations/toolkits/office365).
|
||||
|
||||
```python
|
||||
from langchain_community.agent_toolkits import O365Toolkit
|
||||
@@ -399,7 +399,7 @@ We need to install `azure-identity` python package.
|
||||
pip install azure-identity
|
||||
```
|
||||
|
||||
See a [usage example](/docs/integrations/tools/powerbi).
|
||||
See a [usage example](/docs/integrations/toolkits/powerbi).
|
||||
|
||||
```python
|
||||
from langchain_community.agent_toolkits import PowerBIToolkit
|
||||
|
||||
@@ -15,7 +15,7 @@ pip install ain-py
|
||||
You need to set the `AIN_BLOCKCHAIN_ACCOUNT_PRIVATE_KEY` environment variable to your AIN Blockchain Account Private Key.
|
||||
## Toolkit
|
||||
|
||||
See a [usage example](/docs/integrations/tools/ainetwork).
|
||||
See a [usage example](/docs/integrations/toolkits/ainetwork).
|
||||
|
||||
```python
|
||||
from langchain_community.agent_toolkits.ainetwork.toolkit import AINetworkToolkit
|
||||
|
||||
@@ -27,7 +27,7 @@ You can use the `ApifyWrapper` to run Actors on the Apify platform.
|
||||
from langchain_community.utilities import ApifyWrapper
|
||||
```
|
||||
|
||||
For more information on this wrapper, see [the API reference](https://api.python.langchain.com/en/latest/utilities/langchain_community.utilities.apify.ApifyWrapper.html).
|
||||
For a more detailed walkthrough of this wrapper, see [this notebook](/docs/integrations/tools/apify).
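As a rough sketch of running an Actor with the wrapper (the Actor id and `run_input` below are only illustrative, and the token is read from the environment):

```python
from langchain_community.utilities import ApifyWrapper
from langchain_core.documents import Document

apify = ApifyWrapper()  # assumes APIFY_API_TOKEN is set in the environment

# Run an Actor and map each dataset item onto a LangChain Document.
loader = apify.call_actor(
    actor_id="apify/website-content-crawler",  # illustrative Actor id
    run_input={"startUrls": [{"url": "https://python.langchain.com"}]},
    dataset_mapping_function=lambda item: Document(
        page_content=item["text"] or "", metadata={"source": item["url"]}
    ),
)
docs = loader.load()
```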
|
||||
|
||||
|
||||
## Document loader
|
||||
|
||||
@@ -80,6 +80,6 @@ from langchain_community.agent_toolkits.cassandra_database.toolkit import (
|
||||
)
|
||||
```
|
||||
|
||||
Learn more in the [example notebook](/docs/integrations/tools/cassandra_database).
|
||||
Learn more in the [example notebook](/docs/integrations/toolkits/cassandra_database).
|
||||
|
||||
|
||||
|
||||
@@ -9,7 +9,7 @@ The Kinetica LLM wrapper uses the [Kinetica SqlAssist
|
||||
LLM](https://docs.kinetica.com/7.2/sql-gpt/concepts/) to transform natural language into
|
||||
SQL to simplify the process of data retrieval.
|
||||
|
||||
See [Kinetica Language To SQL Chat Model](/docs/integrations/chat/kinetica) for usage.
|
||||
See [Kinetica SqlAssist LLM Demo](/docs/integrations/chat/kinetica) for usage.
|
||||
|
||||
```python
|
||||
from langchain_community.chat_models.kinetica import ChatKinetica
|
||||
|
||||
@@ -30,7 +30,7 @@ from langchain_robocorp.toolkits import ActionServerRequestTool
|
||||
|
||||
## Toolkit
|
||||
|
||||
See a [usage example](/docs/integrations/tools/robocorp).
|
||||
See a [usage example](/docs/integrations/toolkits/robocorp).
|
||||
|
||||
```python
|
||||
from langchain_robocorp import ActionServerToolkit
|
||||
|
||||
@@ -17,7 +17,7 @@ from langchain_community.document_loaders import SlackDirectoryLoader
|
||||
|
||||
## Toolkit
|
||||
|
||||
See a [usage example](/docs/integrations/tools/slack).
|
||||
See a [usage example](/docs/integrations/toolkits/slack).
|
||||
|
||||
```python
|
||||
from langchain_community.agent_toolkits import SlackToolkit
|
||||
|
||||
@@ -4,7 +4,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# AINetwork Toolkit\n",
|
||||
"# AINetwork\n",
|
||||
"\n",
|
||||
">[AI Network](https://www.ainetwork.ai/build-on-ain) is a layer 1 blockchain designed to accommodate large-scale AI models, utilizing a decentralized GPU network powered by the [$AIN token](https://www.ainetwork.ai/token), enriching AI-driven `NFTs` (`AINFTs`).\n",
|
||||
">\n",
|
||||
129
docs/docs/integrations/toolkits/airbyte_structured_qa.ipynb
Normal file
@@ -0,0 +1,129 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Airbyte Question Answering\n",
|
||||
"This notebook shows how to do question answering over structured data, in this case using the `AirbyteStripeLoader`.\n",
|
||||
"\n",
|
||||
"Vectorstores often have a hard time answering questions that requires computing, grouping and filtering structured data so the high level idea is to use a `pandas` dataframe to help with these types of questions. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%pip install -qU langchain-community"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"1. Load data from Stripe using Airbyte. user the `record_handler` paramater to return a JSON from the data loader."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"import pandas as pd\n",
|
||||
"from langchain.agents import AgentType\n",
|
||||
"from langchain_community.document_loaders.airbyte import AirbyteStripeLoader\n",
|
||||
"from langchain_experimental.agents import create_pandas_dataframe_agent\n",
|
||||
"from langchain_openai import ChatOpenAI\n",
|
||||
"\n",
|
||||
"stream_name = \"customers\"\n",
|
||||
"config = {\n",
|
||||
" \"client_secret\": os.getenv(\"STRIPE_CLIENT_SECRET\"),\n",
|
||||
" \"account_id\": os.getenv(\"STRIPE_ACCOUNT_D\"),\n",
|
||||
" \"start_date\": \"2023-01-20T00:00:00Z\",\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def handle_record(record: dict, _id: str):\n",
|
||||
" return record.data\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"loader = AirbyteStripeLoader(\n",
|
||||
" config=config,\n",
|
||||
" record_handler=handle_record,\n",
|
||||
" stream_name=stream_name,\n",
|
||||
")\n",
|
||||
"data = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"2. Pass the data to `pandas` dataframe."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"df = pd.DataFrame(data)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"3. Pass the dataframe `df` to the `create_pandas_dataframe_agent` and invoke\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"agent = create_pandas_dataframe_agent(\n",
|
||||
" ChatOpenAI(temperature=0, model=\"gpt-4\"),\n",
|
||||
" df,\n",
|
||||
" verbose=True,\n",
|
||||
" agent_type=AgentType.OPENAI_FUNCTIONS,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"4. Run the agent"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"output = agent.run(\"How many rows are there?\")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"language_info": {
|
||||
"name": "python"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -4,7 +4,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Amadeus Toolkit\n",
|
||||
"# Amadeus\n",
|
||||
"\n",
|
||||
"This notebook walks you through connecting LangChain to the `Amadeus` travel APIs.\n",
|
||||
"\n",
|
||||
@@ -4,7 +4,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Azure AI Services Toolkit\n",
|
||||
"# Azure AI Services\n",
|
||||
"\n",
|
||||
"This toolkit is used to interact with the `Azure AI Services API` to achieve some multimodal capabilities.\n",
|
||||
"\n",
|
||||
@@ -4,7 +4,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Azure Cognitive Services Toolkit\n",
|
||||
"# Azure Cognitive Services\n",
|
||||
"\n",
|
||||
"This toolkit is used to interact with the `Azure Cognitive Services API` to achieve some multimodal capabilities.\n",
|
||||
"\n",
|
||||
@@ -4,7 +4,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Cassandra Database Toolkit\n",
|
||||
"# Cassandra Database\n",
|
||||
"\n",
|
||||
">`Apache Cassandra®` is a widely used database for storing transactional application data. The introduction of functions and >tooling in Large Language Models has opened up some exciting use cases for existing data in Generative AI applications. \n",
|
||||
"\n",
|
||||
@@ -148,6 +148,11 @@
|
||||
" CassandraDatabaseToolkit,\n",
|
||||
")\n",
|
||||
"from langchain_community.tools.cassandra_database.prompt import QUERY_PATH_PROMPT\n",
|
||||
"from langchain_community.tools.cassandra_database.tool import (\n",
|
||||
" GetSchemaCassandraDatabaseTool,\n",
|
||||
" GetTableDataCassandraDatabaseTool,\n",
|
||||
" QueryCassandraDatabaseTool,\n",
|
||||
")\n",
|
||||
"from langchain_community.utilities.cassandra_database import CassandraDatabase\n",
|
||||
"from langchain_openai import ChatOpenAI"
|
||||
]
|
||||
@@ -258,7 +263,12 @@
|
||||
"source": [
|
||||
"# Create a CassandraDatabase instance\n",
|
||||
"# Uses the cassio session to connect to the database\n",
|
||||
"db = CassandraDatabase()"
|
||||
"db = CassandraDatabase()\n",
|
||||
"\n",
|
||||
"# Create the Cassandra Database tools\n",
|
||||
"query_tool = QueryCassandraDatabaseTool(db=db)\n",
|
||||
"schema_tool = GetSchemaCassandraDatabaseTool(db=db)\n",
|
||||
"select_data_tool = GetTableDataCassandraDatabaseTool(db=db)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -4,7 +4,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# ClickUp Toolkit\n",
|
||||
"# ClickUp\n",
|
||||
"\n",
|
||||
">[ClickUp](https://clickup.com/) is an all-in-one productivity platform that provides small and large teams across industries with flexible and customizable work management solutions, tools, and functions. \n",
|
||||
"\n",
|
||||
@@ -5,17 +5,44 @@
|
||||
"id": "19062701",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Cogniswitch Toolkit\n",
|
||||
"## Cogniswitch Tools\n",
|
||||
"\n",
|
||||
"CogniSwitch is used to build production ready applications that can consume, organize and retrieve knowledge flawlessly. Using the framework of your choice, in this case Langchain, CogniSwitch helps alleviate the stress of decision making when it comes to, choosing the right storage and retrieval formats. It also eradicates reliability issues and hallucinations when it comes to responses that are generated. \n",
|
||||
"**Use CogniSwitch to build production ready applications that can consume, organize and retrieve knowledge flawlessly. Using the framework of your choice, in this case Langchain CogniSwitch helps alleviate the stress of decision making when it comes to, choosing the right storage and retrieval formats. It also eradicates reliability issues and hallucinations when it comes to responses that are generated. Get started by interacting with your knowledge in just two simple steps.**\n",
|
||||
"\n",
|
||||
"## Setup\n",
|
||||
"visit [https://www.cogniswitch.ai/developer to register](https://www.cogniswitch.ai/developer?utm_source=langchain&utm_medium=langchainbuild&utm_id=dev).\n",
|
||||
"\n",
|
||||
"Visit [this page](https://www.cogniswitch.ai/developer?utm_source=langchain&utm_medium=langchainbuild&utm_id=dev) to register a Cogniswitch account.\n",
|
||||
"**Registration:** \n",
|
||||
"\n",
|
||||
"- Signup with your email and verify your registration \n",
|
||||
"\n",
|
||||
"- You will get a mail with a platform token and oauth token for using the services.\n"
|
||||
"- You will get a mail with a platform token and oauth token for using the services.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"**step 1: Instantiate the toolkit and get the tools:**\n",
|
||||
"\n",
|
||||
"- Instantiate the cogniswitch toolkit with the cogniswitch token, openAI API key and OAuth token and get the tools. \n",
|
||||
"\n",
|
||||
"**step 2: Instantiate the agent with the tools and llm:**\n",
|
||||
"- Instantiate the agent with the list of cogniswitch tools and the llm, into the agent executor.\n",
|
||||
"\n",
|
||||
"**step 3: CogniSwitch Store Tool:** \n",
|
||||
"\n",
|
||||
"***CogniSwitch knowledge source file tool***\n",
|
||||
"- Use the agent to upload a file by giving the file path.(formats that are currently supported are .pdf, .docx, .doc, .txt, .html) \n",
|
||||
"- The content from the file will be processed by the cogniswitch and stored in your knowledge store. \n",
|
||||
"\n",
|
||||
"***CogniSwitch knowledge source url tool***\n",
|
||||
"- Use the agent to upload a URL. \n",
|
||||
"- The content from the url will be processed by the cogniswitch and stored in your knowledge store. \n",
|
||||
"\n",
|
||||
"**step 4: CogniSwitch Status Tool:**\n",
|
||||
"- Use the agent to know the status of the document uploaded with a document name.\n",
|
||||
"- You can also check the status of document processing in cogniswitch console. \n",
|
||||
"\n",
|
||||
"**step 5: CogniSwitch Answer Tool:**\n",
|
||||
"- Use the agent to ask your question.\n",
|
||||
"- You will get the answer from your knowledge as the response. \n"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -33,7 +60,7 @@
|
||||
"id": "1435b193",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Import necessary libraries"
|
||||
"### Import necessary libraries"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -59,7 +86,7 @@
|
||||
"id": "6e6acf0e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Cogniswitch platform token, OAuth token and OpenAI API key"
|
||||
"### Cogniswitch platform token, OAuth token and OpenAI API key"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -81,7 +108,7 @@
|
||||
"id": "320e02fc",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Instantiate the cogniswitch toolkit with the credentials"
|
||||
"### Instantiate the cogniswitch toolkit with the credentials"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -119,7 +146,7 @@
|
||||
"id": "4aae43a3",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Instantiate the LLM"
|
||||
"### Instantiate the llm"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -142,9 +169,7 @@
|
||||
"id": "04179282",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Use the LLM with the Toolkit\n",
|
||||
"\n",
|
||||
"### Create an agent with the LLM and Toolkit"
|
||||
"### Create a agent executor"
|
||||
]
|
||||
},
|
||||
{
|
||||
301
docs/docs/integrations/toolkits/csv.ipynb
Normal file
@@ -0,0 +1,301 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "7094e328",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# CSV\n",
|
||||
"\n",
|
||||
"This notebook shows how to use agents to interact with data in `CSV` format. It is mostly optimized for question answering.\n",
|
||||
"\n",
|
||||
"**NOTE: this agent calls the Pandas DataFrame agent under the hood, which in turn calls the Python agent, which executes LLM generated Python code - this can be bad if the LLM generated Python code is harmful. Use cautiously.**\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "caae0bec",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.agents.agent_types import AgentType\n",
|
||||
"from langchain_experimental.agents.agent_toolkits import create_csv_agent\n",
|
||||
"from langchain_openai import ChatOpenAI, OpenAI"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "bd806175",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Using `ZERO_SHOT_REACT_DESCRIPTION`\n",
|
||||
"\n",
|
||||
"This shows how to initialize the agent using the `ZERO_SHOT_REACT_DESCRIPTION` agent type."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "a1717204",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"agent = create_csv_agent(\n",
|
||||
" OpenAI(temperature=0),\n",
|
||||
" \"titanic.csv\",\n",
|
||||
" verbose=True,\n",
|
||||
" agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c31bb8a6",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Using OpenAI Functions\n",
|
||||
"\n",
|
||||
"This shows how to initialize the agent using the OPENAI_FUNCTIONS agent type. Note that this is an alternative to the above."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "16c4dc59",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"agent = create_csv_agent(\n",
|
||||
" ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\"),\n",
|
||||
" \"titanic.csv\",\n",
|
||||
" verbose=True,\n",
|
||||
" agent_type=AgentType.OPENAI_FUNCTIONS,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "46b9489d",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Error in on_chain_start callback: 'name'\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\u001b[32;1m\u001b[1;3m\n",
|
||||
"Invoking: `python_repl_ast` with `df.shape[0]`\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[0m\u001b[36;1m\u001b[1;3m891\u001b[0m\u001b[32;1m\u001b[1;3mThere are 891 rows in the dataframe.\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'There are 891 rows in the dataframe.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"how many rows are there?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "a96309be",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Error in on_chain_start callback: 'name'\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\u001b[32;1m\u001b[1;3m\n",
|
||||
"Invoking: `python_repl_ast` with `df[df['SibSp'] > 3]['PassengerId'].count()`\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[0m\u001b[36;1m\u001b[1;3m30\u001b[0m\u001b[32;1m\u001b[1;3mThere are 30 people in the dataframe who have more than 3 siblings.\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'There are 30 people in the dataframe who have more than 3 siblings.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"how many people have more than 3 siblings\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "964a09f7",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Error in on_chain_start callback: 'name'\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\u001b[32;1m\u001b[1;3m\n",
|
||||
"Invoking: `python_repl_ast` with `import pandas as pd\n",
|
||||
"import math\n",
|
||||
"\n",
|
||||
"# Create a dataframe\n",
|
||||
"data = {'Age': [22, 38, 26, 35, 35]}\n",
|
||||
"df = pd.DataFrame(data)\n",
|
||||
"\n",
|
||||
"# Calculate the average age\n",
|
||||
"average_age = df['Age'].mean()\n",
|
||||
"\n",
|
||||
"# Calculate the square root of the average age\n",
|
||||
"square_root = math.sqrt(average_age)\n",
|
||||
"\n",
|
||||
"square_root`\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[0m\u001b[36;1m\u001b[1;3m5.585696017507576\u001b[0m\u001b[32;1m\u001b[1;3mThe square root of the average age is approximately 5.59.\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'The square root of the average age is approximately 5.59.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"whats the square root of the average age?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "09539c18",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Multi CSV Example\n",
|
||||
"\n",
|
||||
"This next part shows how the agent can interact with multiple csv files passed in as a list."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "15f11fbd",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Error in on_chain_start callback: 'name'\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\u001b[32;1m\u001b[1;3m\n",
|
||||
"Invoking: `python_repl_ast` with `df1['Age'].nunique() - df2['Age'].nunique()`\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[0m\u001b[36;1m\u001b[1;3m-1\u001b[0m\u001b[32;1m\u001b[1;3mThere is 1 row in the age column that is different between the two dataframes.\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'There is 1 row in the age column that is different between the two dataframes.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent = create_csv_agent(\n",
|
||||
" ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\"),\n",
|
||||
" [\"titanic.csv\", \"titanic_age_fillna.csv\"],\n",
|
||||
" verbose=True,\n",
|
||||
" agent_type=AgentType.OPENAI_FUNCTIONS,\n",
|
||||
")\n",
|
||||
"agent.run(\"how many rows in the age column are different between the two dfs?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "f2909808",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -4,7 +4,16 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Github Toolkit\n",
|
||||
"---\n",
|
||||
"sidebar_label: Github\n",
|
||||
"---"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# GithubToolkit\n",
|
||||
"\n",
|
||||
"The `Github` toolkit contains tools that enable an LLM agent to interact with a github repository. \n",
|
||||
"The tool is a wrapper for the [PyGitHub](https://github.com/PyGithub/PyGithub) library. \n",
|
||||
@@ -4,7 +4,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Gitlab Toolkit\n",
|
||||
"# Gitlab\n",
|
||||
"\n",
|
||||
"The `Gitlab` toolkit contains tools that enable an LLM agent to interact with a gitlab repository. \n",
|
||||
"The tool is a wrapper for the [python-gitlab](https://github.com/python-gitlab/python-gitlab) library. \n",
|
||||
@@ -4,7 +4,16 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Gmail Toolkit\n",
|
||||
"---\n",
|
||||
"sidebar_label: GMail\n",
|
||||
"---"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# GmailToolkit\n",
|
||||
"\n",
|
||||
"This will help you getting started with the GMail [toolkit](/docs/concepts/#toolkits). This toolkit interacts with the GMail API to read messages, draft and send messages, and more. For detailed documentation of all GmailToolkit features and configurations head to the [API reference](https://api.python.langchain.com/en/latest/gmail/langchain_google_community.gmail.toolkit.GmailToolkit.html).\n",
|
||||
"\n",
|
||||
@@ -243,7 +252,7 @@
|
||||
"source": [
|
||||
"## API reference\n",
|
||||
"\n",
|
||||
"For detailed documentation of all `GmailToolkit` features and configurations head to the [API reference](https://api.python.langchain.com/en/latest/agent_toolkits/langchain_community.agent_toolkits.gmail.toolkit.GmailToolkit.html)."
|
||||
"For detailed documentation of all `GmailToolkit` features and configurations head to the [API reference](https://api.python.langchain.com/en/latest/agent_toolkits/langchain_community.agent_toolkits.slack.toolkit.SlackToolkit.html)."
|
||||
]
|
||||
}
|
||||
],
|
||||
21
docs/docs/integrations/toolkits/index.mdx
Normal file
@@ -0,0 +1,21 @@
|
||||
---
|
||||
sidebar_position: 0
|
||||
sidebar_class_name: hidden
|
||||
---
|
||||
|
||||
# Toolkits
|
||||
|
||||
**Toolkits** are collections of tools that are designed to be used together for specific tasks. They include conveniences for loading tools
|
||||
that share common authentication, services, or other objects. They can be implemented by subclassing the
|
||||
[BaseToolkit](https://api.python.langchain.com/en/latest/tools/langchain_core.tools.BaseToolkit.html#langchain_core.tools.BaseToolkit) class.
|
||||
|
||||
This table lists common toolkits.
|
||||
|
||||
|
||||
| Toolkit | Package |
|
||||
|------|---------------|
|
||||
| [GitHubToolkit](/docs/integrations/toolkits/github) | [langchain_community.agent_toolkits.github](https://api.python.langchain.com/en/latest/agent_toolkits/langchain_community.agent_toolkits.github.toolkit.GitHubToolkit.html) |
|
||||
| [GmailToolkit](/docs/integrations/toolkits/gmail) | [langchain_google_community.gmail.toolkit](https://api.python.langchain.com/en/latest/gmail/langchain_google_community.gmail.toolkit.GmailToolkit.html) |
|
||||
| [RequestsToolkit](/docs/integrations/toolkits/requests) | [langchain_community.agent_toolkits.openapi](https://api.python.langchain.com/en/latest/agent_toolkits/langchain_community.agent_toolkits.openapi.toolkit.RequestsToolkit.html) |
|
||||
| [SlackToolkit](/docs/integrations/toolkits/slack) | [langchain_community.agent_toolkits.slack](https://api.python.langchain.com/en/latest/agent_toolkits/langchain_community.agent_toolkits.slack.toolkit.SlackToolkit.html) |
|
||||
| [SQLDatabaseToolkit](/docs/integrations/toolkits/sql_database) | [langchain_community.agent_toolkits.sql](https://api.python.langchain.com/en/latest/agent_toolkits/langchain_community.agent_toolkits.sql.toolkit.SQLDatabaseToolkit.html) |
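A minimal sketch of what implementing a toolkit by subclassing `BaseToolkit` might look like; the tool and toolkit names here are made up for illustration.

```python
from typing import List

from langchain_core.tools import BaseTool, BaseToolkit, tool


@tool
def greet(name: str) -> str:
    """Greet a user by name."""
    return f"Hello, {name}!"


class GreetingToolkit(BaseToolkit):
    """Hypothetical toolkit bundling a single greeting tool."""

    def get_tools(self) -> List[BaseTool]:
        # Agents typically call get_tools() once to load everything the toolkit offers.
        return [greet]


tools = GreetingToolkit().get_tools()
```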
|
||||
@@ -5,7 +5,7 @@
|
||||
"id": "245a954a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Jira Toolkit\n",
|
||||
"# Jira\n",
|
||||
"\n",
|
||||
"This notebook goes over how to use the `Jira` toolkit.\n",
|
||||
"\n",
|
||||
@@ -5,7 +5,7 @@
|
||||
"id": "85fb2c03-ab88-4c8c-97e3-a7f2954555ab",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# JSON Toolkit\n",
|
||||
"# JSON\n",
|
||||
"\n",
|
||||
"This notebook showcases an agent interacting with large `JSON/dict` objects. \n",
|
||||
"This is useful when you want to answer questions about a JSON blob that's too large to fit in the context window of an LLM. The agent is able to iteratively explore the blob to find what it needs to answer the user's question.\n",
|
||||
@@ -4,7 +4,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# MultiOn Toolkit\n",
|
||||
"# MultiOn\n",
|
||||
" \n",
|
||||
"[MultiON](https://www.multion.ai/blog/multion-building-a-brighter-future-for-humanity-with-ai-agents) has built an AI Agent that can interact with a broad array of web services and applications. \n",
|
||||
"\n",
|
||||
@@ -5,7 +5,7 @@
|
||||
"id": "e6fd05db-21c2-4227-9900-0840bc62cb31",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# NASA Toolkit\n",
|
||||
"# NASA\n",
|
||||
"\n",
|
||||
"This notebook shows how to use agents to interact with the NASA toolkit. The toolkit provides access to the NASA Image and Video Library API, with potential to expand and include other accessible NASA APIs in future iterations.\n",
|
||||
"\n",
|
||||
@@ -4,7 +4,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Office365 Toolkit\n",
|
||||
"# Office365\n",
|
||||
"\n",
|
||||
">[Microsoft 365](https://www.office.com/) is a product family of productivity software, collaboration and cloud-based services owned by `Microsoft`.\n",
|
||||
">\n",
|
||||
@@ -5,7 +5,7 @@
|
||||
"id": "85fb2c03-ab88-4c8c-97e3-a7f2954555ab",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# OpenAPI Toolkit\n",
|
||||
"# OpenAPI\n",
|
||||
"\n",
|
||||
"We can construct agents to consume arbitrary APIs, here APIs conformant to the `OpenAPI`/`Swagger` specification."
|
||||
]
|
||||
@@ -5,7 +5,7 @@
|
||||
"id": "c7ad998d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Natural Language API Toolkits\n",
|
||||
"# Natural Language APIs\n",
|
||||
"\n",
|
||||
"`Natural Language API` Toolkits (`NLAToolkits`) permit LangChain Agents to efficiently plan and combine calls across endpoints. \n",
|
||||
"\n",
|
||||
351
docs/docs/integrations/toolkits/playwright.ipynb
Normal file
File diff suppressed because one or more lines are too long
@@ -5,7 +5,7 @@
|
||||
"id": "9363398d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# PowerBI Toolkit\n",
|
||||
"# PowerBI Dataset\n",
|
||||
"\n",
|
||||
"This notebook showcases an agent interacting with a `Power BI Dataset`. The agent is answering more general questions about a dataset, as well as recover from errors.\n",
|
||||
"\n",
|
||||
382
docs/docs/integrations/toolkits/python.ipynb
Normal file
@@ -0,0 +1,382 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "raw",
|
||||
"id": "be75cb7e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"---\n",
|
||||
"keywords: [PythonREPLTool]\n",
|
||||
"---"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "82a4c2cc-20ea-4b20-a565-63e905dee8ff",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Python\n",
|
||||
"\n",
|
||||
"This notebook showcases an agent designed to write and execute `Python` code to answer a question."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "f98e9c90-5c37-4fb9-af3e-d09693af8543",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain import hub\n",
|
||||
"from langchain.agents import AgentExecutor\n",
|
||||
"from langchain_experimental.tools import PythonREPLTool"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ba9adf51",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Create the tool(s)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "003bce04",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tools = [PythonREPLTool()]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "4aceaeaf",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Using OpenAI Functions Agent\n",
|
||||
"\n",
|
||||
"This is probably the most reliable type of agent, but is only compatible with function calling"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "3a054d1d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.agents import create_openai_functions_agent\n",
|
||||
"from langchain_openai import ChatOpenAI"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"id": "3454514b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"instructions = \"\"\"You are an agent designed to write and execute python code to answer questions.\n",
|
||||
"You have access to a python REPL, which you can use to execute python code.\n",
|
||||
"If you get an error, debug your code and try again.\n",
|
||||
"Only use the output of your code to answer the question. \n",
|
||||
"You might know the answer without running any code, but you should still run the code to get the answer.\n",
|
||||
"If it does not seem like you can write code to answer the question, just return \"I don't know\" as the answer.\n",
|
||||
"\"\"\"\n",
|
||||
"base_prompt = hub.pull(\"langchain-ai/openai-functions-template\")\n",
|
||||
"prompt = base_prompt.partial(instructions=instructions)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 21,
|
||||
"id": "2a573e95",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"agent = create_openai_functions_agent(ChatOpenAI(temperature=0), tools, prompt)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 22,
|
||||
"id": "cae41550",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ca30d64c",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Using ReAct Agent\n",
|
||||
"\n",
|
||||
"This is a less reliable type, but is compatible with most models"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 26,
|
||||
"id": "bcaa0b18",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.agents import create_react_agent\n",
|
||||
"from langchain_anthropic import ChatAnthropic"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 29,
|
||||
"id": "d2470880",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"instructions = \"\"\"You are an agent designed to write and execute python code to answer questions.\n",
|
||||
"You have access to a python REPL, which you can use to execute python code.\n",
|
||||
"If you get an error, debug your code and try again.\n",
|
||||
"Only use the output of your code to answer the question. \n",
|
||||
"You might know the answer without running any code, but you should still run the code to get the answer.\n",
|
||||
"If it does not seem like you can write code to answer the question, just return \"I don't know\" as the answer.\n",
|
||||
"\"\"\"\n",
|
||||
"base_prompt = hub.pull(\"langchain-ai/react-agent-template\")\n",
|
||||
"prompt = base_prompt.partial(instructions=instructions)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 30,
|
||||
"id": "cc422f53-c51c-4694-a834-72ecd1e68363",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"agent = create_react_agent(ChatAnthropic(temperature=0), tools, prompt)\n",
|
||||
"agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c16161de",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Fibonacci Example\n",
|
||||
"This example was created by [John Wiseman](https://twitter.com/lemonodor/status/1628270074074398720?s=20)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 31,
|
||||
"id": "25cd4f92-ea9b-4fe6-9838-a4f85f81eebe",
|
||||
"metadata": {
|
||||
"scrolled": false,
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m Sure, I can write some Python code to get the 10th Fibonacci number.\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"Thought: Do I need to use a tool? Yes\n",
|
||||
"Action: Python_REPL \n",
|
||||
"Action Input: \n",
|
||||
"def fib(n):\n",
|
||||
" a, b = 0, 1\n",
|
||||
" for i in range(n):\n",
|
||||
" a, b = b, a + b\n",
|
||||
" return a\n",
|
||||
"\n",
|
||||
"print(fib(10))\n",
|
||||
"```\n",
|
||||
"\u001b[0m\u001b[36;1m\u001b[1;3m55\n",
|
||||
"\u001b[0m\u001b[32;1m\u001b[1;3m Let me break this down step-by-step:\n",
|
||||
"\n",
|
||||
"1. I defined a fibonacci function called `fib` that takes in a number `n`. \n",
|
||||
"2. Inside the function, I initialized two variables `a` and `b` to 0 and 1, which are the first two Fibonacci numbers.\n",
|
||||
"3. Then I used a for loop to iterate up to `n`, updating `a` and `b` each iteration to the next Fibonacci numbers.\n",
|
||||
"4. Finally, I return `a`, which after `n` iterations, contains the `n`th Fibonacci number.\n",
|
||||
"\n",
|
||||
"5. I called `fib(10)` to get the 10th Fibonacci number and printed the result.\n",
|
||||
"\n",
|
||||
"The key parts are defining the fibonacci calculation in the function, and then calling it with the desired input index to print the output.\n",
|
||||
"\n",
|
||||
"The observation shows the 10th Fibonacci number is 55, so that is the final answer.\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"Thought: Do I need to use a tool? No\n",
|
||||
"Final Answer: 55\n",
|
||||
"```\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'input': 'What is the 10th fibonacci number?', 'output': '55\\n```'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 31,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent_executor.invoke({\"input\": \"What is the 10th fibonacci number?\"})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "7caa30de",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Training neural net\n",
|
||||
"This example was created by [Samee Ur Rehman](https://twitter.com/sameeurehman/status/1630130518133207046?s=20)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "4b9f60e7-eb6a-4f14-8604-498d863d4482",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mCould not parse tool input: {'name': 'python', 'arguments': 'import torch\\nimport torch.nn as nn\\nimport torch.optim as optim\\n\\n# Define the neural network\\nclass SingleNeuron(nn.Module):\\n def __init__(self):\\n super(SingleNeuron, self).__init__()\\n self.linear = nn.Linear(1, 1)\\n \\n def forward(self, x):\\n return self.linear(x)\\n\\n# Create the synthetic data\\nx_train = torch.tensor([[1.0], [2.0], [3.0], [4.0]], dtype=torch.float32)\\ny_train = torch.tensor([[2.0], [4.0], [6.0], [8.0]], dtype=torch.float32)\\n\\n# Create the neural network\\nmodel = SingleNeuron()\\n\\n# Define the loss function and optimizer\\ncriterion = nn.MSELoss()\\noptimizer = optim.SGD(model.parameters(), lr=0.01)\\n\\n# Train the neural network\\nfor epoch in range(1, 1001):\\n # Forward pass\\n y_pred = model(x_train)\\n \\n # Compute loss\\n loss = criterion(y_pred, y_train)\\n \\n # Backward pass and optimization\\n optimizer.zero_grad()\\n loss.backward()\\n optimizer.step()\\n \\n # Print the loss every 100 epochs\\n if epoch % 100 == 0:\\n print(f\"Epoch {epoch}: Loss = {loss.item()}\")\\n\\n# Make a prediction for x = 5\\nx_test = torch.tensor([[5.0]], dtype=torch.float32)\\ny_pred = model(x_test)\\ny_pred.item()'} because the `arguments` is not valid JSON.\u001b[0mInvalid or incomplete response\u001b[32;1m\u001b[1;3m\n",
|
||||
"Invoking: `Python_REPL` with `import torch\n",
|
||||
"import torch.nn as nn\n",
|
||||
"import torch.optim as optim\n",
|
||||
"\n",
|
||||
"# Define the neural network\n",
|
||||
"class SingleNeuron(nn.Module):\n",
|
||||
" def __init__(self):\n",
|
||||
" super(SingleNeuron, self).__init__()\n",
|
||||
" self.linear = nn.Linear(1, 1)\n",
|
||||
" \n",
|
||||
" def forward(self, x):\n",
|
||||
" return self.linear(x)\n",
|
||||
"\n",
|
||||
"# Create the synthetic data\n",
|
||||
"x_train = torch.tensor([[1.0], [2.0], [3.0], [4.0]], dtype=torch.float32)\n",
|
||||
"y_train = torch.tensor([[2.0], [4.0], [6.0], [8.0]], dtype=torch.float32)\n",
|
||||
"\n",
|
||||
"# Create the neural network\n",
|
||||
"model = SingleNeuron()\n",
|
||||
"\n",
|
||||
"# Define the loss function and optimizer\n",
|
||||
"criterion = nn.MSELoss()\n",
|
||||
"optimizer = optim.SGD(model.parameters(), lr=0.01)\n",
|
||||
"\n",
|
||||
"# Train the neural network\n",
|
||||
"for epoch in range(1, 1001):\n",
|
||||
" # Forward pass\n",
|
||||
" y_pred = model(x_train)\n",
|
||||
" \n",
|
||||
" # Compute loss\n",
|
||||
" loss = criterion(y_pred, y_train)\n",
|
||||
" \n",
|
||||
" # Backward pass and optimization\n",
|
||||
" optimizer.zero_grad()\n",
|
||||
" loss.backward()\n",
|
||||
" optimizer.step()\n",
|
||||
" \n",
|
||||
" # Print the loss every 100 epochs\n",
|
||||
" if epoch % 100 == 0:\n",
|
||||
" print(f\"Epoch {epoch}: Loss = {loss.item()}\")\n",
|
||||
"\n",
|
||||
"# Make a prediction for x = 5\n",
|
||||
"x_test = torch.tensor([[5.0]], dtype=torch.float32)\n",
|
||||
"y_pred = model(x_test)\n",
|
||||
"y_pred.item()`\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[0m\u001b[36;1m\u001b[1;3mEpoch 100: Loss = 0.03825576975941658\n",
|
||||
"Epoch 200: Loss = 0.02100197970867157\n",
|
||||
"Epoch 300: Loss = 0.01152981910854578\n",
|
||||
"Epoch 400: Loss = 0.006329738534986973\n",
|
||||
"Epoch 500: Loss = 0.0034749575424939394\n",
|
||||
"Epoch 600: Loss = 0.0019077073084190488\n",
|
||||
"Epoch 700: Loss = 0.001047312980517745\n",
|
||||
"Epoch 800: Loss = 0.0005749554838985205\n",
|
||||
"Epoch 900: Loss = 0.0003156439634039998\n",
|
||||
"Epoch 1000: Loss = 0.00017328384274151176\n",
|
||||
"\u001b[0m\u001b[32;1m\u001b[1;3m\n",
|
||||
"Invoking: `Python_REPL` with `x_test.item()`\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[0m\u001b[36;1m\u001b[1;3m\u001b[0m\u001b[32;1m\u001b[1;3mThe prediction for x = 5 is 10.000173568725586.\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'The prediction for x = 5 is 10.000173568725586.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent_executor.invoke(\n",
|
||||
" {\n",
|
||||
" \"input\": \"\"\"Understand, write a single neuron neural network in PyTorch.\n",
|
||||
"Take synthetic data for y=2x. Train for 1000 epochs and print every 100 epochs.\n",
|
||||
"Return prediction for x = 5\"\"\"\n",
|
||||
" }\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "eb654671",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
docs/docs/integrations/toolkits/requests.ipynb (new file, 361 lines)
@@ -0,0 +1,361 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "050c5580-2c85-4763-8783-59dbd20395a5",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"---\n",
|
||||
"sidebar_label: Requests\n",
|
||||
"---"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "cfe4185a-34dc-4cdc-b831-001954f2d6e8",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Requests Toolkit\n",
|
||||
"\n",
|
||||
"We can use the Requests [toolkit](/docs/concepts/#toolkits) to construct agents that generate HTTP requests.\n",
|
||||
"\n",
|
||||
"For detailed documentation of all API toolkit features and configurations head to the API reference for [RequestsToolkit](https://api.python.langchain.com/en/latest/agent_toolkits/langchain_community.agent_toolkits.openapi.toolkit.RequestsToolkit.html).\n",
|
||||
"\n",
|
||||
"## ⚠️ Security note ⚠️\n",
|
||||
"There are inherent risks in giving models discretion to execute real-world actions. Take precautions to mitigate these risks:\n",
|
||||
"\n",
|
||||
"- Make sure that permissions associated with the tools are narrowly-scoped (e.g., for database operations or API requests);\n",
|
||||
"- When desired, make use of human-in-the-loop workflows."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "d968e982-f370-4614-8469-c1bc71ee3e32",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Setup\n",
|
||||
"\n",
|
||||
"### Installation\n",
|
||||
"\n",
|
||||
"This toolkit lives in the `langchain-community` package:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "f74f05fb-3f24-4c0b-a17f-cf4edeedbb9a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%pip install -qU langchain-community"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "36a178eb-1f2c-411e-bf25-0240ead4c62a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Note that if you want to get automated tracing from runs of individual tools, you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "8e68d0cd-6233-481c-b048-e8d95cba4c35",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
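    "# import getpass\n",
    "# import os\n",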
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
|
||||
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "a7e2f64a-a72e-4fef-be52-eaf7c5072d24",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Instantiation\n",
|
||||
"\n",
|
||||
"First we will demonstrate a minimal example.\n",
|
||||
"\n",
|
||||
"**NOTE**: There are inherent risks in giving models discretion to execute real-world actions. We must \"opt-in\" to these risks by setting `allow_dangerous_request=True` to use these tools.\n",
|
||||
"**This can be dangerous for calling unwanted requests**. Please make sure your custom OpenAPI spec (yaml) is safe and that permissions associated with the tools are narrowly-scoped."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "018bd070-9fc8-459b-8d28-b4a3e283e640",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"ALLOW_DANGEROUS_REQUEST = True"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "a024f7b3-5437-4878-bd16-c4783bff394c",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We can use the [JSONPlaceholder](https://jsonplaceholder.typicode.com) API as a testing ground.\n",
|
||||
"\n",
|
||||
"Let's create (a subset of) its API spec:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "2dcbcf92-2ad5-49c3-94ac-91047ccc8c5b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from typing import Any, Dict, Union\n",
|
||||
"\n",
|
||||
"import requests\n",
|
||||
"import yaml\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def _get_schema(response_json: Union[dict, list]) -> dict:\n",
|
||||
" if isinstance(response_json, list):\n",
|
||||
" response_json = response_json[0] if response_json else {}\n",
|
||||
" return {key: type(value).__name__ for key, value in response_json.items()}\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def _get_api_spec() -> str:\n",
|
||||
" base_url = \"https://jsonplaceholder.typicode.com\"\n",
|
||||
" endpoints = [\n",
|
||||
" \"/posts\",\n",
|
||||
" \"/comments\",\n",
|
||||
" ]\n",
|
||||
" common_query_parameters = [\n",
|
||||
" {\n",
|
||||
" \"name\": \"_limit\",\n",
|
||||
" \"in\": \"query\",\n",
|
||||
" \"required\": False,\n",
|
||||
" \"schema\": {\"type\": \"integer\", \"example\": 2},\n",
|
||||
" \"description\": \"Limit the number of results\",\n",
|
||||
" }\n",
|
||||
" ]\n",
|
||||
" openapi_spec: Dict[str, Any] = {\n",
|
||||
" \"openapi\": \"3.0.0\",\n",
|
||||
" \"info\": {\"title\": \"JSONPlaceholder API\", \"version\": \"1.0.0\"},\n",
|
||||
" \"servers\": [{\"url\": base_url}],\n",
|
||||
" \"paths\": {},\n",
|
||||
" }\n",
|
||||
" # Iterate over the endpoints to construct the paths\n",
|
||||
" for endpoint in endpoints:\n",
|
||||
" response = requests.get(base_url + endpoint)\n",
|
||||
" if response.status_code == 200:\n",
|
||||
" schema = _get_schema(response.json())\n",
|
||||
" openapi_spec[\"paths\"][endpoint] = {\n",
|
||||
" \"get\": {\n",
|
||||
" \"summary\": f\"Get {endpoint[1:]}\",\n",
|
||||
" \"parameters\": common_query_parameters,\n",
|
||||
" \"responses\": {\n",
|
||||
" \"200\": {\n",
|
||||
" \"description\": \"Successful response\",\n",
|
||||
" \"content\": {\n",
|
||||
" \"application/json\": {\n",
|
||||
" \"schema\": {\"type\": \"object\", \"properties\": schema}\n",
|
||||
" }\n",
|
||||
" },\n",
|
||||
" }\n",
|
||||
" },\n",
|
||||
" }\n",
|
||||
" }\n",
|
||||
" return yaml.dump(openapi_spec, sort_keys=False)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"api_spec = _get_api_spec()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "db3d6148-ae65-4a1d-91a6-59ee3e4e6efa",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Next we can instantiate the toolkit. We require no authorization or other headers for this API:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "63a630b3-45bb-4525-865b-083f322b944b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.agent_toolkits.openapi.toolkit import RequestsToolkit\n",
|
||||
"from langchain_community.utilities.requests import TextRequestsWrapper\n",
|
||||
"\n",
|
||||
"toolkit = RequestsToolkit(\n",
|
||||
" requests_wrapper=TextRequestsWrapper(headers={}),\n",
|
||||
" allow_dangerous_requests=ALLOW_DANGEROUS_REQUEST,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f4224a64-843a-479d-8a7b-84719e4b9d0c",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Tools\n",
|
||||
"\n",
|
||||
"View available tools:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "70ea0f4e-9f10-4906-894b-08df832fd515",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[RequestsGetTool(requests_wrapper=TextRequestsWrapper(headers={}, aiosession=None, auth=None, response_content_type='text', verify=True), allow_dangerous_requests=True),\n",
|
||||
" RequestsPostTool(requests_wrapper=TextRequestsWrapper(headers={}, aiosession=None, auth=None, response_content_type='text', verify=True), allow_dangerous_requests=True),\n",
|
||||
" RequestsPatchTool(requests_wrapper=TextRequestsWrapper(headers={}, aiosession=None, auth=None, response_content_type='text', verify=True), allow_dangerous_requests=True),\n",
|
||||
" RequestsPutTool(requests_wrapper=TextRequestsWrapper(headers={}, aiosession=None, auth=None, response_content_type='text', verify=True), allow_dangerous_requests=True),\n",
|
||||
" RequestsDeleteTool(requests_wrapper=TextRequestsWrapper(headers={}, aiosession=None, auth=None, response_content_type='text', verify=True), allow_dangerous_requests=True)]"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"tools = toolkit.get_tools()\n",
|
||||
"\n",
|
||||
"tools"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "a21a6ca4-d650-4b7d-a944-1a8771b5293a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"- [RequestsGetTool](https://api.python.langchain.com/en/latest/tools/langchain_community.tools.requests.tool.RequestsGetTool.html)\n",
|
||||
"- [RequestsPostTool](https://api.python.langchain.com/en/latest/tools/langchain_community.tools.requests.tool.RequestsPostTool.html)\n",
|
||||
"- [RequestsPatchTool](https://api.python.langchain.com/en/latest/tools/langchain_community.tools.requests.tool.RequestsPatchTool.html)\n",
|
||||
"- [RequestsPutTool](https://api.python.langchain.com/en/latest/tools/langchain_community.tools.requests.tool.RequestsPutTool.html)\n",
|
||||
"- [RequestsDeleteTool](https://api.python.langchain.com/en/latest/tools/langchain_community.tools.requests.tool.RequestsDeleteTool.html)"
|
||||
]
|
||||
},
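The tools can also be used directly, outside of an agent. The sketch below is an addition to the notebook, not part of the original: it assumes the `toolkit` instance created above and filters it down to the GET tool only (one simple way to narrow what an agent built on these tools can do, per the security note), then invokes that tool with a plain URL string.

```python
# A minimal sketch, assuming the `toolkit` constructed earlier in this notebook.
# Keeping only the GET tool narrows what an agent built on these tools can do.
get_only = [tool for tool in toolkit.get_tools() if tool.name == "requests_get"]
requests_get = get_only[0]

# The GET tool accepts a URL string as input.
result = requests_get.invoke("https://jsonplaceholder.typicode.com/posts/1")
print(result)
```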
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e2dbb304-abf2-472a-9130-f03150a40549",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Use within an agent"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "db062da7-f22c-4f36-9df8-1da96c9f7538",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_openai import ChatOpenAI\n",
|
||||
"from langgraph.prebuilt import create_react_agent\n",
|
||||
"\n",
|
||||
"llm = ChatOpenAI(model=\"gpt-3.5-turbo-0125\")\n",
|
||||
"\n",
|
||||
"system_message = \"\"\"\n",
|
||||
"You have access to an API to help answer user queries.\n",
|
||||
"Here is documentation on the API:\n",
|
||||
"{api_spec}\n",
|
||||
"\"\"\".format(api_spec=api_spec)\n",
|
||||
"\n",
|
||||
"agent_executor = create_react_agent(llm, tools, state_modifier=system_message)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "c1e47be9-374a-457c-928a-48f02b5530e3",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"================================\u001b[1m Human Message \u001b[0m=================================\n",
|
||||
"\n",
|
||||
"Fetch the top two posts. What are their titles?\n",
|
||||
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
|
||||
"Tool Calls:\n",
|
||||
" requests_get (call_RV2SOyzCnV5h2sm4WPgG8fND)\n",
|
||||
" Call ID: call_RV2SOyzCnV5h2sm4WPgG8fND\n",
|
||||
" Args:\n",
|
||||
" url: https://jsonplaceholder.typicode.com/posts?_limit=2\n",
|
||||
"=================================\u001b[1m Tool Message \u001b[0m=================================\n",
|
||||
"Name: requests_get\n",
|
||||
"\n",
|
||||
"[\n",
|
||||
" {\n",
|
||||
" \"userId\": 1,\n",
|
||||
" \"id\": 1,\n",
|
||||
" \"title\": \"sunt aut facere repellat provident occaecati excepturi optio reprehenderit\",\n",
|
||||
" \"body\": \"quia et suscipit\\nsuscipit recusandae consequuntur expedita et cum\\nreprehenderit molestiae ut ut quas totam\\nnostrum rerum est autem sunt rem eveniet architecto\"\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"userId\": 1,\n",
|
||||
" \"id\": 2,\n",
|
||||
" \"title\": \"qui est esse\",\n",
|
||||
" \"body\": \"est rerum tempore vitae\\nsequi sint nihil reprehenderit dolor beatae ea dolores neque\\nfugiat blanditiis voluptate porro vel nihil molestiae ut reiciendis\\nqui aperiam non debitis possimus qui neque nisi nulla\"\n",
|
||||
" }\n",
|
||||
"]\n",
|
||||
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
|
||||
"\n",
|
||||
"The titles of the top two posts are:\n",
|
||||
"1. \"sunt aut facere repellat provident occaecati excepturi optio reprehenderit\"\n",
|
||||
"2. \"qui est esse\"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"example_query = \"Fetch the top two posts. What are their titles?\"\n",
|
||||
"\n",
|
||||
"events = agent_executor.stream(\n",
|
||||
" {\"messages\": [(\"user\", example_query)]},\n",
|
||||
" stream_mode=\"values\",\n",
|
||||
")\n",
|
||||
"for event in events:\n",
|
||||
" event[\"messages\"][-1].pretty_print()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "01ec4886-de3d-4fda-bd05-e3f254810969",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## API reference\n",
|
||||
"\n",
|
||||
"For detailed documentation of all API toolkit features and configurations head to the API reference for [RequestsToolkit](https://api.python.langchain.com/en/latest/agent_toolkits/langchain_community.agent_toolkits.openapi.toolkit.RequestsToolkit.html)."
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.4"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -5,7 +5,7 @@
|
||||
"id": "e49f1e0d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Robocorp Toolkit\n",
|
||||
"# Robocorp\n",
|
||||
"\n",
|
||||
"This notebook covers how to get started with [Robocorp Action Server](https://github.com/robocorp/robocorp) action toolkit and LangChain.\n",
|
||||
"\n",
|
||||
@@ -4,7 +4,16 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Slack Toolkit\n",
|
||||
"---\n",
|
||||
"sidebar_label: Slack\n",
|
||||
"---"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# SlackToolkit\n",
|
||||
"\n",
|
||||
"This will help you getting started with the Slack [toolkit](/docs/concepts/#toolkits). For detailed documentation of all SlackToolkit features and configurations head to the [API reference](https://api.python.langchain.com/en/latest/agent_toolkits/langchain_community.agent_toolkits.slack.toolkit.SlackToolkit.html).\n",
|
||||
"\n",
|
||||
docs/docs/integrations/toolkits/spark.ipynb (new file, 419 lines)
@@ -0,0 +1,419 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Spark Dataframe\n",
|
||||
"\n",
|
||||
"This notebook shows how to use agents to interact with a `Spark DataFrame` and `Spark Connect`. It is mostly optimized for question answering.\n",
|
||||
"\n",
|
||||
"**NOTE: this agent calls the Python agent under the hood, which executes LLM generated Python code - this can be bad if the LLM generated Python code is harmful. Use cautiously.**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = \"...input your openai api key here...\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## `Spark DataFrame` example"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"23/05/15 20:33:10 WARN Utils: Your hostname, Mikes-Mac-mini.local resolves to a loopback address: 127.0.0.1; using 192.168.68.115 instead (on interface en1)\n",
|
||||
"23/05/15 20:33:10 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address\n",
|
||||
"Setting default log level to \"WARN\".\n",
|
||||
"To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).\n",
|
||||
"23/05/15 20:33:10 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"+-----------+--------+------+--------------------+------+----+-----+-----+----------------+-------+-----+--------+\n",
|
||||
"|PassengerId|Survived|Pclass| Name| Sex| Age|SibSp|Parch| Ticket| Fare|Cabin|Embarked|\n",
|
||||
"+-----------+--------+------+--------------------+------+----+-----+-----+----------------+-------+-----+--------+\n",
|
||||
"| 1| 0| 3|Braund, Mr. Owen ...| male|22.0| 1| 0| A/5 21171| 7.25| null| S|\n",
|
||||
"| 2| 1| 1|Cumings, Mrs. Joh...|female|38.0| 1| 0| PC 17599|71.2833| C85| C|\n",
|
||||
"| 3| 1| 3|Heikkinen, Miss. ...|female|26.0| 0| 0|STON/O2. 3101282| 7.925| null| S|\n",
|
||||
"| 4| 1| 1|Futrelle, Mrs. Ja...|female|35.0| 1| 0| 113803| 53.1| C123| S|\n",
|
||||
"| 5| 0| 3|Allen, Mr. Willia...| male|35.0| 0| 0| 373450| 8.05| null| S|\n",
|
||||
"| 6| 0| 3| Moran, Mr. James| male|null| 0| 0| 330877| 8.4583| null| Q|\n",
|
||||
"| 7| 0| 1|McCarthy, Mr. Tim...| male|54.0| 0| 0| 17463|51.8625| E46| S|\n",
|
||||
"| 8| 0| 3|Palsson, Master. ...| male| 2.0| 3| 1| 349909| 21.075| null| S|\n",
|
||||
"| 9| 1| 3|Johnson, Mrs. Osc...|female|27.0| 0| 2| 347742|11.1333| null| S|\n",
|
||||
"| 10| 1| 2|Nasser, Mrs. Nich...|female|14.0| 1| 0| 237736|30.0708| null| C|\n",
|
||||
"| 11| 1| 3|Sandstrom, Miss. ...|female| 4.0| 1| 1| PP 9549| 16.7| G6| S|\n",
|
||||
"| 12| 1| 1|Bonnell, Miss. El...|female|58.0| 0| 0| 113783| 26.55| C103| S|\n",
|
||||
"| 13| 0| 3|Saundercock, Mr. ...| male|20.0| 0| 0| A/5. 2151| 8.05| null| S|\n",
|
||||
"| 14| 0| 3|Andersson, Mr. An...| male|39.0| 1| 5| 347082| 31.275| null| S|\n",
|
||||
"| 15| 0| 3|Vestrom, Miss. Hu...|female|14.0| 0| 0| 350406| 7.8542| null| S|\n",
|
||||
"| 16| 1| 2|Hewlett, Mrs. (Ma...|female|55.0| 0| 0| 248706| 16.0| null| S|\n",
|
||||
"| 17| 0| 3|Rice, Master. Eugene| male| 2.0| 4| 1| 382652| 29.125| null| Q|\n",
|
||||
"| 18| 1| 2|Williams, Mr. Cha...| male|null| 0| 0| 244373| 13.0| null| S|\n",
|
||||
"| 19| 0| 3|Vander Planke, Mr...|female|31.0| 1| 0| 345763| 18.0| null| S|\n",
|
||||
"| 20| 1| 3|Masselmani, Mrs. ...|female|null| 0| 0| 2649| 7.225| null| C|\n",
|
||||
"+-----------+--------+------+--------------------+------+----+-----+-----+----------------+-------+-----+--------+\n",
|
||||
"only showing top 20 rows\n",
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain_experimental.agents.agent_toolkits import create_spark_dataframe_agent\n",
|
||||
"from langchain_openai import OpenAI\n",
|
||||
"from pyspark.sql import SparkSession\n",
|
||||
"\n",
|
||||
"spark = SparkSession.builder.getOrCreate()\n",
|
||||
"csv_file_path = \"titanic.csv\"\n",
|
||||
"df = spark.read.csv(csv_file_path, header=True, inferSchema=True)\n",
|
||||
"df.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"agent = create_spark_dataframe_agent(llm=OpenAI(temperature=0), df=df, verbose=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mThought: I need to find out how many rows are in the dataframe\n",
|
||||
"Action: python_repl_ast\n",
|
||||
"Action Input: df.count()\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m891\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
|
||||
"Final Answer: There are 891 rows in the dataframe.\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'There are 891 rows in the dataframe.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"how many rows are there?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mThought: I need to find out how many people have more than 3 siblings\n",
|
||||
"Action: python_repl_ast\n",
|
||||
"Action Input: df.filter(df.SibSp > 3).count()\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m30\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
|
||||
"Final Answer: 30 people have more than 3 siblings.\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'30 people have more than 3 siblings.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"how many people have more than 3 siblings\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mThought: I need to get the average age first\n",
|
||||
"Action: python_repl_ast\n",
|
||||
"Action Input: df.agg({\"Age\": \"mean\"}).collect()[0][0]\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m29.69911764705882\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now have the average age, I need to get the square root\n",
|
||||
"Action: python_repl_ast\n",
|
||||
"Action Input: math.sqrt(29.69911764705882)\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3mname 'math' is not defined\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I need to import math first\n",
|
||||
"Action: python_repl_ast\n",
|
||||
"Action Input: import math\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now have the math library imported, I can get the square root\n",
|
||||
"Action: python_repl_ast\n",
|
||||
"Action Input: math.sqrt(29.69911764705882)\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m5.449689683556195\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
|
||||
"Final Answer: 5.449689683556195\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'5.449689683556195'"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"whats the square root of the average age?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"spark.stop()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## `Spark Connect` example"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# in apache-spark root directory. (tested here with \"spark-3.4.0-bin-hadoop3 and later\")\n",
|
||||
"# To launch Spark with support for Spark Connect sessions, run the start-connect-server.sh script.\n",
|
||||
"!./sbin/start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:3.4.0"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"23/05/08 10:06:09 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from pyspark.sql import SparkSession\n",
|
||||
"\n",
|
||||
"# Now that the Spark server is running, we can connect to it remotely using Spark Connect. We do this by\n",
|
||||
"# creating a remote Spark session on the client where our application runs. Before we can do that, we need\n",
|
||||
"# to make sure to stop the existing regular Spark session because it cannot coexist with the remote\n",
|
||||
"# Spark Connect session we are about to create.\n",
|
||||
"SparkSession.builder.master(\"local[*]\").getOrCreate().stop()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# The command we used above to launch the server configured Spark to run as localhost:15002.\n",
|
||||
"# So now we can create a remote Spark session on the client using the following command.\n",
|
||||
"spark = SparkSession.builder.remote(\"sc://localhost:15002\").getOrCreate()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"+-----------+--------+------+--------------------+------+----+-----+-----+----------------+-------+-----+--------+\n",
|
||||
"|PassengerId|Survived|Pclass| Name| Sex| Age|SibSp|Parch| Ticket| Fare|Cabin|Embarked|\n",
|
||||
"+-----------+--------+------+--------------------+------+----+-----+-----+----------------+-------+-----+--------+\n",
|
||||
"| 1| 0| 3|Braund, Mr. Owen ...| male|22.0| 1| 0| A/5 21171| 7.25| null| S|\n",
|
||||
"| 2| 1| 1|Cumings, Mrs. Joh...|female|38.0| 1| 0| PC 17599|71.2833| C85| C|\n",
|
||||
"| 3| 1| 3|Heikkinen, Miss. ...|female|26.0| 0| 0|STON/O2. 3101282| 7.925| null| S|\n",
|
||||
"| 4| 1| 1|Futrelle, Mrs. Ja...|female|35.0| 1| 0| 113803| 53.1| C123| S|\n",
|
||||
"| 5| 0| 3|Allen, Mr. Willia...| male|35.0| 0| 0| 373450| 8.05| null| S|\n",
|
||||
"| 6| 0| 3| Moran, Mr. James| male|null| 0| 0| 330877| 8.4583| null| Q|\n",
|
||||
"| 7| 0| 1|McCarthy, Mr. Tim...| male|54.0| 0| 0| 17463|51.8625| E46| S|\n",
|
||||
"| 8| 0| 3|Palsson, Master. ...| male| 2.0| 3| 1| 349909| 21.075| null| S|\n",
|
||||
"| 9| 1| 3|Johnson, Mrs. Osc...|female|27.0| 0| 2| 347742|11.1333| null| S|\n",
|
||||
"| 10| 1| 2|Nasser, Mrs. Nich...|female|14.0| 1| 0| 237736|30.0708| null| C|\n",
|
||||
"| 11| 1| 3|Sandstrom, Miss. ...|female| 4.0| 1| 1| PP 9549| 16.7| G6| S|\n",
|
||||
"| 12| 1| 1|Bonnell, Miss. El...|female|58.0| 0| 0| 113783| 26.55| C103| S|\n",
|
||||
"| 13| 0| 3|Saundercock, Mr. ...| male|20.0| 0| 0| A/5. 2151| 8.05| null| S|\n",
|
||||
"| 14| 0| 3|Andersson, Mr. An...| male|39.0| 1| 5| 347082| 31.275| null| S|\n",
|
||||
"| 15| 0| 3|Vestrom, Miss. Hu...|female|14.0| 0| 0| 350406| 7.8542| null| S|\n",
|
||||
"| 16| 1| 2|Hewlett, Mrs. (Ma...|female|55.0| 0| 0| 248706| 16.0| null| S|\n",
|
||||
"| 17| 0| 3|Rice, Master. Eugene| male| 2.0| 4| 1| 382652| 29.125| null| Q|\n",
|
||||
"| 18| 1| 2|Williams, Mr. Cha...| male|null| 0| 0| 244373| 13.0| null| S|\n",
|
||||
"| 19| 0| 3|Vander Planke, Mr...|female|31.0| 1| 0| 345763| 18.0| null| S|\n",
|
||||
"| 20| 1| 3|Masselmani, Mrs. ...|female|null| 0| 0| 2649| 7.225| null| C|\n",
|
||||
"+-----------+--------+------+--------------------+------+----+-----+-----+----------------+-------+-----+--------+\n",
|
||||
"only showing top 20 rows\n",
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"csv_file_path = \"titanic.csv\"\n",
|
||||
"df = spark.read.csv(csv_file_path, header=True, inferSchema=True)\n",
|
||||
"df.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"from langchain_experimental.agents import create_spark_dataframe_agent\n",
|
||||
"from langchain_openai import OpenAI\n",
|
||||
"\n",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = \"...input your openai api key here...\"\n",
|
||||
"\n",
|
||||
"agent = create_spark_dataframe_agent(llm=OpenAI(temperature=0), df=df, verbose=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m\n",
|
||||
"Thought: I need to find the row with the highest fare\n",
|
||||
"Action: python_repl_ast\n",
|
||||
"Action Input: df.sort(df.Fare.desc()).first()\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3mRow(PassengerId=259, Survived=1, Pclass=1, Name='Ward, Miss. Anna', Sex='female', Age=35.0, SibSp=0, Parch=0, Ticket='PC 17755', Fare=512.3292, Cabin=None, Embarked='C')\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the name of the person who bought the most expensive ticket\n",
|
||||
"Final Answer: Miss. Anna Ward\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'Miss. Anna Ward'"
|
||||
]
|
||||
},
|
||||
"execution_count": 16,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\n",
|
||||
" \"\"\"\n",
|
||||
"who bought the most expensive ticket?\n",
|
||||
"You can find all supported function types in https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/dataframe\n",
|
||||
"\"\"\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"spark.stop()"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
||||
@@ -4,9 +4,9 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Spark SQL Toolkit\n",
|
||||
"# Spark SQL\n",
|
||||
"\n",
|
||||
"This notebook shows how to use agents to interact with `Spark SQL`. Similar to [SQL Database Agent](/docs/integrations/tools/sql_database), it is designed to address general inquiries about `Spark SQL` and facilitate error recovery.\n",
|
||||
"This notebook shows how to use agents to interact with `Spark SQL`. Similar to [SQL Database Agent](/docs/integrations/toolkits/sql_database), it is designed to address general inquiries about `Spark SQL` and facilitate error recovery.\n",
|
||||
"\n",
|
||||
"**NOTE: Note that, as this agent is in active development, all answers might not be correct. Additionally, it is not guaranteed that the agent won't perform DML statements on your Spark cluster given certain questions. Be careful running it on sensitive data!**"
|
||||
]
|
||||
docs/docs/integrations/toolkits/sql_database.ipynb (new file, 624 lines)
@@ -0,0 +1,624 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "757e7780-b89a-4a87-b10c-cfa42337a8e0",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"---\n",
|
||||
"sidebar_label: SQLDatabaseToolkit\n",
|
||||
"---"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0e499e90-7a6d-4fab-8aab-31a4df417601",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# SQLDatabaseToolkit\n",
|
||||
"\n",
|
||||
"This will help you getting started with the SQL Database [toolkit](/docs/concepts/#toolkits). For detailed documentation of all `SQLDatabaseToolkit` features and configurations head to the [API reference](https://api.python.langchain.com/en/latest/agent_toolkits/langchain_community.agent_toolkits.sql.toolkit.SQLDatabaseToolkit.html).\n",
|
||||
"\n",
|
||||
"Tools within the `SQLDatabaseToolkit` are designed to interact with a `SQL` database. \n",
|
||||
"\n",
|
||||
"A common application is to enable agents to answer questions using data in a relational database, potentially in an iterative fashion (e.g., recovering from errors).\n",
|
||||
"\n",
|
||||
"**⚠️ Security note ⚠️**\n",
|
||||
"\n",
|
||||
"Building Q&A systems of SQL databases requires executing model-generated SQL queries. There are inherent risks in doing this. Make sure that your database connection permissions are always scoped as narrowly as possible for your chain/agent's needs. This will mitigate though not eliminate the risks of building a model-driven system. For more on general security best practices, [see here](/docs/security).\n",
|
||||
"\n",
|
||||
"## Setup\n",
|
||||
"\n",
|
||||
"If you want to get automated tracing from runs of individual tools, you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "3de6e3be-1fd9-42a3-8564-8ca7dca11e1c",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
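    "# import getpass\n",
    "# import os\n",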
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
|
||||
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "31896b61-68d2-4b4d-be9d-b829eda327d1",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Installation\n",
|
||||
"\n",
|
||||
"This toolkit lives in the `langchain-community` package:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "c4933e04-9120-4ccc-9ef7-369987823b0e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%pip install --upgrade --quiet langchain-community"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "6ad08dbe-1642-448c-b58d-153810024375",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"For demonstration purposes, we will access a prompt in the LangChain [Hub](https://smith.langchain.com/hub). We will also require `langgraph` to demonstrate the use of the toolkit with an agent. This is not required to use the toolkit."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "f3dead45-9908-497d-a5a3-bce30642e88f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%pip install --upgrade --quiet langchainhub langgraph"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "804533b1-2f16-497b-821b-c82d67fcf7b6",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Instantiation\n",
|
||||
"\n",
|
||||
"The `SQLDatabaseToolkit` toolkit requires:\n",
|
||||
"\n",
|
||||
"- a [SQLDatabase](https://api.python.langchain.com/en/latest/utilities/langchain_community.utilities.sql_database.SQLDatabase.html) object;\n",
|
||||
"- a LLM or chat model (for instantiating the [QuerySQLCheckerTool](https://api.python.langchain.com/en/latest/tools/langchain_community.tools.sql_database.tool.QuerySQLCheckerTool.html) tool).\n",
|
||||
"\n",
|
||||
"Below, we instantiate the toolkit with these objects. Let's first create a database object.\n",
|
||||
"\n",
|
||||
"This guide uses the example `Chinook` database based on [these instructions](https://database.guide/2-sample-databases-sqlite/).\n",
|
||||
"\n",
|
||||
"Below we will use the `requests` library to pull the `.sql` file and create an in-memory SQLite database. Note that this approach is lightweight, but ephemeral and not thread-safe. If you'd prefer, you can follow the instructions to save the file locally as `Chinook.db` and instantiate the database via `db = SQLDatabase.from_uri(\"sqlite:///Chinook.db\")`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "40d05f9b-5a8f-4307-8f8b-4153db0fdfa9",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import sqlite3\n",
|
||||
"\n",
|
||||
"import requests\n",
|
||||
"from langchain_community.utilities.sql_database import SQLDatabase\n",
|
||||
"from sqlalchemy import create_engine\n",
|
||||
"from sqlalchemy.pool import StaticPool\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def get_engine_for_chinook_db():\n",
|
||||
" \"\"\"Pull sql file, populate in-memory database, and create engine.\"\"\"\n",
|
||||
" url = \"https://raw.githubusercontent.com/lerocha/chinook-database/master/ChinookDatabase/DataSources/Chinook_Sqlite.sql\"\n",
|
||||
" response = requests.get(url)\n",
|
||||
" sql_script = response.text\n",
|
||||
"\n",
|
||||
" connection = sqlite3.connect(\":memory:\", check_same_thread=False)\n",
|
||||
" connection.executescript(sql_script)\n",
|
||||
" return create_engine(\n",
|
||||
" \"sqlite://\",\n",
|
||||
" creator=lambda: connection,\n",
|
||||
" poolclass=StaticPool,\n",
|
||||
" connect_args={\"check_same_thread\": False},\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"engine = get_engine_for_chinook_db()\n",
|
||||
"\n",
|
||||
"db = SQLDatabase(engine)"
|
||||
]
|
||||
},
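As flagged in the security note above, it is worth narrowing what the agent can see and touch. The sketch below is an addition to the notebook: it assumes the `engine` created above and uses standard `SQLDatabase` constructor options (`include_tables`, `sample_rows_in_table_info`); the particular table subset is illustrative only.

```python
# A more narrowly scoped database object: a sketch, not part of the original notebook.
scoped_db = SQLDatabase(
    engine,
    include_tables=["Artist", "Album", "Track"],  # illustrative subset of tables
    sample_rows_in_table_info=2,  # limit how many sample rows end up in prompts
)

print(scoped_db.get_usable_table_names())
```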
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2b9a6326-78fd-4c42-a1cb-4316619ac449",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We will also need a LLM or chat model:\n",
|
||||
"\n",
|
||||
"```{=mdx}\n",
|
||||
"import ChatModelTabs from \"@theme/ChatModelTabs\";\n",
|
||||
"\n",
|
||||
"<ChatModelTabs customVarName=\"llm\" />\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "cc6e6108-83d9-404f-8f31-474c2fbf5f6c",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# | output: false\n",
|
||||
"# | echo: false\n",
|
||||
"\n",
|
||||
"from langchain_openai import ChatOpenAI\n",
|
||||
"\n",
|
||||
"llm = ChatOpenAI(temperature=0)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "77925e72-4730-43c3-8726-d68cedf635f4",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We can now instantiate the toolkit:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "42bd5a41-672a-4a53-b70a-2f0c0555758c",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.agent_toolkits.sql.toolkit import SQLDatabaseToolkit\n",
|
||||
"\n",
|
||||
"toolkit = SQLDatabaseToolkit(db=db, llm=llm)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b2f882cf-4156-4a9f-a714-db97ec8ccc37",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Tools\n",
|
||||
"\n",
|
||||
"View available tools:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "a18c3e69-bee0-4f5d-813e-eeb540f41b98",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[QuerySQLDataBaseTool(description=\"Input to this tool is a detailed and correct SQL query, output is a result from the database. If the query is not correct, an error message will be returned. If an error is returned, rewrite the query, check the query, and try again. If you encounter an issue with Unknown column 'xxxx' in 'field list', use sql_db_schema to query the correct table fields.\", db=<langchain_community.utilities.sql_database.SQLDatabase object at 0x105e02860>),\n",
|
||||
" InfoSQLDatabaseTool(description='Input to this tool is a comma-separated list of tables, output is the schema and sample rows for those tables. Be sure that the tables actually exist by calling sql_db_list_tables first! Example Input: table1, table2, table3', db=<langchain_community.utilities.sql_database.SQLDatabase object at 0x105e02860>),\n",
|
||||
" ListSQLDatabaseTool(db=<langchain_community.utilities.sql_database.SQLDatabase object at 0x105e02860>),\n",
|
||||
" QuerySQLCheckerTool(description='Use this tool to double check if your query is correct before executing it. Always use this tool before executing a query with sql_db_query!', db=<langchain_community.utilities.sql_database.SQLDatabase object at 0x105e02860>, llm=ChatOpenAI(client=<openai.resources.chat.completions.Completions object at 0x1148a97b0>, async_client=<openai.resources.chat.completions.AsyncCompletions object at 0x1148aaec0>, temperature=0.0, openai_api_key=SecretStr('**********'), openai_proxy=''), llm_chain=LLMChain(prompt=PromptTemplate(input_variables=['dialect', 'query'], template='\\n{query}\\nDouble check the {dialect} query above for common mistakes, including:\\n- Using NOT IN with NULL values\\n- Using UNION when UNION ALL should have been used\\n- Using BETWEEN for exclusive ranges\\n- Data type mismatch in predicates\\n- Properly quoting identifiers\\n- Using the correct number of arguments for functions\\n- Casting to the correct data type\\n- Using the proper columns for joins\\n\\nIf there are any of the above mistakes, rewrite the query. If there are no mistakes, just reproduce the original query.\\n\\nOutput the final SQL query only.\\n\\nSQL Query: '), llm=ChatOpenAI(client=<openai.resources.chat.completions.Completions object at 0x1148a97b0>, async_client=<openai.resources.chat.completions.AsyncCompletions object at 0x1148aaec0>, temperature=0.0, openai_api_key=SecretStr('**********'), openai_proxy='')))]"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"toolkit.get_tools()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "5f5751e3-2e98-485f-8164-db8094039c25",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"API references:\n",
|
||||
"\n",
|
||||
"- [QuerySQLDataBaseTool](https://api.python.langchain.com/en/latest/tools/langchain_community.tools.sql_database.tool.QuerySQLDataBaseTool.html)\n",
|
||||
"- [InfoSQLDatabaseTool](https://api.python.langchain.com/en/latest/tools/langchain_community.tools.sql_database.tool.InfoSQLDatabaseTool.html)\n",
|
||||
"- [ListSQLDatabaseTool](https://api.python.langchain.com/en/latest/tools/langchain_community.tools.sql_database.tool.ListSQLDatabaseTool.html)\n",
|
||||
"- [QuerySQLCheckerTool](https://api.python.langchain.com/en/latest/tools/langchain_community.tools.sql_database.tool.QuerySQLCheckerTool.html)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c067e0ed-dcca-4dcc-81b2-a0eeb4fc2a9f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Use within an agent\n",
|
||||
"\n",
|
||||
"Following the [SQL Q&A Tutorial](/docs/tutorials/sql_qa/#agents), below we equip a simple question-answering agent with the tools in our toolkit. First we pull a relevant prompt and populate it with its required parameters:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "eda12f8b-be90-4697-ac84-2ece9e2d1708",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"['dialect', 'top_k']\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain import hub\n",
|
||||
"\n",
|
||||
"prompt_template = hub.pull(\"langchain-ai/sql-agent-system-prompt\")\n",
|
||||
"\n",
|
||||
"assert len(prompt_template.messages) == 1\n",
|
||||
"print(prompt_template.input_variables)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "3470ae96-e5e5-4717-a6d6-d7d28c7b7347",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"system_message = prompt_template.format(dialect=\"SQLite\", top_k=5)"
|
||||
]
|
||||
},
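Pulling the prompt from the hub is a convenience, not a requirement. If you prefer not to depend on `langchainhub`, a hand-written system message along the following lines also works; this is a sketch whose wording is illustrative and not the hub prompt's exact text.

```python
# A hand-written alternative to the hub prompt (illustrative wording only).
system_message = """You are an agent designed to interact with a SQLite database.
Given an input question, create a syntactically correct SQLite query to run,
look at the results of the query, and return the answer. Unless the user
specifies a number of examples, limit your query to at most 5 results.
Do not run any DML statements (INSERT, UPDATE, DELETE, DROP) on the database."""
```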
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "97930c07-36d1-4137-94ae-fe5ac83ecc44",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We then instantiate the agent:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "48bca92c-9b4b-4d5c-bcce-1b239c9e901c",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langgraph.prebuilt import create_react_agent\n",
|
||||
"\n",
|
||||
"agent_executor = create_react_agent(\n",
|
||||
" llm, toolkit.get_tools(), state_modifier=system_message\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "09fb1845-1105-4f41-98b4-24756452a3e3",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"And issue it a query:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "39e6d2bf-3194-4aba-854b-63faf919157b",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"================================\u001b[1m Human Message \u001b[0m=================================\n",
|
||||
"\n",
|
||||
"Which country's customers spent the most?\n",
|
||||
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
|
||||
"Tool Calls:\n",
|
||||
" sql_db_list_tables (call_eiheSxiL0s90KE50XyBnBtJY)\n",
|
||||
" Call ID: call_eiheSxiL0s90KE50XyBnBtJY\n",
|
||||
" Args:\n",
|
||||
"=================================\u001b[1m Tool Message \u001b[0m=================================\n",
|
||||
"Name: sql_db_list_tables\n",
|
||||
"\n",
|
||||
"Album, Artist, Customer, Employee, Genre, Invoice, InvoiceLine, MediaType, Playlist, PlaylistTrack, Track\n",
|
||||
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
|
||||
"Tool Calls:\n",
|
||||
" sql_db_schema (call_YKwGWt4UUVmxxY7vjjBDzFLJ)\n",
|
||||
" Call ID: call_YKwGWt4UUVmxxY7vjjBDzFLJ\n",
|
||||
" Args:\n",
|
||||
" table_names: Customer, Invoice, InvoiceLine\n",
|
||||
"=================================\u001b[1m Tool Message \u001b[0m=================================\n",
|
||||
"Name: sql_db_schema\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"CREATE TABLE \"Customer\" (\n",
|
||||
"\t\"CustomerId\" INTEGER NOT NULL, \n",
|
||||
"\t\"FirstName\" NVARCHAR(40) NOT NULL, \n",
|
||||
"\t\"LastName\" NVARCHAR(20) NOT NULL, \n",
|
||||
"\t\"Company\" NVARCHAR(80), \n",
|
||||
"\t\"Address\" NVARCHAR(70), \n",
|
||||
"\t\"City\" NVARCHAR(40), \n",
|
||||
"\t\"State\" NVARCHAR(40), \n",
|
||||
"\t\"Country\" NVARCHAR(40), \n",
|
||||
"\t\"PostalCode\" NVARCHAR(10), \n",
|
||||
"\t\"Phone\" NVARCHAR(24), \n",
|
||||
"\t\"Fax\" NVARCHAR(24), \n",
|
||||
"\t\"Email\" NVARCHAR(60) NOT NULL, \n",
|
||||
"\t\"SupportRepId\" INTEGER, \n",
|
||||
"\tPRIMARY KEY (\"CustomerId\"), \n",
|
||||
"\tFOREIGN KEY(\"SupportRepId\") REFERENCES \"Employee\" (\"EmployeeId\")\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"/*\n",
|
||||
"3 rows from Customer table:\n",
|
||||
"CustomerId\tFirstName\tLastName\tCompany\tAddress\tCity\tState\tCountry\tPostalCode\tPhone\tFax\tEmail\tSupportRepId\n",
|
||||
"1\tLuís\tGonçalves\tEmbraer - Empresa Brasileira de Aeronáutica S.A.\tAv. Brigadeiro Faria Lima, 2170\tSão José dos Campos\tSP\tBrazil\t12227-000\t+55 (12) 3923-5555\t+55 (12) 3923-5566\tluisg@embraer.com.br\t3\n",
|
||||
"2\tLeonie\tKöhler\tNone\tTheodor-Heuss-Straße 34\tStuttgart\tNone\tGermany\t70174\t+49 0711 2842222\tNone\tleonekohler@surfeu.de\t5\n",
|
||||
"3\tFrançois\tTremblay\tNone\t1498 rue Bélanger\tMontréal\tQC\tCanada\tH2G 1A7\t+1 (514) 721-4711\tNone\tftremblay@gmail.com\t3\n",
|
||||
"*/\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"CREATE TABLE \"Invoice\" (\n",
|
||||
"\t\"InvoiceId\" INTEGER NOT NULL, \n",
|
||||
"\t\"CustomerId\" INTEGER NOT NULL, \n",
|
||||
"\t\"InvoiceDate\" DATETIME NOT NULL, \n",
|
||||
"\t\"BillingAddress\" NVARCHAR(70), \n",
|
||||
"\t\"BillingCity\" NVARCHAR(40), \n",
|
||||
"\t\"BillingState\" NVARCHAR(40), \n",
|
||||
"\t\"BillingCountry\" NVARCHAR(40), \n",
|
||||
"\t\"BillingPostalCode\" NVARCHAR(10), \n",
|
||||
"\t\"Total\" NUMERIC(10, 2) NOT NULL, \n",
|
||||
"\tPRIMARY KEY (\"InvoiceId\"), \n",
|
||||
"\tFOREIGN KEY(\"CustomerId\") REFERENCES \"Customer\" (\"CustomerId\")\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"/*\n",
|
||||
"3 rows from Invoice table:\n",
|
||||
"InvoiceId\tCustomerId\tInvoiceDate\tBillingAddress\tBillingCity\tBillingState\tBillingCountry\tBillingPostalCode\tTotal\n",
|
||||
"1\t2\t2021-01-01 00:00:00\tTheodor-Heuss-Straße 34\tStuttgart\tNone\tGermany\t70174\t1.98\n",
|
||||
"2\t4\t2021-01-02 00:00:00\tUllevålsveien 14\tOslo\tNone\tNorway\t0171\t3.96\n",
|
||||
"3\t8\t2021-01-03 00:00:00\tGrétrystraat 63\tBrussels\tNone\tBelgium\t1000\t5.94\n",
|
||||
"*/\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"CREATE TABLE \"InvoiceLine\" (\n",
|
||||
"\t\"InvoiceLineId\" INTEGER NOT NULL, \n",
|
||||
"\t\"InvoiceId\" INTEGER NOT NULL, \n",
|
||||
"\t\"TrackId\" INTEGER NOT NULL, \n",
|
||||
"\t\"UnitPrice\" NUMERIC(10, 2) NOT NULL, \n",
|
||||
"\t\"Quantity\" INTEGER NOT NULL, \n",
|
||||
"\tPRIMARY KEY (\"InvoiceLineId\"), \n",
|
||||
"\tFOREIGN KEY(\"TrackId\") REFERENCES \"Track\" (\"TrackId\"), \n",
|
||||
"\tFOREIGN KEY(\"InvoiceId\") REFERENCES \"Invoice\" (\"InvoiceId\")\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"/*\n",
|
||||
"3 rows from InvoiceLine table:\n",
|
||||
"InvoiceLineId\tInvoiceId\tTrackId\tUnitPrice\tQuantity\n",
|
||||
"1\t1\t2\t0.99\t1\n",
|
||||
"2\t1\t4\t0.99\t1\n",
|
||||
"3\t2\t6\t0.99\t1\n",
|
||||
"*/\n",
|
||||
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
|
||||
"Tool Calls:\n",
|
||||
" sql_db_query (call_7WBDcMxl1h7MnI05njx1q8V9)\n",
|
||||
" Call ID: call_7WBDcMxl1h7MnI05njx1q8V9\n",
|
||||
" Args:\n",
|
||||
" query: SELECT c.Country, SUM(i.Total) AS TotalSpent FROM Customer c JOIN Invoice i ON c.CustomerId = i.CustomerId GROUP BY c.Country ORDER BY TotalSpent DESC LIMIT 1\n",
|
||||
"=================================\u001b[1m Tool Message \u001b[0m=================================\n",
|
||||
"Name: sql_db_query\n",
|
||||
"\n",
|
||||
"[('USA', 523.0600000000003)]\n",
|
||||
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
|
||||
"\n",
|
||||
"Customers from the USA spent the most, with a total amount spent of $523.06.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"example_query = \"Which country's customers spent the most?\"\n",
|
||||
"\n",
|
||||
"events = agent_executor.stream(\n",
|
||||
" {\"messages\": [(\"user\", example_query)]},\n",
|
||||
" stream_mode=\"values\",\n",
|
||||
")\n",
|
||||
"for event in events:\n",
|
||||
" event[\"messages\"][-1].pretty_print()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "adbf3d8d-7570-45a5-950f-ce84db5145ab",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We can also observe the agent recover from an error:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "23c1235c-6d18-43e4-98ab-85b426b53d94",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"================================\u001b[1m Human Message \u001b[0m=================================\n",
|
||||
"\n",
|
||||
"Who are the top 3 best selling artists?\n",
|
||||
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
|
||||
"Tool Calls:\n",
|
||||
" sql_db_query (call_9F6Bp2vwsDkeLW6FsJFqLiet)\n",
|
||||
" Call ID: call_9F6Bp2vwsDkeLW6FsJFqLiet\n",
|
||||
" Args:\n",
|
||||
" query: SELECT artist_name, SUM(quantity) AS total_sold FROM sales GROUP BY artist_name ORDER BY total_sold DESC LIMIT 3\n",
|
||||
"=================================\u001b[1m Tool Message \u001b[0m=================================\n",
|
||||
"Name: sql_db_query\n",
|
||||
"\n",
|
||||
"Error: (sqlite3.OperationalError) no such table: sales\n",
|
||||
"[SQL: SELECT artist_name, SUM(quantity) AS total_sold FROM sales GROUP BY artist_name ORDER BY total_sold DESC LIMIT 3]\n",
|
||||
"(Background on this error at: https://sqlalche.me/e/20/e3q8)\n",
|
||||
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
|
||||
"Tool Calls:\n",
|
||||
" sql_db_list_tables (call_Gx5adzWnrBDIIxzUDzsn83zO)\n",
|
||||
" Call ID: call_Gx5adzWnrBDIIxzUDzsn83zO\n",
|
||||
" Args:\n",
|
||||
"=================================\u001b[1m Tool Message \u001b[0m=================================\n",
|
||||
"Name: sql_db_list_tables\n",
|
||||
"\n",
|
||||
"Album, Artist, Customer, Employee, Genre, Invoice, InvoiceLine, MediaType, Playlist, PlaylistTrack, Track\n",
|
||||
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
|
||||
"Tool Calls:\n",
|
||||
" sql_db_schema (call_ftywrZgEgGWLrnk9dYC0xtZv)\n",
|
||||
" Call ID: call_ftywrZgEgGWLrnk9dYC0xtZv\n",
|
||||
" Args:\n",
|
||||
" table_names: Artist, Album, InvoiceLine\n",
|
||||
"=================================\u001b[1m Tool Message \u001b[0m=================================\n",
|
||||
"Name: sql_db_schema\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"CREATE TABLE \"Album\" (\n",
|
||||
"\t\"AlbumId\" INTEGER NOT NULL, \n",
|
||||
"\t\"Title\" NVARCHAR(160) NOT NULL, \n",
|
||||
"\t\"ArtistId\" INTEGER NOT NULL, \n",
|
||||
"\tPRIMARY KEY (\"AlbumId\"), \n",
|
||||
"\tFOREIGN KEY(\"ArtistId\") REFERENCES \"Artist\" (\"ArtistId\")\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"/*\n",
|
||||
"3 rows from Album table:\n",
|
||||
"AlbumId\tTitle\tArtistId\n",
|
||||
"1\tFor Those About To Rock We Salute You\t1\n",
|
||||
"2\tBalls to the Wall\t2\n",
|
||||
"3\tRestless and Wild\t2\n",
|
||||
"*/\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"CREATE TABLE \"Artist\" (\n",
|
||||
"\t\"ArtistId\" INTEGER NOT NULL, \n",
|
||||
"\t\"Name\" NVARCHAR(120), \n",
|
||||
"\tPRIMARY KEY (\"ArtistId\")\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"/*\n",
|
||||
"3 rows from Artist table:\n",
|
||||
"ArtistId\tName\n",
|
||||
"1\tAC/DC\n",
|
||||
"2\tAccept\n",
|
||||
"3\tAerosmith\n",
|
||||
"*/\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"CREATE TABLE \"InvoiceLine\" (\n",
|
||||
"\t\"InvoiceLineId\" INTEGER NOT NULL, \n",
|
||||
"\t\"InvoiceId\" INTEGER NOT NULL, \n",
|
||||
"\t\"TrackId\" INTEGER NOT NULL, \n",
|
||||
"\t\"UnitPrice\" NUMERIC(10, 2) NOT NULL, \n",
|
||||
"\t\"Quantity\" INTEGER NOT NULL, \n",
|
||||
"\tPRIMARY KEY (\"InvoiceLineId\"), \n",
|
||||
"\tFOREIGN KEY(\"TrackId\") REFERENCES \"Track\" (\"TrackId\"), \n",
|
||||
"\tFOREIGN KEY(\"InvoiceId\") REFERENCES \"Invoice\" (\"InvoiceId\")\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"/*\n",
|
||||
"3 rows from InvoiceLine table:\n",
|
||||
"InvoiceLineId\tInvoiceId\tTrackId\tUnitPrice\tQuantity\n",
|
||||
"1\t1\t2\t0.99\t1\n",
|
||||
"2\t1\t4\t0.99\t1\n",
|
||||
"3\t2\t6\t0.99\t1\n",
|
||||
"*/\n",
|
||||
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
|
||||
"Tool Calls:\n",
|
||||
" sql_db_query (call_i6n3lmS7E2ZivN758VOayTiy)\n",
|
||||
" Call ID: call_i6n3lmS7E2ZivN758VOayTiy\n",
|
||||
" Args:\n",
|
||||
" query: SELECT Artist.Name AS artist_name, SUM(InvoiceLine.Quantity) AS total_sold FROM Artist JOIN Album ON Artist.ArtistId = Album.ArtistId JOIN Track ON Album.AlbumId = Track.AlbumId JOIN InvoiceLine ON Track.TrackId = InvoiceLine.TrackId GROUP BY Artist.Name ORDER BY total_sold DESC LIMIT 3\n",
|
||||
"=================================\u001b[1m Tool Message \u001b[0m=================================\n",
|
||||
"Name: sql_db_query\n",
|
||||
"\n",
|
||||
"[('Iron Maiden', 140), ('U2', 107), ('Metallica', 91)]\n",
|
||||
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
|
||||
"\n",
|
||||
"The top 3 best selling artists are:\n",
|
||||
"1. Iron Maiden - 140 units sold\n",
|
||||
"2. U2 - 107 units sold\n",
|
||||
"3. Metallica - 91 units sold\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"example_query = \"Who are the top 3 best selling artists?\"\n",
|
||||
"\n",
|
||||
"events = agent_executor.stream(\n",
|
||||
" {\"messages\": [(\"user\", example_query)]},\n",
|
||||
" stream_mode=\"values\",\n",
|
||||
")\n",
|
||||
"for event in events:\n",
|
||||
" event[\"messages\"][-1].pretty_print()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "73521f1b-be03-44e6-8b27-a9a46ae8e962",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Specific functionality\n",
|
||||
"\n",
|
||||
"`SQLDatabaseToolkit` implements a [.get_context](https://api.python.langchain.com/en/latest/agent_toolkits/langchain_community.agent_toolkits.sql.toolkit.SQLDatabaseToolkit.html#langchain_community.agent_toolkits.sql.toolkit.SQLDatabaseToolkit.get_context) method as a convenience for use in prompts or other contexts.\n",
|
||||
"\n",
|
||||
"**⚠️ Disclaimer ⚠️** : The agent may generate insert/update/delete queries. When this is not expected, use a custom prompt or create a SQL users without write permissions.\n",
|
||||
"\n",
|
||||
"The final user might overload your SQL database by asking a simple question such as \"run the biggest query possible\". The generated query might look like:\n",
|
||||
"\n",
|
||||
"```sql\n",
|
||||
"SELECT * FROM \"public\".\"users\"\n",
|
||||
" JOIN \"public\".\"user_permissions\" ON \"public\".\"users\".id = \"public\".\"user_permissions\".user_id\n",
|
||||
" JOIN \"public\".\"projects\" ON \"public\".\"users\".id = \"public\".\"projects\".user_id\n",
|
||||
" JOIN \"public\".\"events\" ON \"public\".\"projects\".id = \"public\".\"events\".project_id;\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"For a transactional SQL database, if one of the table above contains millions of rows, the query might cause trouble to other applications using the same database.\n",
|
||||
"\n",
|
||||
"Most datawarehouse oriented databases support user-level quota, for limiting resource usage."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1aa8a7e3-87ca-4963-a224-0cbdc9d88714",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## API reference\n",
|
||||
"\n",
|
||||
"For detailed documentation of all SQLDatabaseToolkit features and configurations head to the [API reference](https://api.python.langchain.com/en/latest/agent_toolkits/langchain_community.agent_toolkits.sql.toolkit.SQLDatabaseToolkit.html)."
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.4"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
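The `get_context` convenience and the write-permission disclaimer above suggest a simple pattern: build the prompt from the toolkit's own schema context and point the toolkit at a connection that cannot write. The following is a minimal sketch, not the notebook's own code; it assumes a local `Chinook.db` SQLite file, an `OPENAI_API_KEY` in the environment, and that `get_context()` returns a dict with a `table_info` key (key names may differ between versions).

```python
# Sketch: seed a custom prompt with SQLDatabaseToolkit.get_context().
# Assumptions: Chinook.db exists locally, OPENAI_API_KEY is set, and the
# context dict exposes a "table_info" entry (may vary by version).
from langchain_community.agent_toolkits import SQLDatabaseToolkit
from langchain_community.utilities import SQLDatabase
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# Prefer a read-only connection or database user here to limit the blast
# radius of any generated insert/update/delete statements.
db = SQLDatabase.from_uri("sqlite:///Chinook.db")
llm = ChatOpenAI(model="gpt-4o-mini")
toolkit = SQLDatabaseToolkit(db=db, llm=llm)

context = toolkit.get_context()  # table names and schema snippets
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a careful SQL analyst. Known tables:\n{table_info}"),
        ("human", "{question}"),
    ]
).partial(table_info=context["table_info"])
```

The same prompt can then be fed to whatever agent or chain the rest of the notebook builds; the point is only that schema context comes from the toolkit rather than being hard-coded.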
||||
@@ -4,7 +4,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Steam Toolkit\n",
|
||||
"# Steam Game Recommendation & Game Details\n",
|
||||
"\n",
|
||||
">[Steam (Wikipedia)](https://en.wikipedia.org/wiki/Steam_(service)) is a video game digital distribution service and storefront developed by `Valve Corporation`. It provides game updates automatically for Valve's games, and expanded to distributing third-party titles. `Steam` offers various features, like game server matchmaking with Valve Anti-Cheat measures, social networking, and game streaming services.\n",
|
||||
"\n",
|
||||
752
docs/docs/integrations/toolkits/xorbits.ipynb
Normal file
@@ -0,0 +1,752 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Xorbits"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"This notebook shows how to use agents to interact with [Xorbits Pandas](https://doc.xorbits.io/en/latest/reference/pandas/index.html) dataframe and [Xorbits Numpy](https://doc.xorbits.io/en/latest/reference/numpy/index.html) ndarray. It is mostly optimized for question answering.\n",
|
||||
"\n",
|
||||
"**NOTE: this agent calls the `Python` agent under the hood, which executes LLM generated Python code - this can be bad if the LLM generated Python code is harmful. Use cautiously.**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Pandas examples"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-07-13T08:06:33.955439Z",
|
||||
"start_time": "2023-07-13T08:06:33.767539500Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import xorbits.pandas as pd\n",
|
||||
"from langchain_experimental.agents.agent_toolkits import create_xorbits_agent\n",
|
||||
"from langchain_openai import OpenAI"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-07-13T08:06:33.955439Z",
|
||||
"start_time": "2023-07-13T08:06:33.767539500Z"
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "05b7c067b1114ce9a8aef4a58a5d5fef",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
" 0%| | 0.00/100 [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"data = pd.read_csv(\"titanic.csv\")\n",
|
||||
"agent = create_xorbits_agent(OpenAI(temperature=0), data, verbose=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-07-13T08:11:06.622471100Z",
|
||||
"start_time": "2023-07-13T08:11:03.183042Z"
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mThought: I need to count the number of rows and columns\n",
|
||||
"Action: python_repl_ast\n",
|
||||
"Action Input: data.shape\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m(891, 12)\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
|
||||
"Final Answer: There are 891 rows and 12 columns.\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'There are 891 rows and 12 columns.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"How many rows and columns are there?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-07-13T08:11:23.189275300Z",
|
||||
"start_time": "2023-07-13T08:11:11.029030900Z"
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "8c63d745a7eb41a484043a5dba357997",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
" 0%| | 0.00/100 [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\u001b[32;1m\u001b[1;3mThought: I need to count the number of people in pclass 1\n",
|
||||
"Action: python_repl_ast\n",
|
||||
"Action Input: data[data['Pclass'] == 1].shape[0]\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m216\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
|
||||
"Final Answer: There are 216 people in pclass 1.\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'There are 216 people in pclass 1.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"How many people are in pclass 1?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mThought: I need to calculate the mean age\n",
|
||||
"Action: python_repl_ast\n",
|
||||
"Action Input: data['Age'].mean()\u001b[0m"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "29af2e29f2d64a3397c212812adf0e9b",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
" 0%| | 0.00/100 [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m29.69911764705882\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
|
||||
"Final Answer: The mean age is 29.69911764705882.\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'The mean age is 29.69911764705882.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"whats the mean age?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mThought: I need to group the data by sex and then find the average age for each group\n",
|
||||
"Action: python_repl_ast\n",
|
||||
"Action Input: data.groupby('Sex')['Age'].mean()\u001b[0m"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "c3d28625c35946fd91ebc2a47f8d8c5b",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
" 0%| | 0.00/100 [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3mSex\n",
|
||||
"female 27.915709\n",
|
||||
"male 30.726645\n",
|
||||
"Name: Age, dtype: float64\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the average age for each group\n",
|
||||
"Final Answer: The average age for female passengers is 27.92 and the average age for male passengers is 30.73.\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'The average age for female passengers is 27.92 and the average age for male passengers is 30.73.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"Group the data by sex and find the average age for each group\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "c72aab63b20d47599f4f9806f6887a69",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
" 0%| | 0.00/100 [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\u001b[32;1m\u001b[1;3mThought: I need to filter the dataframe to get the desired result\n",
|
||||
"Action: python_repl_ast\n",
|
||||
"Action Input: data[(data['Age'] > 30) & (data['Fare'] > 30) & (data['Fare'] < 50) & ((data['Pclass'] == 1) | (data['Pclass'] == 2))].shape[0]\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m20\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
|
||||
"Final Answer: 20\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'20'"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\n",
|
||||
" \"Show the number of people whose age is greater than 30 and fare is between 30 and 50 , and pclass is either 1 or 2\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Numpy examples"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "fa8baf315a0c41c89392edc4a24b76f5",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
" 0%| | 0.00/100 [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import xorbits.numpy as np\n",
|
||||
"from langchain_experimental.agents.agent_toolkits import create_xorbits_agent\n",
|
||||
"from langchain_openai import OpenAI\n",
|
||||
"\n",
|
||||
"arr = np.array([1, 2, 3, 4, 5, 6])\n",
|
||||
"agent = create_xorbits_agent(OpenAI(temperature=0), arr, verbose=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mThought: I need to find out the shape of the array\n",
|
||||
"Action: python_repl_ast\n",
|
||||
"Action Input: data.shape\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m(6,)\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
|
||||
"Final Answer: The shape of the array is (6,).\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'The shape of the array is (6,).'"
|
||||
]
|
||||
},
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"Give the shape of the array \")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mThought: I need to access the 2nd element of the array\n",
|
||||
"Action: python_repl_ast\n",
|
||||
"Action Input: data[1]\u001b[0m"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "64efcc74f81f404eb0a7d3f0326cd8b3",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
" 0%| | 0.00/100 [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m2\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
|
||||
"Final Answer: 2\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'2'"
|
||||
]
|
||||
},
|
||||
"execution_count": 14,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"Give the 2nd element of the array \")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mThought: I need to reshape the array and then transpose it\n",
|
||||
"Action: python_repl_ast\n",
|
||||
"Action Input: np.reshape(data, (2,3)).T\u001b[0m"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "fce51acf6fb347c0b400da67c6750534",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
" 0%| | 0.00/100 [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m[[1 4]\n",
|
||||
" [2 5]\n",
|
||||
" [3 6]]\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
|
||||
"Final Answer: The reshaped and transposed array is [[1 4], [2 5], [3 6]].\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'The reshaped and transposed array is [[1 4], [2 5], [3 6]].'"
|
||||
]
|
||||
},
|
||||
"execution_count": 18,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\n",
|
||||
" \"Reshape the array into a 2-dimensional array with 2 rows and 3 columns, and then transpose it\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mThought: I need to reshape the array and then sum it\n",
|
||||
"Action: python_repl_ast\n",
|
||||
"Action Input: np.sum(np.reshape(data, (3,2)), axis=0)\u001b[0m"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "27fd4a0bbf694936bc41a6991064dec2",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
" 0%| | 0.00/100 [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m[ 9 12]\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
|
||||
"Final Answer: The sum of the array along the first axis is [9, 12].\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'The sum of the array along the first axis is [9, 12].'"
|
||||
]
|
||||
},
|
||||
"execution_count": 20,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\n",
|
||||
" \"Reshape the array into a 2-dimensional array with 3 rows and 2 columns and sum the array along the first axis\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "a591b6d7913f45cba98d2f3b71a5120a",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
" 0%| | 0.00/100 [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])\n",
|
||||
"agent = create_xorbits_agent(OpenAI(temperature=0), arr, verbose=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mThought: I need to use the numpy covariance function\n",
|
||||
"Action: python_repl_ast\n",
|
||||
"Action Input: np.cov(data)\u001b[0m"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "5fe40f83cfae48d0919c147627b5839f",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
" 0%| | 0.00/100 [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m[[1. 1. 1.]\n",
|
||||
" [1. 1. 1.]\n",
|
||||
" [1. 1. 1.]]\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
|
||||
"Final Answer: The covariance matrix is [[1. 1. 1.], [1. 1. 1.], [1. 1. 1.]].\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'The covariance matrix is [[1. 1. 1.], [1. 1. 1.], [1. 1. 1.]].'"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"calculate the covariance matrix\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mThought: I need to use the SVD function\n",
|
||||
"Action: python_repl_ast\n",
|
||||
"Action Input: U, S, V = np.linalg.svd(data)\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now have the U matrix\n",
|
||||
"Final Answer: U = [[-0.70710678 -0.70710678]\n",
|
||||
" [-0.70710678 0.70710678]]\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'U = [[-0.70710678 -0.70710678]\\n [-0.70710678 0.70710678]]'"
|
||||
]
|
||||
},
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"compute the U of Singular Value Decomposition of the matrix\")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
||||
170
docs/docs/integrations/tools/apify.ipynb
Normal file
@@ -0,0 +1,170 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Apify\n",
|
||||
"\n",
|
||||
"This notebook shows how to use the [Apify integration](/docs/integrations/providers/apify) for LangChain.\n",
|
||||
"\n",
|
||||
"[Apify](https://apify.com) is a cloud platform for web scraping and data extraction,\n",
|
||||
"which provides an [ecosystem](https://apify.com/store) of more than a thousand\n",
|
||||
"ready-made apps called *Actors* for various web scraping, crawling, and data extraction use cases.\n",
|
||||
"For example, you can use it to extract Google Search results, Instagram and Facebook profiles, products from Amazon or Shopify, Google Maps reviews, etc. etc.\n",
|
||||
"\n",
|
||||
"In this example, we'll use the [Website Content Crawler](https://apify.com/apify/website-content-crawler) Actor,\n",
|
||||
"which can deeply crawl websites such as documentation, knowledge bases, help centers, or blogs,\n",
|
||||
"and extract text content from the web pages. Then we feed the documents into a vector index and answer questions from it.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%pip install --upgrade --quiet apify-client langchain-community langchain-openai langchain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"First, import `ApifyWrapper` into your source code:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.indexes import VectorstoreIndexCreator\n",
|
||||
"from langchain_community.utilities import ApifyWrapper\n",
|
||||
"from langchain_core.documents import Document\n",
|
||||
"from langchain_openai import OpenAI\n",
|
||||
"from langchain_openai.embeddings import OpenAIEmbeddings"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Initialize it using your [Apify API token](https://docs.apify.com/platform/integrations/api#api-token) and for the purpose of this example, also with your OpenAI API key:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = \"Your OpenAI API key\"\n",
|
||||
"os.environ[\"APIFY_API_TOKEN\"] = \"Your Apify API token\"\n",
|
||||
"\n",
|
||||
"apify = ApifyWrapper()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Then run the Actor, wait for it to finish, and fetch its results from the Apify dataset into a LangChain document loader.\n",
|
||||
"\n",
|
||||
"Note that if you already have some results in an Apify dataset, you can load them directly using `ApifyDatasetLoader`, as shown in [this notebook](/docs/integrations/document_loaders/apify_dataset). In that notebook, you'll also find the explanation of the `dataset_mapping_function`, which is used to map fields from the Apify dataset records to LangChain `Document` fields."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = apify.call_actor(\n",
|
||||
" actor_id=\"apify/website-content-crawler\",\n",
|
||||
" run_input={\"startUrls\": [{\"url\": \"https://python.langchain.com\"}]},\n",
|
||||
" dataset_mapping_function=lambda item: Document(\n",
|
||||
" page_content=item[\"text\"] or \"\", metadata={\"source\": item[\"url\"]}\n",
|
||||
" ),\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Initialize the vector index from the crawled documents:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"index = VectorstoreIndexCreator(embedding=OpenAIEmbeddings()).from_loaders([loader])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"And finally, query the vector index:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"query = \"What is LangChain?\"\n",
|
||||
"result = index.query_with_sources(query, llm=OpenAI())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
" LangChain is a standard interface through which you can interact with a variety of large language models (LLMs). It provides modules that can be used to build language model applications, and it also provides chains and agents with memory capabilities.\n",
|
||||
"\n",
|
||||
"https://python.langchain.com/en/latest/modules/models/llms.html, https://python.langchain.com/en/latest/getting_started/getting_started.html\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(result[\"answer\"])\n",
|
||||
"print(result[\"sources\"])"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
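The Apify notebook above mentions that results already sitting in an Apify dataset can be loaded without re-running the Actor. A minimal sketch of that path, assuming `APIFY_API_TOKEN` is set and that `"YOUR-DATASET-ID"` stands in for a real dataset ID from a previous run:

```python
# Sketch: load an existing Apify dataset directly via ApifyDatasetLoader.
# "YOUR-DATASET-ID" is a placeholder; the mapping function mirrors the one
# passed to call_actor in the notebook above.
from langchain_community.document_loaders import ApifyDatasetLoader
from langchain_core.documents import Document

loader = ApifyDatasetLoader(
    dataset_id="YOUR-DATASET-ID",
    dataset_mapping_function=lambda item: Document(
        page_content=item["text"] or "", metadata={"source": item["url"]}
    ),
)
docs = loader.load()  # list of Documents, ready for the vector index step
```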
||||
@@ -9,7 +9,7 @@
|
||||
"Using this tool, you can integrate individual Connery Action into your LangChain agent.\n",
|
||||
"\n",
|
||||
"If you want to use more than one Connery Action in your agent,\n",
|
||||
"check out the [Connery Toolkit](/docs/integrations/tools/connery_toolkit) documentation.\n",
|
||||
"check out the [Connery Toolkit](/docs/integrations/toolkits/connery) documentation.\n",
|
||||
"\n",
|
||||
"## What is Connery?\n",
|
||||
"\n",
|
||||
|
||||
@@ -1,303 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"source": [
|
||||
"# FinancialDatasets Toolkit\n",
|
||||
"\n",
|
||||
"The [financial datasets](https://financialdatasets.ai/) stock market API provides REST endpoints that let you get financial data for 16,000+ tickers spanning 30+ years.\n",
|
||||
"\n",
|
||||
"## Setup\n",
|
||||
"\n",
|
||||
"To use this toolkit, you need two API keys:\n",
|
||||
"\n",
|
||||
"`FINANCIAL_DATASETS_API_KEY`: Get it from [financialdatasets.ai](https://financialdatasets.ai/).\n",
|
||||
"`OPENAI_API_KEY`: Get it from [OpenAI](https://platform.openai.com/)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import getpass\n",
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.environ[\"FINANCIAL_DATASETS_API_KEY\"] = getpass.getpass()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"source": [
|
||||
"### Installation\n",
|
||||
"\n",
|
||||
"This toolkit lives in the `langchain-community` package."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "652d6238-1f87-422a-b135-f5abbb8652fc",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%pip install -qU langchain-community"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "a38cde65-254d-4219-a441-068766c0d4b5",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Instantiation\n",
|
||||
"\n",
|
||||
"Now we can instantiate our toolkit:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "cb09c344-1836-4e0c-acf8-11d13ac1dbae",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.agent_toolkits.financial_datasets.toolkit import (\n",
|
||||
" FinancialDatasetsToolkit,\n",
|
||||
")\n",
|
||||
"from langchain_community.utilities.financial_datasets import FinancialDatasetsAPIWrapper\n",
|
||||
"\n",
|
||||
"api_wrapper = FinancialDatasetsAPIWrapper(\n",
|
||||
" financial_datasets_api_key=os.environ[\"FINANCIAL_DATASETS_API_KEY\"]\n",
|
||||
")\n",
|
||||
"toolkit = FinancialDatasetsToolkit(api_wrapper=api_wrapper)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "5c5f2839-4020-424e-9fc9-07777eede442",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Tools\n",
|
||||
"\n",
|
||||
"View available tools:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "51a60dbe-9f2e-4e04-bb62-23968f17164a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tools = toolkit.get_tools()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"source": [
|
||||
"## Use within an agent\n",
|
||||
"\n",
|
||||
"Let's equip our agent with the FinancialDatasetsToolkit and ask financial questions."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"system_prompt = \"\"\"\n",
|
||||
"You are an advanced financial analysis AI assistant equipped with specialized tools\n",
|
||||
"to access and analyze financial data. Your primary function is to help users with\n",
|
||||
"financial analysis by retrieving and interpreting income statements, balance sheets,\n",
|
||||
"and cash flow statements for publicly traded companies.\n",
|
||||
"\n",
|
||||
"You have access to the following tools from the FinancialDatasetsToolkit:\n",
|
||||
"\n",
|
||||
"1. Balance Sheets: Retrieves balance sheet data for a given ticker symbol.\n",
|
||||
"2. Income Statements: Fetches income statement data for a specified company.\n",
|
||||
"3. Cash Flow Statements: Accesses cash flow statement information for a particular ticker.\n",
|
||||
"\n",
|
||||
"Your capabilities include:\n",
|
||||
"\n",
|
||||
"1. Retrieving financial statements for any publicly traded company using its ticker symbol.\n",
|
||||
"2. Analyzing financial ratios and metrics based on the data from these statements.\n",
|
||||
"3. Comparing financial performance across different time periods (e.g., year-over-year or quarter-over-quarter).\n",
|
||||
"4. Identifying trends in a company's financial health and performance.\n",
|
||||
"5. Providing insights on a company's liquidity, solvency, profitability, and efficiency.\n",
|
||||
"6. Explaining complex financial concepts in simple terms.\n",
|
||||
"\n",
|
||||
"When responding to queries:\n",
|
||||
"\n",
|
||||
"1. Always specify which financial statement(s) you're using for your analysis.\n",
|
||||
"2. Provide context for the numbers you're referencing (e.g., fiscal year, quarter).\n",
|
||||
"3. Explain your reasoning and calculations clearly.\n",
|
||||
"4. If you need more information to provide a complete answer, ask for clarification.\n",
|
||||
"5. When appropriate, suggest additional analyses that might be helpful.\n",
|
||||
"\n",
|
||||
"Remember, your goal is to provide accurate, insightful financial analysis to\n",
|
||||
"help users make informed decisions. Always maintain a professional and objective tone in your responses.\n",
|
||||
"\"\"\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"source": [
|
||||
"Instantiate the LLM."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "310bf18e-6c9a-4072-b86e-47bc1fcca29d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_core.tools import tool\n",
|
||||
"from langchain_openai import ChatOpenAI\n",
|
||||
"\n",
|
||||
"model = ChatOpenAI(model=\"gpt-4o\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"source": [
|
||||
"Define a user query."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "23e11cc9-abd6-4855-a7eb-799f45ca01ae",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"query = \"What was AAPL's revenue in 2023? What about it's total debt in Q1 2024?\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"source": [
|
||||
"Create the agent."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.agents import AgentExecutor, create_tool_calling_agent\n",
|
||||
"from langchain_core.prompts import ChatPromptTemplate\n",
|
||||
"\n",
|
||||
"prompt = ChatPromptTemplate.from_messages(\n",
|
||||
" [\n",
|
||||
" (\"system\", system_prompt),\n",
|
||||
" (\"human\", \"{input}\"),\n",
|
||||
" # Placeholders fill up a **list** of messages\n",
|
||||
" (\"placeholder\", \"{agent_scratchpad}\"),\n",
|
||||
" ]\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"agent = create_tool_calling_agent(model, tools, prompt)\n",
|
||||
"agent_executor = AgentExecutor(agent=agent, tools=tools)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"source": [
|
||||
"Query the agent."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"agent_executor.invoke({\"input\": query})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"source": [
|
||||
"## API reference\n",
|
||||
"\n",
|
||||
"For detailed documentation of all `FinancialDatasetsToolkit` features and configurations head to the [API reference](https://api.python.langchain.com/en/latest/agent_toolkits/langchain_community.agent_toolkits.financial_datasets.toolkit.FinancialDatasetsToolkit.html)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.4"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -21,7 +21,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%pip install --upgrade --quiet langchain_google_community"
|
||||
"%pip install --upgrade --quiet langchain-community"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -44,8 +44,8 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.utilities import GoogleSearchAPIWrapper\n",
|
||||
"from langchain_core.tools import Tool\n",
|
||||
"from langchain_google_community import GoogleSearchAPIWrapper\n",
|
||||
"\n",
|
||||
"search = GoogleSearchAPIWrapper()\n",
|
||||
"\n",
|
||||
|
||||
File diff suppressed because one or more lines are too long
@@ -7,7 +7,7 @@
|
||||
"id": "245a954a"
|
||||
},
|
||||
"source": [
|
||||
"# Polygon IO\n",
|
||||
"# Polygon Stock Market API Tools\n",
|
||||
"\n",
|
||||
[Polygon](https">
">[Polygon](https://polygon.io/) is a financial market data platform. Its Stocks API provides REST endpoints that let you query the latest market data from all US stock exchanges.\n",
|
||||
"\n",
|
||||
@@ -25,7 +25,7 @@
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"name": "stdin",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
" ········\n"
|
||||
|
||||
File diff suppressed because one or more lines are too long
441
docs/docs/integrations/tools/search_tools.ipynb
Normal file
@@ -0,0 +1,441 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "6510f51c",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Search Tools\n",
|
||||
"\n",
|
||||
"This notebook shows off usage of various search tools."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "e6860c2d",
|
||||
"metadata": {
|
||||
"pycharm": {
|
||||
"is_executing": true
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.agents import AgentType, initialize_agent, load_tools\n",
|
||||
"from langchain_openai import OpenAI"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "dadbcfcd",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"llm = OpenAI(temperature=0)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ee251155",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Google Serper API Wrapper\n",
|
||||
"\n",
|
||||
"First, let's try to use the Google Serper API tool."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "0cdaa487",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tools = load_tools([\"google-serper\"], llm=llm)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "01b1ab4a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"agent = initialize_agent(\n",
|
||||
" tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "5cf44ec0",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m I should look up the current weather conditions.\n",
|
||||
"Action: Search\n",
|
||||
"Action Input: \"weather in Pomfret\"\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m37°F\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the current temperature in Pomfret.\n",
|
||||
"Final Answer: The current temperature in Pomfret is 37°F.\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'The current temperature in Pomfret is 37°F.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"What is the weather in Pomfret?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "8786bdc8",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## SearchApi\n",
|
||||
"\n",
|
||||
"Second, let's try SearchApi tool."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "5fd5ca32",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tools = load_tools([\"searchapi\"], llm=llm)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "547c9cf5-aa4d-48ed-b7a5-29ecc1491adf",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"agent = initialize_agent(\n",
|
||||
" tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "a7564c40-83ec-490b-ad36-385be5c20e58",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m I need to find out the current weather in Pomfret.\n",
|
||||
"Action: searchapi\n",
|
||||
"Action Input: \"weather in Pomfret\"\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3mThu 14 | Day ... Some clouds this morning will give way to generally sunny skies for the afternoon. High 73F. Winds NW at 5 to 10 mph.\n",
|
||||
"Hourly Weather-Pomfret, CT · 1 pm. 71°. 0%. Sunny. Feels Like71°. WindNW 9 mph · 2 pm. 72°. 0%. Sunny. Feels Like72°. WindNW 9 mph · 3 pm. 72°. 0%. Sunny. Feels ...\n",
|
||||
"10 Day Weather-Pomfret, VT. As of 4:28 am EDT. Today. 68°/48°. 4%. Thu 14 | Day. 68°. 4%. WNW 10 mph. Some clouds this morning will give way to generally ...\n",
|
||||
"Be prepared with the most accurate 10-day forecast for Pomfret, MD with highs, lows, chance of precipitation from The Weather Channel and Weather.com.\n",
|
||||
"Current Weather. 10:00 PM. 65°F. RealFeel® 67°. Mostly cloudy. LOCAL HURRICANE TRACKER. Category2. Lee. Late Friday Night - Saturday Afternoon.\n",
|
||||
"10 Day Weather-Pomfret, NY. As of 5:09 pm EDT. Tonight. --/55°. 10%. Wed 13 | Night. 55°. 10%. NW 11 mph. Some clouds. Low near 55F.\n",
|
||||
"Pomfret CT. Overnight. Overnight: Patchy fog before 3am, then patchy fog after 4am. Otherwise, mostly. Patchy Fog. Low: 58 °F. Thursday.\n",
|
||||
"Isolated showers. Mostly cloudy, with a high near 76. Calm wind. Chance of precipitation is 20%. Tonight. Mostly Cloudy. Mostly cloudy, with a ...\n",
|
||||
"Partly sunny, with a high near 67. Breezy, with a north wind 18 to 22 mph, with gusts as high as 34 mph. Chance of precipitation is 30%. ... A chance of showers ...\n",
|
||||
"Today's Weather - Pomfret, CT ... Patchy fog. Showers. Lows in the upper 50s. Northwest winds around 5 mph. Chance of rain near 100 percent. ... Sunny. Patchy fog ...\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
|
||||
"Final Answer: The current weather in Pomfret is mostly cloudy with a high near 67 and a chance of showers. Winds are from the north at 18 to 22 mph with gusts up to 34 mph.\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'The current weather in Pomfret is mostly cloudy with a high near 67 and a chance of showers. Winds are from the north at 18 to 22 mph with gusts up to 34 mph.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"What is the weather in Pomfret?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0e39fc46",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## SerpAPI\n",
|
||||
"\n",
|
||||
"Now, let's use the SerpAPI tool."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "e1c39a0f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tools = load_tools([\"serpapi\"], llm=llm)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "900dd6cb",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"agent = initialize_agent(\n",
|
||||
" tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "342ee8ec",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m I need to find out what the current weather is in Pomfret.\n",
|
||||
"Action: Search\n",
|
||||
"Action Input: \"weather in Pomfret\"\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m{'type': 'weather_result', 'temperature': '69', 'unit': 'Fahrenheit', 'precipitation': '2%', 'humidity': '90%', 'wind': '1 mph', 'location': 'Pomfret, CT', 'date': 'Sunday 9:00 PM', 'weather': 'Clear'}\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the current weather in Pomfret.\n",
|
||||
"Final Answer: The current weather in Pomfret is 69 degrees Fahrenheit, 2% precipitation, 90% humidity, and 1 mph wind. It is currently clear.\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'The current weather in Pomfret is 69 degrees Fahrenheit, 2% precipitation, 90% humidity, and 1 mph wind. It is currently clear.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"What is the weather in Pomfret?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "adc8bb68",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## GoogleSearchAPIWrapper\n",
|
||||
"\n",
|
||||
"Now, let's use the official Google Search API Wrapper."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"id": "ef24f92d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tools = load_tools([\"google-search\"], llm=llm)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"id": "909cd28b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"agent = initialize_agent(\n",
|
||||
" tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"id": "46515d2a",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m I should look up the current weather conditions.\n",
|
||||
"Action: Google Search\n",
|
||||
"Action Input: \"weather in Pomfret\"\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3mShowers early becoming a steady light rain later in the day. Near record high temperatures. High around 60F. Winds SW at 10 to 15 mph. Chance of rain 60%. Pomfret, CT Weather Forecast, with current conditions, wind, air quality, and what to expect for the next 3 days. Hourly Weather-Pomfret, CT. As of 12:52 am EST. Special Weather Statement +2 ... Hazardous Weather Conditions. Special Weather Statement ... Pomfret CT. Tonight ... National Digital Forecast Database Maximum Temperature Forecast. Pomfret Center Weather Forecasts. Weather Underground provides local & long-range weather forecasts, weatherreports, maps & tropical weather conditions for ... Pomfret, CT 12 hour by hour weather forecast includes precipitation, temperatures, sky conditions, rain chance, dew-point, relative humidity, wind direction ... North Pomfret Weather Forecasts. Weather Underground provides local & long-range weather forecasts, weatherreports, maps & tropical weather conditions for ... Today's Weather - Pomfret, CT. Dec 31, 2022 4:00 PM. Putnam MS. --. Weather forecast icon. Feels like --. Hi --. Lo --. Pomfret, CT temperature trend for the next 14 Days. Find daytime highs and nighttime lows from TheWeatherNetwork.com. Pomfret, MD Weather Forecast Date: 332 PM EST Wed Dec 28 2022. The area/counties/county of: Charles, including the cites of: St. Charles and Waldorf.\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the current weather conditions in Pomfret.\n",
|
||||
"Final Answer: Showers early becoming a steady light rain later in the day. Near record high temperatures. High around 60F. Winds SW at 10 to 15 mph. Chance of rain 60%.\u001b[0m\n",
|
||||
"\u001b[1m> Finished AgentExecutor chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'Showers early becoming a steady light rain later in the day. Near record high temperatures. High around 60F. Winds SW at 10 to 15 mph. Chance of rain 60%.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 17,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"What is the weather in Pomfret?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "eabad3af",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## SearxNG Meta Search Engine\n",
|
||||
"\n",
|
||||
"Here we will be using a self hosted SearxNG meta search engine."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "b196c704",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tools = load_tools([\"searx-search\"], searx_host=\"http://localhost:8888\", llm=llm)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "9023eeaa",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"agent = initialize_agent(\n",
|
||||
" tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "3aad92c1",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m I should look up the current weather\n",
|
||||
"Action: SearX Search\n",
|
||||
"Action Input: \"weather in Pomfret\"\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3mMainly cloudy with snow showers around in the morning. High around 40F. Winds NNW at 5 to 10 mph. Chance of snow 40%. Snow accumulations less than one inch.\n",
|
||||
"\n",
|
||||
"10 Day Weather - Pomfret, MD As of 1:37 pm EST Today 49°/ 41° 52% Mon 27 | Day 49° 52% SE 14 mph Cloudy with occasional rain showers. High 49F. Winds SE at 10 to 20 mph. Chance of rain 50%....\n",
|
||||
"\n",
|
||||
"10 Day Weather - Pomfret, VT As of 3:51 am EST Special Weather Statement Today 39°/ 32° 37% Wed 01 | Day 39° 37% NE 4 mph Cloudy with snow showers developing for the afternoon. High 39F....\n",
|
||||
"\n",
|
||||
"Pomfret, CT ; Current Weather. 1:06 AM. 35°F · RealFeel® 32° ; TODAY'S WEATHER FORECAST. 3/3. 44°Hi. RealFeel® 50° ; TONIGHT'S WEATHER FORECAST. 3/3. 32°Lo.\n",
|
||||
"\n",
|
||||
"Pomfret, MD Forecast Today Hourly Daily Morning 41° 1% Afternoon 43° 0% Evening 35° 3% Overnight 34° 2% Don't Miss Finally, Here’s Why We Get More Colds and Flu When It’s Cold Coast-To-Coast...\n",
|
||||
"\n",
|
||||
"Pomfret, MD Weather Forecast | AccuWeather Current Weather 5:35 PM 35° F RealFeel® 36° RealFeel Shade™ 36° Air Quality Excellent Wind E 3 mph Wind Gusts 5 mph Cloudy More Details WinterCast...\n",
|
||||
"\n",
|
||||
"Pomfret, VT Weather Forecast | AccuWeather Current Weather 11:21 AM 23° F RealFeel® 27° RealFeel Shade™ 25° Air Quality Fair Wind ESE 3 mph Wind Gusts 7 mph Cloudy More Details WinterCast...\n",
|
||||
"\n",
|
||||
"Pomfret Center, CT Weather Forecast | AccuWeather Daily Current Weather 6:50 PM 39° F RealFeel® 36° Air Quality Fair Wind NW 6 mph Wind Gusts 16 mph Mostly clear More Details WinterCast...\n",
|
||||
"\n",
|
||||
"12:00 pm · Feels Like36° · WindN 5 mph · Humidity43% · UV Index3 of 10 · Cloud Cover65% · Rain Amount0 in ...\n",
|
||||
"\n",
|
||||
"Pomfret Center, CT Weather Conditions | Weather Underground star Popular Cities San Francisco, CA 49 °F Clear Manhattan, NY 37 °F Fair Schiller Park, IL (60176) warning39 °F Mostly Cloudy...\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
|
||||
"Final Answer: The current weather in Pomfret is mainly cloudy with snow showers around in the morning. The temperature is around 40F with winds NNW at 5 to 10 mph. Chance of snow is 40%.\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'The current weather in Pomfret is mainly cloudy with snow showers around in the morning. The temperature is around 40F with winds NNW at 5 to 10 mph. Chance of snow is 40%.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"What is the weather in Pomfret\")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
},
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
"hash": "b1677b440931f40d89ef8be7bf03acb108ce003de0ac9b18e8d43753ea2e7103"
|
||||
}
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -2,591 +2,390 @@
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0e499e90-7a6d-4fab-8aab-31a4df417601",
|
||||
"metadata": {},
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"# SQLDatabase Toolkit\n",
|
||||
"# SQL Database\n",
|
||||
"\n",
|
||||
"This will help you getting started with the SQL Database [toolkit](/docs/concepts/#toolkits). For detailed documentation of all `SQLDatabaseToolkit` features and configurations head to the [API reference](https://api.python.langchain.com/en/latest/agent_toolkits/langchain_community.agent_toolkits.sql.toolkit.SQLDatabaseToolkit.html).\n",
|
||||
":::note\n",
|
||||
"The `SQLDatabase` adapter utility is a wrapper around a database connection.\n",
|
||||
"\n",
|
||||
"Tools within the `SQLDatabaseToolkit` are designed to interact with a `SQL` database. \n",
|
||||
"For talking to SQL databases, it uses the [SQLAlchemy] Core API .\n",
|
||||
":::\n",
|
||||
"\n",
|
||||
"A common application is to enable agents to answer questions using data in a relational database, potentially in an iterative fashion (e.g., recovering from errors).\n",
|
||||
"\n",
|
||||
"**⚠️ Security note ⚠️**\n",
|
||||
"This notebook shows how to use the utility to access an SQLite database.\n",
|
||||
"It uses the example [Chinook Database], and demonstrates those features:\n",
|
||||
"\n",
|
||||
"Building Q&A systems of SQL databases requires executing model-generated SQL queries. There are inherent risks in doing this. Make sure that your database connection permissions are always scoped as narrowly as possible for your chain/agent's needs. This will mitigate though not eliminate the risks of building a model-driven system. For more on general security best practices, [see here](/docs/security).\n",
|
||||
"- Query using SQL\n",
|
||||
"- Query using SQLAlchemy selectable\n",
|
||||
"- Fetch modes `cursor`, `all`, and `one`\n",
|
||||
"- Bind query parameters\n",
|
||||
"\n",
|
||||
"## Setup\n",
|
||||
"[Chinook Database]: https://github.com/lerocha/chinook-database\n",
|
||||
"[SQLAlchemy]: https://www.sqlalchemy.org/\n",
|
||||
"\n",
|
||||
"If you want to get automated tracing from runs of individual tools, you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
|
||||
"\n",
|
||||
"You can use the `Tool` or `@tool` decorator to create a tool from this utility.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"::: {.callout-caution}\n",
|
||||
"If creating a tool from the SQLDatbase utility and combining it with an LLM or exposing it to an end user\n",
|
||||
"remember to follow good security practices.\n",
|
||||
"\n",
|
||||
"See security information: https://python.langchain.com/docs/security\n",
|
||||
":::"
|
||||
]
|
||||
},
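As a minimal sketch of the "scope permissions narrowly" advice above (illustrative only, not part of either notebook version): with SQLite, one way is to open the database file in read-only mode so that model-generated write statements are rejected at the driver level.

```python
from sqlalchemy import create_engine
from langchain_community.utilities import SQLDatabase

# Open the SQLite file read-only (mode=ro) via a URI connection, so any
# INSERT/UPDATE/DELETE generated by the model fails at the driver level.
engine = create_engine("sqlite:///file:Chinook.db?mode=ro&uri=true")
db = SQLDatabase(engine)
```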
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "3de6e3be-1fd9-42a3-8564-8ca7dca11e1c",
|
||||
"metadata": {},
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"jupyter": {
|
||||
"outputs_hidden": true
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
|
||||
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "31896b61-68d2-4b4d-be9d-b829eda327d1",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Installation\n",
|
||||
"\n",
|
||||
"This toolkit lives in the `langchain-community` package:"
|
||||
"!wget 'https://github.com/lerocha/chinook-database/releases/download/v1.4.2/Chinook_Sqlite.sql'"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "c4933e04-9120-4ccc-9ef7-369987823b0e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"execution_count": 22,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"1|AC/DC\r\n",
|
||||
"2|Accept\r\n",
|
||||
"3|Aerosmith\r\n",
|
||||
"4|Alanis Morissette\r\n",
|
||||
"5|Alice In Chains\r\n",
|
||||
"6|Antônio Carlos Jobim\r\n",
|
||||
"7|Apocalyptica\r\n",
|
||||
"8|Audioslave\r\n",
|
||||
"9|BackBeat\r\n",
|
||||
"10|Billy Cobham\r\n",
|
||||
"11|Black Label Society\r\n",
|
||||
"12|Black Sabbath\r\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%pip install --upgrade --quiet langchain-community"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "6ad08dbe-1642-448c-b58d-153810024375",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"For demonstration purposes, we will access a prompt in the LangChain [Hub](https://smith.langchain.com/hub). We will also require `langgraph` to demonstrate the use of the toolkit with an agent. This is not required to use the toolkit."
|
||||
"!sqlite3 -bail -cmd '.read Chinook_Sqlite.sql' -cmd 'SELECT * FROM Artist LIMIT 12;' -cmd '.quit'"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "f3dead45-9908-497d-a5a3-bce30642e88f",
|
||||
"metadata": {},
|
||||
"execution_count": 12,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%pip install --upgrade --quiet langchainhub langgraph"
|
||||
"!sqlite3 -bail -cmd '.read Chinook_Sqlite.sql' -cmd '.save Chinook.db' -cmd '.quit'"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "804533b1-2f16-497b-821b-c82d67fcf7b6",
|
||||
"metadata": {},
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"## Instantiation\n",
|
||||
"\n",
|
||||
"The `SQLDatabaseToolkit` toolkit requires:\n",
|
||||
"\n",
|
||||
"- a [SQLDatabase](https://api.python.langchain.com/en/latest/utilities/langchain_community.utilities.sql_database.SQLDatabase.html) object;\n",
|
||||
"- a LLM or chat model (for instantiating the [QuerySQLCheckerTool](https://api.python.langchain.com/en/latest/tools/langchain_community.tools.sql_database.tool.QuerySQLCheckerTool.html) tool).\n",
|
||||
"\n",
|
||||
"Below, we instantiate the toolkit with these objects. Let's first create a database object.\n",
|
||||
"\n",
|
||||
"This guide uses the example `Chinook` database based on [these instructions](https://database.guide/2-sample-databases-sqlite/).\n",
|
||||
"\n",
|
||||
"Below we will use the `requests` library to pull the `.sql` file and create an in-memory SQLite database. Note that this approach is lightweight, but ephemeral and not thread-safe. If you'd prefer, you can follow the instructions to save the file locally as `Chinook.db` and instantiate the database via `db = SQLDatabase.from_uri(\"sqlite:///Chinook.db\")`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "40d05f9b-5a8f-4307-8f8b-4153db0fdfa9",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import sqlite3\n",
|
||||
"\n",
|
||||
"import requests\n",
|
||||
"from langchain_community.utilities.sql_database import SQLDatabase\n",
|
||||
"from sqlalchemy import create_engine\n",
|
||||
"from sqlalchemy.pool import StaticPool\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def get_engine_for_chinook_db():\n",
|
||||
" \"\"\"Pull sql file, populate in-memory database, and create engine.\"\"\"\n",
|
||||
" url = \"https://raw.githubusercontent.com/lerocha/chinook-database/master/ChinookDatabase/DataSources/Chinook_Sqlite.sql\"\n",
|
||||
" response = requests.get(url)\n",
|
||||
" sql_script = response.text\n",
|
||||
"\n",
|
||||
" connection = sqlite3.connect(\":memory:\", check_same_thread=False)\n",
|
||||
" connection.executescript(sql_script)\n",
|
||||
" return create_engine(\n",
|
||||
" \"sqlite://\",\n",
|
||||
" creator=lambda: connection,\n",
|
||||
" poolclass=StaticPool,\n",
|
||||
" connect_args={\"check_same_thread\": False},\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"engine = get_engine_for_chinook_db()\n",
|
||||
"\n",
|
||||
"db = SQLDatabase(engine)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2b9a6326-78fd-4c42-a1cb-4316619ac449",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We will also need a LLM or chat model:\n",
|
||||
"\n",
|
||||
"```{=mdx}\n",
|
||||
"import ChatModelTabs from \"@theme/ChatModelTabs\";\n",
|
||||
"\n",
|
||||
"<ChatModelTabs customVarName=\"llm\" />\n",
|
||||
"```"
|
||||
"## Initialize Database"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "cc6e6108-83d9-404f-8f31-474c2fbf5f6c",
|
||||
"metadata": {},
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# | output: false\n",
|
||||
"# | echo: false\n",
|
||||
"from pprint import pprint\n",
|
||||
"\n",
|
||||
"from langchain_openai import ChatOpenAI\n",
|
||||
"import sqlalchemy as sa\n",
|
||||
"from langchain_community.utilities import SQLDatabase\n",
|
||||
"\n",
|
||||
"llm = ChatOpenAI(temperature=0)"
|
||||
"db = SQLDatabase.from_uri(\"sqlite:///Chinook.db\")"
|
||||
]
|
||||
},
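If needed, `SQLDatabase.from_uri` also accepts options that narrow what the wrapper exposes. A brief sketch, where the chosen tables and sample-row count are illustrative:

```python
# Limit the wrapper to a subset of tables and include two sample rows per
# table in the generated table info (values chosen for illustration).
db = SQLDatabase.from_uri(
    "sqlite:///Chinook.db",
    include_tables=["Artist", "Album"],
    sample_rows_in_table_info=2,
)
```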
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "77925e72-4730-43c3-8726-d68cedf635f4",
|
||||
"metadata": {},
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"We can now instantiate the toolkit:"
|
||||
"## Query as cursor\n",
|
||||
"\n",
|
||||
"The fetch mode `cursor` returns results as SQLAlchemy's\n",
|
||||
"`CursorResult` instance."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "42bd5a41-672a-4a53-b70a-2f0c0555758c",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"execution_count": 30,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"<class 'sqlalchemy.engine.cursor.CursorResult'>\n",
|
||||
"[{'ArtistId': 1, 'Name': 'AC/DC'},\n",
|
||||
" {'ArtistId': 2, 'Name': 'Accept'},\n",
|
||||
" {'ArtistId': 3, 'Name': 'Aerosmith'},\n",
|
||||
" {'ArtistId': 4, 'Name': 'Alanis Morissette'},\n",
|
||||
" {'ArtistId': 5, 'Name': 'Alice In Chains'},\n",
|
||||
" {'ArtistId': 6, 'Name': 'Antônio Carlos Jobim'},\n",
|
||||
" {'ArtistId': 7, 'Name': 'Apocalyptica'},\n",
|
||||
" {'ArtistId': 8, 'Name': 'Audioslave'},\n",
|
||||
" {'ArtistId': 9, 'Name': 'BackBeat'},\n",
|
||||
" {'ArtistId': 10, 'Name': 'Billy Cobham'},\n",
|
||||
" {'ArtistId': 11, 'Name': 'Black Label Society'},\n",
|
||||
" {'ArtistId': 12, 'Name': 'Black Sabbath'}]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain_community.agent_toolkits.sql.toolkit import SQLDatabaseToolkit\n",
|
||||
"\n",
|
||||
"toolkit = SQLDatabaseToolkit(db=db, llm=llm)"
|
||||
"result = db.run(\"SELECT * FROM Artist LIMIT 12;\", fetch=\"cursor\")\n",
|
||||
"print(type(result))\n",
|
||||
"pprint(list(result.mappings()))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b2f882cf-4156-4a9f-a714-db97ec8ccc37",
|
||||
"metadata": {},
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"## Tools\n",
|
||||
"## Query as string payload\n",
|
||||
"\n",
|
||||
"View available tools:"
|
||||
"The fetch modes `all` and `one` return results in string format."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 29,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"<class 'str'>\n",
|
||||
"[(1, 'AC/DC'), (2, 'Accept'), (3, 'Aerosmith'), (4, 'Alanis Morissette'), (5, 'Alice In Chains'), (6, 'Antônio Carlos Jobim'), (7, 'Apocalyptica'), (8, 'Audioslave'), (9, 'BackBeat'), (10, 'Billy Cobham'), (11, 'Black Label Society'), (12, 'Black Sabbath')]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"result = db.run(\"SELECT * FROM Artist LIMIT 12;\", fetch=\"all\")\n",
|
||||
"print(type(result))\n",
|
||||
"print(result)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 31,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"<class 'str'>\n",
|
||||
"[(1, 'AC/DC')]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"result = db.run(\"SELECT * FROM Artist LIMIT 12;\", fetch=\"one\")\n",
|
||||
"print(type(result))\n",
|
||||
"print(result)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"## Query with parameters\n",
|
||||
"\n",
|
||||
"In order to bind query parameters, use the optional `parameters` argument."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 41,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"[{'ArtistId': 35, 'Name': 'Pedro Luís & A Parede'},\n",
|
||||
" {'ArtistId': 115, 'Name': 'Page & Plant'},\n",
|
||||
" {'ArtistId': 116, 'Name': 'Passengers'},\n",
|
||||
" {'ArtistId': 117, 'Name': \"Paul D'Ianno\"},\n",
|
||||
" {'ArtistId': 118, 'Name': 'Pearl Jam'},\n",
|
||||
" {'ArtistId': 119, 'Name': 'Peter Tosh'},\n",
|
||||
" {'ArtistId': 120, 'Name': 'Pink Floyd'},\n",
|
||||
" {'ArtistId': 121, 'Name': 'Planet Hemp'},\n",
|
||||
" {'ArtistId': 186, 'Name': 'Pedro Luís E A Parede'},\n",
|
||||
" {'ArtistId': 256, 'Name': 'Philharmonia Orchestra & Sir Neville Marriner'},\n",
|
||||
" {'ArtistId': 275, 'Name': 'Philip Glass Ensemble'}]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"result = db.run(\n",
|
||||
" \"SELECT * FROM Artist WHERE Name LIKE :search;\",\n",
|
||||
" parameters={\"search\": \"p%\"},\n",
|
||||
" fetch=\"cursor\",\n",
|
||||
")\n",
|
||||
"pprint(list(result.mappings()))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"## Query with SQLAlchemy selectable\n",
|
||||
"\n",
|
||||
"Other than plain-text SQL statements, the adapter also accepts SQLAlchemy selectables."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "a18c3e69-bee0-4f5d-813e-eeb540f41b98",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[QuerySQLDataBaseTool(description=\"Input to this tool is a detailed and correct SQL query, output is a result from the database. If the query is not correct, an error message will be returned. If an error is returned, rewrite the query, check the query, and try again. If you encounter an issue with Unknown column 'xxxx' in 'field list', use sql_db_schema to query the correct table fields.\", db=<langchain_community.utilities.sql_database.SQLDatabase object at 0x105e02860>),\n",
|
||||
" InfoSQLDatabaseTool(description='Input to this tool is a comma-separated list of tables, output is the schema and sample rows for those tables. Be sure that the tables actually exist by calling sql_db_list_tables first! Example Input: table1, table2, table3', db=<langchain_community.utilities.sql_database.SQLDatabase object at 0x105e02860>),\n",
|
||||
" ListSQLDatabaseTool(db=<langchain_community.utilities.sql_database.SQLDatabase object at 0x105e02860>),\n",
|
||||
" QuerySQLCheckerTool(description='Use this tool to double check if your query is correct before executing it. Always use this tool before executing a query with sql_db_query!', db=<langchain_community.utilities.sql_database.SQLDatabase object at 0x105e02860>, llm=ChatOpenAI(client=<openai.resources.chat.completions.Completions object at 0x1148a97b0>, async_client=<openai.resources.chat.completions.AsyncCompletions object at 0x1148aaec0>, temperature=0.0, openai_api_key=SecretStr('**********'), openai_proxy=''), llm_chain=LLMChain(prompt=PromptTemplate(input_variables=['dialect', 'query'], template='\\n{query}\\nDouble check the {dialect} query above for common mistakes, including:\\n- Using NOT IN with NULL values\\n- Using UNION when UNION ALL should have been used\\n- Using BETWEEN for exclusive ranges\\n- Data type mismatch in predicates\\n- Properly quoting identifiers\\n- Using the correct number of arguments for functions\\n- Casting to the correct data type\\n- Using the proper columns for joins\\n\\nIf there are any of the above mistakes, rewrite the query. If there are no mistakes, just reproduce the original query.\\n\\nOutput the final SQL query only.\\n\\nSQL Query: '), llm=ChatOpenAI(client=<openai.resources.chat.completions.Completions object at 0x1148a97b0>, async_client=<openai.resources.chat.completions.AsyncCompletions object at 0x1148aaec0>, temperature=0.0, openai_api_key=SecretStr('**********'), openai_proxy='')))]"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"toolkit.get_tools()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "5f5751e3-2e98-485f-8164-db8094039c25",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"API references:\n",
|
||||
"\n",
|
||||
"- [QuerySQLDataBaseTool](https://api.python.langchain.com/en/latest/tools/langchain_community.tools.sql_database.tool.QuerySQLDataBaseTool.html)\n",
|
||||
"- [InfoSQLDatabaseTool](https://api.python.langchain.com/en/latest/tools/langchain_community.tools.sql_database.tool.InfoSQLDatabaseTool.html)\n",
|
||||
"- [ListSQLDatabaseTool](https://api.python.langchain.com/en/latest/tools/langchain_community.tools.sql_database.tool.ListSQLDatabaseTool.html)\n",
|
||||
"- [QuerySQLCheckerTool](https://api.python.langchain.com/en/latest/tools/langchain_community.tools.sql_database.tool.QuerySQLCheckerTool.html)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c067e0ed-dcca-4dcc-81b2-a0eeb4fc2a9f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Use within an agent\n",
|
||||
"\n",
|
||||
"Following the [SQL Q&A Tutorial](/docs/tutorials/sql_qa/#agents), below we equip a simple question-answering agent with the tools in our toolkit. First we pull a relevant prompt and populate it with its required parameters:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "eda12f8b-be90-4697-ac84-2ece9e2d1708",
|
||||
"metadata": {},
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"['dialect', 'top_k']\n"
|
||||
"[{'ArtistId': 35, 'Name': 'Pedro Luís & A Parede'},\n",
|
||||
" {'ArtistId': 115, 'Name': 'Page & Plant'},\n",
|
||||
" {'ArtistId': 116, 'Name': 'Passengers'},\n",
|
||||
" {'ArtistId': 117, 'Name': \"Paul D'Ianno\"},\n",
|
||||
" {'ArtistId': 118, 'Name': 'Pearl Jam'},\n",
|
||||
" {'ArtistId': 119, 'Name': 'Peter Tosh'},\n",
|
||||
" {'ArtistId': 120, 'Name': 'Pink Floyd'},\n",
|
||||
" {'ArtistId': 121, 'Name': 'Planet Hemp'},\n",
|
||||
" {'ArtistId': 186, 'Name': 'Pedro Luís E A Parede'},\n",
|
||||
" {'ArtistId': 256, 'Name': 'Philharmonia Orchestra & Sir Neville Marriner'},\n",
|
||||
" {'ArtistId': 275, 'Name': 'Philip Glass Ensemble'}]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain import hub\n",
|
||||
"# In order to build a selectable on SA's Core API, you need a table definition.\n",
|
||||
"metadata = sa.MetaData()\n",
|
||||
"artist = sa.Table(\n",
|
||||
" \"Artist\",\n",
|
||||
" metadata,\n",
|
||||
" sa.Column(\"ArtistId\", sa.INTEGER, primary_key=True),\n",
|
||||
" sa.Column(\"Name\", sa.TEXT),\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"prompt_template = hub.pull(\"langchain-ai/sql-agent-system-prompt\")\n",
|
||||
"# Build a selectable with the same semantics of the recent query.\n",
|
||||
"query = sa.select(artist).where(artist.c.Name.like(\"p%\"))\n",
|
||||
"result = db.run(query, fetch=\"cursor\")\n",
|
||||
"pprint(list(result.mappings()))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"## Query with execution options\n",
|
||||
"\n",
|
||||
"assert len(prompt_template.messages) == 1\n",
|
||||
"print(prompt_template.input_variables)"
|
||||
"It is possible to augment the statement invocation with custom execution options.\n",
|
||||
"For example, when applying a schema name translation, subsequent statements will\n",
|
||||
"fail, because they try to hit a non-existing table."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "3470ae96-e5e5-4717-a6d6-d7d28c7b7347",
|
||||
"metadata": {},
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"system_message = prompt_template.format(dialect=\"SQLite\", top_k=5)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "97930c07-36d1-4137-94ae-fe5ac83ecc44",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We then instantiate the agent:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "48bca92c-9b4b-4d5c-bcce-1b239c9e901c",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langgraph.prebuilt import create_react_agent\n",
|
||||
"\n",
|
||||
"agent_executor = create_react_agent(\n",
|
||||
" llm, toolkit.get_tools(), state_modifier=system_message\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "09fb1845-1105-4f41-98b4-24756452a3e3",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"And issue it a query:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "39e6d2bf-3194-4aba-854b-63faf919157b",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"================================\u001b[1m Human Message \u001b[0m=================================\n",
|
||||
"\n",
|
||||
"Which country's customers spent the most?\n",
|
||||
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
|
||||
"Tool Calls:\n",
|
||||
" sql_db_list_tables (call_eiheSxiL0s90KE50XyBnBtJY)\n",
|
||||
" Call ID: call_eiheSxiL0s90KE50XyBnBtJY\n",
|
||||
" Args:\n",
|
||||
"=================================\u001b[1m Tool Message \u001b[0m=================================\n",
|
||||
"Name: sql_db_list_tables\n",
|
||||
"\n",
|
||||
"Album, Artist, Customer, Employee, Genre, Invoice, InvoiceLine, MediaType, Playlist, PlaylistTrack, Track\n",
|
||||
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
|
||||
"Tool Calls:\n",
|
||||
" sql_db_schema (call_YKwGWt4UUVmxxY7vjjBDzFLJ)\n",
|
||||
" Call ID: call_YKwGWt4UUVmxxY7vjjBDzFLJ\n",
|
||||
" Args:\n",
|
||||
" table_names: Customer, Invoice, InvoiceLine\n",
|
||||
"=================================\u001b[1m Tool Message \u001b[0m=================================\n",
|
||||
"Name: sql_db_schema\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"CREATE TABLE \"Customer\" (\n",
|
||||
"\t\"CustomerId\" INTEGER NOT NULL, \n",
|
||||
"\t\"FirstName\" NVARCHAR(40) NOT NULL, \n",
|
||||
"\t\"LastName\" NVARCHAR(20) NOT NULL, \n",
|
||||
"\t\"Company\" NVARCHAR(80), \n",
|
||||
"\t\"Address\" NVARCHAR(70), \n",
|
||||
"\t\"City\" NVARCHAR(40), \n",
|
||||
"\t\"State\" NVARCHAR(40), \n",
|
||||
"\t\"Country\" NVARCHAR(40), \n",
|
||||
"\t\"PostalCode\" NVARCHAR(10), \n",
|
||||
"\t\"Phone\" NVARCHAR(24), \n",
|
||||
"\t\"Fax\" NVARCHAR(24), \n",
|
||||
"\t\"Email\" NVARCHAR(60) NOT NULL, \n",
|
||||
"\t\"SupportRepId\" INTEGER, \n",
|
||||
"\tPRIMARY KEY (\"CustomerId\"), \n",
|
||||
"\tFOREIGN KEY(\"SupportRepId\") REFERENCES \"Employee\" (\"EmployeeId\")\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"/*\n",
|
||||
"3 rows from Customer table:\n",
|
||||
"CustomerId\tFirstName\tLastName\tCompany\tAddress\tCity\tState\tCountry\tPostalCode\tPhone\tFax\tEmail\tSupportRepId\n",
|
||||
"1\tLuís\tGonçalves\tEmbraer - Empresa Brasileira de Aeronáutica S.A.\tAv. Brigadeiro Faria Lima, 2170\tSão José dos Campos\tSP\tBrazil\t12227-000\t+55 (12) 3923-5555\t+55 (12) 3923-5566\tluisg@embraer.com.br\t3\n",
|
||||
"2\tLeonie\tKöhler\tNone\tTheodor-Heuss-Straße 34\tStuttgart\tNone\tGermany\t70174\t+49 0711 2842222\tNone\tleonekohler@surfeu.de\t5\n",
|
||||
"3\tFrançois\tTremblay\tNone\t1498 rue Bélanger\tMontréal\tQC\tCanada\tH2G 1A7\t+1 (514) 721-4711\tNone\tftremblay@gmail.com\t3\n",
|
||||
"*/\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"CREATE TABLE \"Invoice\" (\n",
|
||||
"\t\"InvoiceId\" INTEGER NOT NULL, \n",
|
||||
"\t\"CustomerId\" INTEGER NOT NULL, \n",
|
||||
"\t\"InvoiceDate\" DATETIME NOT NULL, \n",
|
||||
"\t\"BillingAddress\" NVARCHAR(70), \n",
|
||||
"\t\"BillingCity\" NVARCHAR(40), \n",
|
||||
"\t\"BillingState\" NVARCHAR(40), \n",
|
||||
"\t\"BillingCountry\" NVARCHAR(40), \n",
|
||||
"\t\"BillingPostalCode\" NVARCHAR(10), \n",
|
||||
"\t\"Total\" NUMERIC(10, 2) NOT NULL, \n",
|
||||
"\tPRIMARY KEY (\"InvoiceId\"), \n",
|
||||
"\tFOREIGN KEY(\"CustomerId\") REFERENCES \"Customer\" (\"CustomerId\")\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"/*\n",
|
||||
"3 rows from Invoice table:\n",
|
||||
"InvoiceId\tCustomerId\tInvoiceDate\tBillingAddress\tBillingCity\tBillingState\tBillingCountry\tBillingPostalCode\tTotal\n",
|
||||
"1\t2\t2021-01-01 00:00:00\tTheodor-Heuss-Straße 34\tStuttgart\tNone\tGermany\t70174\t1.98\n",
|
||||
"2\t4\t2021-01-02 00:00:00\tUllevålsveien 14\tOslo\tNone\tNorway\t0171\t3.96\n",
|
||||
"3\t8\t2021-01-03 00:00:00\tGrétrystraat 63\tBrussels\tNone\tBelgium\t1000\t5.94\n",
|
||||
"*/\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"CREATE TABLE \"InvoiceLine\" (\n",
|
||||
"\t\"InvoiceLineId\" INTEGER NOT NULL, \n",
|
||||
"\t\"InvoiceId\" INTEGER NOT NULL, \n",
|
||||
"\t\"TrackId\" INTEGER NOT NULL, \n",
|
||||
"\t\"UnitPrice\" NUMERIC(10, 2) NOT NULL, \n",
|
||||
"\t\"Quantity\" INTEGER NOT NULL, \n",
|
||||
"\tPRIMARY KEY (\"InvoiceLineId\"), \n",
|
||||
"\tFOREIGN KEY(\"TrackId\") REFERENCES \"Track\" (\"TrackId\"), \n",
|
||||
"\tFOREIGN KEY(\"InvoiceId\") REFERENCES \"Invoice\" (\"InvoiceId\")\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"/*\n",
|
||||
"3 rows from InvoiceLine table:\n",
|
||||
"InvoiceLineId\tInvoiceId\tTrackId\tUnitPrice\tQuantity\n",
|
||||
"1\t1\t2\t0.99\t1\n",
|
||||
"2\t1\t4\t0.99\t1\n",
|
||||
"3\t2\t6\t0.99\t1\n",
|
||||
"*/\n",
|
||||
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
|
||||
"Tool Calls:\n",
|
||||
" sql_db_query (call_7WBDcMxl1h7MnI05njx1q8V9)\n",
|
||||
" Call ID: call_7WBDcMxl1h7MnI05njx1q8V9\n",
|
||||
" Args:\n",
|
||||
" query: SELECT c.Country, SUM(i.Total) AS TotalSpent FROM Customer c JOIN Invoice i ON c.CustomerId = i.CustomerId GROUP BY c.Country ORDER BY TotalSpent DESC LIMIT 1\n",
|
||||
"=================================\u001b[1m Tool Message \u001b[0m=================================\n",
|
||||
"Name: sql_db_query\n",
|
||||
"\n",
|
||||
"[('USA', 523.0600000000003)]\n",
|
||||
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
|
||||
"\n",
|
||||
"Customers from the USA spent the most, with a total amount spent of $523.06.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"example_query = \"Which country's customers spent the most?\"\n",
|
||||
"\n",
|
||||
"events = agent_executor.stream(\n",
|
||||
" {\"messages\": [(\"user\", example_query)]},\n",
|
||||
" stream_mode=\"values\",\n",
|
||||
")\n",
|
||||
"for event in events:\n",
|
||||
" event[\"messages\"][-1].pretty_print()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "adbf3d8d-7570-45a5-950f-ce84db5145ab",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We can also observe the agent recover from an error:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "23c1235c-6d18-43e4-98ab-85b426b53d94",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"================================\u001b[1m Human Message \u001b[0m=================================\n",
|
||||
"\n",
|
||||
"Who are the top 3 best selling artists?\n",
|
||||
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
|
||||
"Tool Calls:\n",
|
||||
" sql_db_query (call_9F6Bp2vwsDkeLW6FsJFqLiet)\n",
|
||||
" Call ID: call_9F6Bp2vwsDkeLW6FsJFqLiet\n",
|
||||
" Args:\n",
|
||||
" query: SELECT artist_name, SUM(quantity) AS total_sold FROM sales GROUP BY artist_name ORDER BY total_sold DESC LIMIT 3\n",
|
||||
"=================================\u001b[1m Tool Message \u001b[0m=================================\n",
|
||||
"Name: sql_db_query\n",
|
||||
"\n",
|
||||
"Error: (sqlite3.OperationalError) no such table: sales\n",
|
||||
"[SQL: SELECT artist_name, SUM(quantity) AS total_sold FROM sales GROUP BY artist_name ORDER BY total_sold DESC LIMIT 3]\n",
|
||||
"(Background on this error at: https://sqlalche.me/e/20/e3q8)\n",
|
||||
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
|
||||
"Tool Calls:\n",
|
||||
" sql_db_list_tables (call_Gx5adzWnrBDIIxzUDzsn83zO)\n",
|
||||
" Call ID: call_Gx5adzWnrBDIIxzUDzsn83zO\n",
|
||||
" Args:\n",
|
||||
"=================================\u001b[1m Tool Message \u001b[0m=================================\n",
|
||||
"Name: sql_db_list_tables\n",
|
||||
"\n",
|
||||
"Album, Artist, Customer, Employee, Genre, Invoice, InvoiceLine, MediaType, Playlist, PlaylistTrack, Track\n",
|
||||
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
|
||||
"Tool Calls:\n",
|
||||
" sql_db_schema (call_ftywrZgEgGWLrnk9dYC0xtZv)\n",
|
||||
" Call ID: call_ftywrZgEgGWLrnk9dYC0xtZv\n",
|
||||
" Args:\n",
|
||||
" table_names: Artist, Album, InvoiceLine\n",
|
||||
"=================================\u001b[1m Tool Message \u001b[0m=================================\n",
|
||||
"Name: sql_db_schema\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"CREATE TABLE \"Album\" (\n",
|
||||
"\t\"AlbumId\" INTEGER NOT NULL, \n",
|
||||
"\t\"Title\" NVARCHAR(160) NOT NULL, \n",
|
||||
"\t\"ArtistId\" INTEGER NOT NULL, \n",
|
||||
"\tPRIMARY KEY (\"AlbumId\"), \n",
|
||||
"\tFOREIGN KEY(\"ArtistId\") REFERENCES \"Artist\" (\"ArtistId\")\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"/*\n",
|
||||
"3 rows from Album table:\n",
|
||||
"AlbumId\tTitle\tArtistId\n",
|
||||
"1\tFor Those About To Rock We Salute You\t1\n",
|
||||
"2\tBalls to the Wall\t2\n",
|
||||
"3\tRestless and Wild\t2\n",
|
||||
"*/\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"CREATE TABLE \"Artist\" (\n",
|
||||
"\t\"ArtistId\" INTEGER NOT NULL, \n",
|
||||
"\t\"Name\" NVARCHAR(120), \n",
|
||||
"\tPRIMARY KEY (\"ArtistId\")\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"/*\n",
|
||||
"3 rows from Artist table:\n",
|
||||
"ArtistId\tName\n",
|
||||
"1\tAC/DC\n",
|
||||
"2\tAccept\n",
|
||||
"3\tAerosmith\n",
|
||||
"*/\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"CREATE TABLE \"InvoiceLine\" (\n",
|
||||
"\t\"InvoiceLineId\" INTEGER NOT NULL, \n",
|
||||
"\t\"InvoiceId\" INTEGER NOT NULL, \n",
|
||||
"\t\"TrackId\" INTEGER NOT NULL, \n",
|
||||
"\t\"UnitPrice\" NUMERIC(10, 2) NOT NULL, \n",
|
||||
"\t\"Quantity\" INTEGER NOT NULL, \n",
|
||||
"\tPRIMARY KEY (\"InvoiceLineId\"), \n",
|
||||
"\tFOREIGN KEY(\"TrackId\") REFERENCES \"Track\" (\"TrackId\"), \n",
|
||||
"\tFOREIGN KEY(\"InvoiceId\") REFERENCES \"Invoice\" (\"InvoiceId\")\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"/*\n",
|
||||
"3 rows from InvoiceLine table:\n",
|
||||
"InvoiceLineId\tInvoiceId\tTrackId\tUnitPrice\tQuantity\n",
|
||||
"1\t1\t2\t0.99\t1\n",
|
||||
"2\t1\t4\t0.99\t1\n",
|
||||
"3\t2\t6\t0.99\t1\n",
|
||||
"*/\n",
|
||||
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
|
||||
"Tool Calls:\n",
|
||||
" sql_db_query (call_i6n3lmS7E2ZivN758VOayTiy)\n",
|
||||
" Call ID: call_i6n3lmS7E2ZivN758VOayTiy\n",
|
||||
" Args:\n",
|
||||
" query: SELECT Artist.Name AS artist_name, SUM(InvoiceLine.Quantity) AS total_sold FROM Artist JOIN Album ON Artist.ArtistId = Album.ArtistId JOIN Track ON Album.AlbumId = Track.AlbumId JOIN InvoiceLine ON Track.TrackId = InvoiceLine.TrackId GROUP BY Artist.Name ORDER BY total_sold DESC LIMIT 3\n",
|
||||
"=================================\u001b[1m Tool Message \u001b[0m=================================\n",
|
||||
"Name: sql_db_query\n",
|
||||
"\n",
|
||||
"[('Iron Maiden', 140), ('U2', 107), ('Metallica', 91)]\n",
|
||||
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
|
||||
"\n",
|
||||
"The top 3 best selling artists are:\n",
|
||||
"1. Iron Maiden - 140 units sold\n",
|
||||
"2. U2 - 107 units sold\n",
|
||||
"3. Metallica - 91 units sold\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"example_query = \"Who are the top 3 best selling artists?\"\n",
|
||||
"\n",
|
||||
"events = agent_executor.stream(\n",
|
||||
" {\"messages\": [(\"user\", example_query)]},\n",
|
||||
" stream_mode=\"values\",\n",
|
||||
")\n",
|
||||
"for event in events:\n",
|
||||
" event[\"messages\"][-1].pretty_print()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "73521f1b-be03-44e6-8b27-a9a46ae8e962",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Specific functionality\n",
|
||||
"\n",
|
||||
"`SQLDatabaseToolkit` implements a [.get_context](https://api.python.langchain.com/en/latest/agent_toolkits/langchain_community.agent_toolkits.sql.toolkit.SQLDatabaseToolkit.html#langchain_community.agent_toolkits.sql.toolkit.SQLDatabaseToolkit.get_context) method as a convenience for use in prompts or other contexts.\n",
|
||||
"\n",
|
||||
"**⚠️ Disclaimer ⚠️** : The agent may generate insert/update/delete queries. When this is not expected, use a custom prompt or create a SQL users without write permissions.\n",
|
||||
"\n",
|
||||
"The final user might overload your SQL database by asking a simple question such as \"run the biggest query possible\". The generated query might look like:\n",
|
||||
"\n",
|
||||
"```sql\n",
|
||||
"SELECT * FROM \"public\".\"users\"\n",
|
||||
" JOIN \"public\".\"user_permissions\" ON \"public\".\"users\".id = \"public\".\"user_permissions\".user_id\n",
|
||||
" JOIN \"public\".\"projects\" ON \"public\".\"users\".id = \"public\".\"projects\".user_id\n",
|
||||
" JOIN \"public\".\"events\" ON \"public\".\"projects\".id = \"public\".\"events\".project_id;\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"For a transactional SQL database, if one of the table above contains millions of rows, the query might cause trouble to other applications using the same database.\n",
|
||||
"\n",
|
||||
"Most datawarehouse oriented databases support user-level quota, for limiting resource usage."
|
||||
]
|
||||
},
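A minimal sketch of how `.get_context` might feed a prompt, assuming the `toolkit` instantiated earlier in this notebook (the prompt wording and the `question` variable are illustrative):

```python
from langchain_core.prompts import ChatPromptTemplate

# get_context() returns database context (keys such as "table_info") that can
# be interpolated into a prompt.
context = toolkit.get_context()

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a careful SQL analyst. Available tables:\n{table_info}"),
        ("human", "{question}"),
    ]
)
# Partially fill the prompt with the database context.
prompt = prompt.partial(table_info=context["table_info"])
```

On the resource-usage point, one hedged example (assuming a PostgreSQL backend, which is not the database used in this notebook) is to set a per-statement timeout on the connection so runaway queries are cut off:

```python
from sqlalchemy import create_engine
from langchain_community.utilities import SQLDatabase

# statement_timeout is in milliseconds; queries running longer than 10s abort.
pg_engine = create_engine(
    "postgresql+psycopg2://reader:secret@localhost:5432/appdb",  # hypothetical DSN
    connect_args={"options": "-c statement_timeout=10000"},
)
pg_db = SQLDatabase(pg_engine)
```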
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1aa8a7e3-87ca-4963-a224-0cbdc9d88714",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## API reference\n",
|
||||
"\n",
|
||||
"For detailed documentation of all SQLDatabaseToolkit features and configurations head to the [API reference](https://api.python.langchain.com/en/latest/agent_toolkits/langchain_community.agent_toolkits.sql.toolkit.SQLDatabaseToolkit.html)."
|
||||
"query = sa.select(artist).where(artist.c.Name.like(\"p%\"))\n",
|
||||
"db.run(query, fetch=\"cursor\", execution_options={\"schema_translate_map\": {None: \"bar\"}})"
|
||||
]
|
||||
}
|
||||
],
|
||||
@@ -606,9 +405,9 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.4"
|
||||
"version": "3.11.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
"nbformat_minor": 4
|
||||
}
|
||||
|
||||
@@ -93,7 +93,7 @@
|
||||
"source": [
|
||||
"## Instantiation\n",
|
||||
"\n",
|
||||
"Here we show how to instantiate an instance of the Tavily search tools, with "
|
||||
"Here we show how to instatiate an instance of the Tavily search tools, with "
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -1,2 +0,0 @@
|
||||
# files generated by faiss.ipynb
|
||||
faiss_index
|
||||
@@ -5,13 +5,33 @@
|
||||
"id": "66d0270a-b74f-4110-901e-7960b00297af",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Astra DB Vector Store\n",
|
||||
"# Astra DB\n",
|
||||
"\n",
|
||||
"This page provides a quickstart for using [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) as a Vector Store.\n",
|
||||
"\n",
|
||||
"> DataStax [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) is a serverless vector-capable database built on Apache Cassandra® and made conveniently available through an easy-to-use JSON API.\n",
|
||||
"\n",
|
||||
"## Setup"
|
||||
"This page provides a quickstart for using [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) as a Vector Store."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ab8cd64f-3bb2-4f16-a0a9-12d7b1789bf6",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"> DataStax [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) is a serverless vector-capable database built on Apache Cassandra® and made conveniently available through an easy-to-use JSON API."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "d2d6ca14-fb7e-4172-9aa0-a3119a064b96",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"_Note: in addition to access to the database, an OpenAI API Key is required to run the full example._"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "bb9be7ce-8c70-4d46-9f11-71c42a36e928",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Setup and general dependencies"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -19,7 +39,7 @@
|
||||
"id": "dbe7c156-0413-47e3-9237-4769c4248869",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Use of the integration requires the `langchain-astradb` partner package:"
|
||||
"Use of the integration requires the corresponding Python package:"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -29,61 +49,54 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"pip install -qU \"langchain-astradb>=0.3.3\""
|
||||
"pip install -qU langchain-astradb"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "319bf84b",
|
||||
"id": "2453d83a-bc8f-41e1-a692-befe4dd90156",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Credentials\n",
|
||||
"\n",
|
||||
"In order to use the AstraDB vector store, you must first head to the [AstraDB website](https://astra.datastax.com), create an account, and then create a new database - the initialization might take a few minutes. \n",
|
||||
"\n",
|
||||
"Once the database has been initialized, you should [create an application token](https://docs.datastax.com/en/astra-db-serverless/administration/manage-application-tokens.html#generate-application-token) and save it for later use. \n",
|
||||
"\n",
|
||||
"You will also want to copy the `API Endpoint` from the `Database Details` and store that in the `ASTRA_DB_API_ENDPOINT` variable.\n",
|
||||
"\n",
|
||||
"You may optionally provide a namespace, which you can manage from the `Data Explorer` tab of your database dashboard. If you don't wish to set a namespace, you can leave the `getpass` prompt for `ASTRA_DB_NAMESPACE` empty."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "b7843c22",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import getpass\n",
|
||||
"\n",
|
||||
"ASTRA_DB_API_ENDPOINT = getpass.getpass(\"ASTRA_DB_API_ENDPOINT = \")\n",
|
||||
"ASTRA_DB_APPLICATION_TOKEN = getpass.getpass(\"ASTRA_DB_APPLICATION_TOKEN = \")\n",
|
||||
"\n",
|
||||
"desired_namespace = getpass.getpass(\"ASTRA_DB_NAMESPACE = \")\n",
|
||||
"if desired_namespace:\n",
|
||||
" ASTRA_DB_NAMESPACE = desired_namespace\n",
|
||||
"else:\n",
|
||||
" ASTRA_DB_NAMESPACE = None"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e1c5cd9e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"If you want to get best in-class automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
|
||||
"_Make sure you have installed the packages required to run all of this demo:_"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "3cb739c0",
|
||||
"id": "56c1f86e-5921-4976-ac8f-1d62e5a512b0",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
|
||||
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
|
||||
"pip install -qU langchain langchain-community langchain-openai datasets pypdf"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c2910035-e61f-48d9-a110-d68c401b62aa",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Import dependencies"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "b06619af-fea2-4863-8149-7f239a8c9c82",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"from getpass import getpass\n",
|
||||
"\n",
|
||||
"from astrapy.info import CollectionVectorServiceOptions\n",
|
||||
"from datasets import load_dataset\n",
|
||||
"from langchain_community.document_loaders import PyPDFLoader\n",
|
||||
"from langchain_core.documents import Document\n",
|
||||
"from langchain_core.output_parsers import StrOutputParser\n",
|
||||
"from langchain_core.prompts import ChatPromptTemplate\n",
|
||||
"from langchain_core.runnables import RunnablePassthrough\n",
|
||||
"from langchain_openai import ChatOpenAI, OpenAIEmbeddings\n",
|
||||
"from langchain_text_splitters import RecursiveCharacterTextSplitter"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -91,59 +104,48 @@
|
||||
"id": "22866f09-e10d-4f05-a24b-b9420129462e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Initialization\n",
|
||||
"\n",
|
||||
"There are two ways to create an Astra DB vector store, which differ in how the embeddings are computed.\n",
|
||||
"\n",
|
||||
"#### Method 1: Explicit embeddings\n",
|
||||
"\n",
|
||||
"You can separately instantiate a `langchain_core.embeddings.Embeddings` class and pass it to the `AstraDBVectorStore` constructor, just like with most other LangChain vector stores.\n",
|
||||
"\n",
|
||||
"#### Method 2: Integrated embedding computation\n",
|
||||
"\n",
|
||||
"Alternatively, you can use the [Vectorize](https://www.datastax.com/blog/simplifying-vector-embedding-generation-with-astra-vectorize) feature of Astra DB and simply specify the name of a supported embedding model when creating the store. The embedding computations are entirely handled within the database. (To proceed with this method, you must have enabled the desired embedding integration for your database, as described [in the docs](https://docs.datastax.com/en/astra-db-serverless/databases/embedding-generation.html).)\n",
|
||||
"\n",
|
||||
"### Explicit Embedding Initialization\n",
|
||||
"\n",
|
||||
"Below, we instantiate our vector store using the explicit embedding class:\n",
|
||||
"\n",
|
||||
"```{=mdx}\n",
|
||||
"import EmbeddingTabs from \"@theme/EmbeddingTabs\";\n",
|
||||
"\n",
|
||||
"<EmbeddingTabs/>\n",
|
||||
"```"
|
||||
"## Import the Vector Store"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "d71a1dcb",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# | output: false\n",
|
||||
"# | echo: false\n",
|
||||
"from langchain_openai import OpenAIEmbeddings\n",
|
||||
"\n",
|
||||
"embeddings = OpenAIEmbeddings(model=\"text-embedding-3-large\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 22,
|
||||
"execution_count": null,
|
||||
"id": "0b32730d-176e-414c-9d91-fd3644c54211",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_astradb import AstraDBVectorStore\n",
|
||||
"from langchain_astradb import AstraDBVectorStore"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "68f61b01-3e09-47c1-9d67-5d6915c86626",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## DB Connection parameters\n",
|
||||
"\n",
|
||||
"vector_store = AstraDBVectorStore(\n",
|
||||
" collection_name=\"astra_vector_langchain\",\n",
|
||||
" embedding=embeddings,\n",
|
||||
" api_endpoint=ASTRA_DB_API_ENDPOINT,\n",
|
||||
" token=ASTRA_DB_APPLICATION_TOKEN,\n",
|
||||
" namespace=ASTRA_DB_NAMESPACE,\n",
|
||||
")"
|
||||
"These are found on your Astra DB dashboard:\n",
|
||||
"\n",
|
||||
"- the API Endpoint looks like `https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com`\n",
|
||||
"- the Token looks like `AstraCS:6gBhNmsk135....`\n",
|
||||
"- you may optionally provide a _Namespace_ such as `my_namespace`"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "d78af8ed-cff9-4f14-aa5d-016f99ab547c",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"ASTRA_DB_API_ENDPOINT = input(\"ASTRA_DB_API_ENDPOINT = \")\n",
|
||||
"ASTRA_DB_APPLICATION_TOKEN = getpass(\"ASTRA_DB_APPLICATION_TOKEN = \")\n",
|
||||
"\n",
|
||||
"desired_namespace = input(\"(optional) Namespace = \")\n",
|
||||
"if desired_namespace:\n",
|
||||
" ASTRA_DB_KEYSPACE = desired_namespace\n",
|
||||
"else:\n",
|
||||
" ASTRA_DB_KEYSPACE = None"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -151,14 +153,85 @@
|
||||
"id": "84a1fe85-a42c-4f15-92e1-f79f1dd43ea2",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Integrated Embedding Initialization\n",
|
||||
"## Create the vector store\n",
|
||||
"\n",
|
||||
"There are two ways to create an Astra DB vector store, which differ in how the embeddings are computed.\n",
|
||||
"\n",
|
||||
"*Explicit embeddings*. You can separately instantiate a `langchain_core.embeddings.Embeddings` class and pass it to the `AstraDBVectorStore` constructor, just like with most other LangChain vector stores.\n",
|
||||
"\n",
|
||||
"*Integrated embedding computation*. Alternatively, you can use the [Vectorize](https://www.datastax.com/blog/simplifying-vector-embedding-generation-with-astra-vectorize) feature of Astra DB and simply specify the name of a supported embedding model when creating the store. The embedding computations are entirely handled within the database. (To proceed with this method, you must have enabled the desired embedding integration for your database, as described [in the docs](https://docs.datastax.com/en/astra-db-serverless/databases/embedding-generation.html).)\n",
|
||||
"\n",
|
||||
"**Please choose one method and run the corresponding cells only.**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "8c435386-e8d5-41f4-a9e5-7b609ef781f9",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Method 1: provide embeddings explicitly\n",
|
||||
"\n",
|
||||
"This demo will use an OpenAI embedding model:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "dfa5c005-9738-4c53-b8a8-8540fcbb8bad",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"os.environ[\"OPENAI_API_KEY\"] = getpass(\"OPENAI_API_KEY = \")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "3accae6f-73e2-483a-83f7-76eb33558a1f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"my_embeddings = OpenAIEmbeddings()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "465b1b16-5363-4c4f-9917-a49e02a86c14",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now you can create the vector store:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "8b77553b-8bb5-4949-b87b-8c6abac56a26",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"vstore = AstraDBVectorStore(\n",
|
||||
" embedding=my_embeddings,\n",
|
||||
" collection_name=\"astra_vector_demo\",\n",
|
||||
" api_endpoint=ASTRA_DB_API_ENDPOINT,\n",
|
||||
" token=ASTRA_DB_APPLICATION_TOKEN,\n",
|
||||
" namespace=ASTRA_DB_KEYSPACE,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "5d5d2bfa-c071-4a5b-8b6e-3daa1b6de164",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Method 2: use Astra Vectorize (embeddings integrated in Astra DB)\n",
|
||||
"\n",
|
||||
"Here it is assumed that you have\n",
|
||||
"\n",
|
||||
"- Enabled the OpenAI integration in your Astra DB organization,\n",
|
||||
"- Added an API Key named `\"OPENAI_API_KEY\"` to the integration, and scoped it to the database you are using.\n",
|
||||
"- enabled the OpenAI integration in your Astra DB organization,\n",
|
||||
"- added an API Key named `\"MY_OPENAI_API_KEY\"` to the integration, and\n",
|
||||
"- scoped it to the database you are using.\n",
|
||||
"\n",
|
||||
"For more details on how to do this, please consult the [documentation](https://docs.datastax.com/en/astra-db-serverless/integrations/embedding-providers/openai.html)."
|
||||
"For more details please consult the [documentation](https://docs.datastax.com/en/astra-db-serverless/integrations/embedding-providers/openai.html)."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -168,204 +241,178 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from astrapy.info import CollectionVectorServiceOptions\n",
|
||||
"\n",
|
||||
"openai_vectorize_options = CollectionVectorServiceOptions(\n",
|
||||
" provider=\"openai\",\n",
|
||||
" model_name=\"text-embedding-3-small\",\n",
|
||||
" authentication={\n",
|
||||
" \"providerKey\": \"OPENAI_API_KEY\",\n",
|
||||
" \"providerKey\": \"MY_OPENAI_API_KEY\",\n",
|
||||
" },\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"vector_store_integrated = AstraDBVectorStore(\n",
|
||||
" collection_name=\"astra_vector_langchain_integrated\",\n",
|
||||
"vstore = AstraDBVectorStore(\n",
|
||||
" collection_name=\"astra_vectorize_demo\",\n",
|
||||
" api_endpoint=ASTRA_DB_API_ENDPOINT,\n",
|
||||
" token=ASTRA_DB_APPLICATION_TOKEN,\n",
|
||||
" namespace=ASTRA_DB_NAMESPACE,\n",
|
||||
" namespace=ASTRA_DB_KEYSPACE,\n",
|
||||
" collection_vector_service_options=openai_vectorize_options,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "d3796b39",
|
||||
"id": "9a348678-b2f6-46ca-9a0d-2eb4cc6b66b1",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Manage vector store\n",
|
||||
"\n",
|
||||
"Once you have created your vector store, we can interact with it by adding and deleting different items.\n",
|
||||
"\n",
|
||||
"### Add items to vector store\n",
|
||||
"\n",
|
||||
"We can add items to our vector store by using the `add_documents` function."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 23,
|
||||
"id": "afb3e155",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[UUID('89a5cea1-5f3d-47c1-89dc-7e36e12cf4de'),\n",
|
||||
" UUID('d4e78c48-f954-4612-8a38-af22923ba23b'),\n",
|
||||
" UUID('058e4046-ded0-4fc1-b8ac-60e5a5f08ea0'),\n",
|
||||
" UUID('50ab2a9a-762c-4b78-b102-942a86d77288'),\n",
|
||||
" UUID('1da5a3c1-ba51-4f2f-aaaf-79a8f5011ce3'),\n",
|
||||
" UUID('f3055d9e-2eb1-4d25-838e-2c70548f91b5'),\n",
|
||||
" UUID('4bf0613d-08d0-4fbc-a43c-4955e4c9e616'),\n",
|
||||
" UUID('18008625-8fd4-45c2-a0d7-92a2cde23dbc'),\n",
|
||||
" UUID('c712e06f-790b-4fd4-9040-7ab3898965d0'),\n",
|
||||
" UUID('a9b84820-3445-4810-a46c-e77b76ab85bc')]"
|
||||
]
|
||||
},
|
||||
"execution_count": 23,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from uuid import uuid4\n",
|
||||
"\n",
|
||||
"from langchain_core.documents import Document\n",
|
||||
"\n",
|
||||
"document_1 = Document(\n",
|
||||
" page_content=\"I had chocalate chip pancakes and scrambled eggs for breakfast this morning.\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_2 = Document(\n",
|
||||
" page_content=\"The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.\",\n",
|
||||
" metadata={\"source\": \"news\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_3 = Document(\n",
|
||||
" page_content=\"Building an exciting new project with LangChain - come check it out!\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_4 = Document(\n",
|
||||
" page_content=\"Robbers broke into the city bank and stole $1 million in cash.\",\n",
|
||||
" metadata={\"source\": \"news\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_5 = Document(\n",
|
||||
" page_content=\"Wow! That was an amazing movie. I can't wait to see it again.\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_6 = Document(\n",
|
||||
" page_content=\"Is the new iPhone worth the price? Read this review to find out.\",\n",
|
||||
" metadata={\"source\": \"website\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_7 = Document(\n",
|
||||
" page_content=\"The top 10 soccer players in the world right now.\",\n",
|
||||
" metadata={\"source\": \"website\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_8 = Document(\n",
|
||||
" page_content=\"LangGraph is the best framework for building stateful, agentic applications!\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_9 = Document(\n",
|
||||
" page_content=\"The stock market is down 500 points today due to fears of a recession.\",\n",
|
||||
" metadata={\"source\": \"news\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_10 = Document(\n",
|
||||
" page_content=\"I have a bad feeling I am going to get deleted :(\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"documents = [\n",
|
||||
" document_1,\n",
|
||||
" document_2,\n",
|
||||
" document_3,\n",
|
||||
" document_4,\n",
|
||||
" document_5,\n",
|
||||
" document_6,\n",
|
||||
" document_7,\n",
|
||||
" document_8,\n",
|
||||
" document_9,\n",
|
||||
" document_10,\n",
|
||||
"]\n",
|
||||
"uuids = [str(uuid4()) for _ in range(len(documents))]\n",
|
||||
"\n",
|
||||
"vector_store.add_documents(documents=documents, ids=uuids)"
|
||||
"## Load a dataset"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "dfce4edc",
|
||||
"id": "552e56b0-301a-4b06-99c7-57ba6faa966f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Delete items from vector store\n",
|
||||
"\n",
|
||||
"We can delete items from our vector store by ID by using the `delete` function."
|
||||
"Convert each entry in the source dataset into a `Document`, then write them into the vector store:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 24,
|
||||
"id": "d3f69315",
|
||||
"execution_count": null,
|
||||
"id": "3a1f532f-ad63-4256-9730-a183841bd8e9",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"True"
|
||||
]
|
||||
},
|
||||
"execution_count": 24,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"vector_store.delete(ids=uuids[-1])"
|
||||
"philo_dataset = load_dataset(\"datastax/philosopher-quotes\")[\"train\"]\n",
|
||||
"\n",
|
||||
"docs = []\n",
|
||||
"for entry in philo_dataset:\n",
|
||||
" metadata = {\"author\": entry[\"author\"]}\n",
|
||||
" doc = Document(page_content=entry[\"quote\"], metadata=metadata)\n",
|
||||
" docs.append(doc)\n",
|
||||
"\n",
|
||||
"inserted_ids = vstore.add_documents(docs)\n",
|
||||
"print(f\"\\nInserted {len(inserted_ids)} documents.\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "d12e1a07",
|
||||
"id": "79d4f436-ef04-4288-8f79-97c9abb983ed",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Query vector store\n",
|
||||
"In the above, `metadata` dictionaries are created from the source data and are part of the `Document`.\n",
|
||||
"\n",
|
||||
"Once your vector store has been created and the relevant documents have been added you will most likely wish to query it during the running of your chain or agent. \n",
|
||||
"\n",
|
||||
"### Query directly\n",
|
||||
"\n",
|
||||
"#### Similarity search\n",
|
||||
"\n",
|
||||
"Performing a simple similarity search with filtering on metadata can be done as follows:"
|
||||
"_Note: check the [Astra DB API Docs](https://docs.datastax.com/en/astra-serverless/docs/develop/dev-with-json.html#_json_api_limits) for the valid metadata field names: some characters are reserved and cannot be used._"
|
||||
]
|
||||
},
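If your source data uses field names containing reserved characters, one option is to normalize the keys before building the `Document` objects. The sketch below is illustrative only; the exact set of disallowed characters should be taken from the Astra DB JSON API limits linked above.

```python
import re


def sanitize_metadata_keys(metadata: dict) -> dict:
    # Replace characters that may be reserved in Astra DB field names.
    # The character class below is an assumption; check the JSON API limits.
    return {re.sub(r"[.$\s]", "_", key): value for key, value in metadata.items()}


print(sanitize_metadata_keys({"author name": "plato", "source.url": "example"}))
# -> {'author_name': 'plato', 'source_url': 'example'}
```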
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "084d8802-ab39-4262-9a87-42eafb746f92",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Add some more entries, this time with `add_texts`:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"id": "770b3467",
|
||||
"execution_count": null,
|
||||
"id": "b6b157f5-eb31-4907-a78e-2e2b06893936",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"* Building an exciting new project with LangChain - come check it out! [{'source': 'tweet'}]\n",
|
||||
"* LangGraph is the best framework for building stateful, agentic applications! [{'source': 'tweet'}]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"results = vector_store.similarity_search(\n",
|
||||
" \"LangChain provides abstractions to make working with LLMs easy\",\n",
|
||||
" k=2,\n",
|
||||
" filter={\"source\": \"tweet\"},\n",
|
||||
"texts = [\"I think, therefore I am.\", \"To the things themselves!\"]\n",
|
||||
"metadatas = [{\"author\": \"descartes\"}, {\"author\": \"husserl\"}]\n",
|
||||
"ids = [\"desc_01\", \"huss_xy\"]\n",
|
||||
"\n",
|
||||
"inserted_ids_2 = vstore.add_texts(texts=texts, metadatas=metadatas, ids=ids)\n",
|
||||
"print(f\"\\nInserted {len(inserted_ids_2)} documents.\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "63840eb3-8b29-4017-bc2f-301bf5001f28",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"_Note: you may want to speed up the execution of `add_texts` and `add_documents` by increasing the concurrency level for_\n",
|
||||
"_these bulk operations - check out the `*_concurrency` parameters in the class constructor and the `add_texts` docstrings_\n",
|
||||
"_for more details. Depending on the network and the client machine specifications, your best-performing choice of parameters may vary._"
|
||||
]
|
||||
},
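As a rough sketch of what such tuning can look like: the parameter names below (`batch_size`, `bulk_insert_batch_concurrency`, `bulk_delete_concurrency`) are assumptions taken from the constructor docstrings mentioned above, so verify them against the installed version before relying on them. The `embeddings` object and the `ASTRA_DB_*` variables stand for the ones defined earlier in this notebook.

```python
from langchain_astradb import AstraDBVectorStore

# Assumed parameter names; confirm against the AstraDBVectorStore docstrings.
vstore_tuned = AstraDBVectorStore(
    embedding=embeddings,
    collection_name="astra_vector_demo_tuned",
    api_endpoint=ASTRA_DB_API_ENDPOINT,
    token=ASTRA_DB_APPLICATION_TOKEN,
    namespace=ASTRA_DB_KEYSPACE,
    batch_size=20,                     # documents per insertion batch
    bulk_insert_batch_concurrency=10,  # batches inserted in parallel
    bulk_delete_concurrency=10,        # parallelism for bulk deletions
)
```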
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c031760a-1fc5-4855-adf2-02ed52fe2181",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Run searches"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "02a77d8e-1aae-4054-8805-01c77947c49f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"This section demonstrates metadata filtering and getting the similarity scores back:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "1761806a-1afd-4491-867c-25a80d92b9fe",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"results = vstore.similarity_search(\"Our life is what we make of it\", k=3)\n",
|
||||
"for res in results:\n",
|
||||
" print(f\"* {res.page_content} [{res.metadata}]\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "eebc4f7c-f61a-438e-b3c8-17e6888d8a0b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"results_filtered = vstore.similarity_search(\n",
|
||||
" \"Our life is what we make of it\",\n",
|
||||
" k=3,\n",
|
||||
" filter={\"author\": \"plato\"},\n",
|
||||
")\n",
|
||||
"for res in results_filtered:\n",
|
||||
" print(f\"* {res.page_content} [{res.metadata}]\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "11bbfe64-c0cd-40c6-866a-a5786538450e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"results = vstore.similarity_search_with_score(\"Our life is what we make of it\", k=3)\n",
|
||||
"for res, score in results:\n",
|
||||
" print(f\"* [SIM={score:3f}] {res.page_content} [{res.metadata}]\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b14ea558-bfbe-41ce-807e-d70670060ada",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### MMR (Maximal-marginal-relevance) search\n",
|
||||
"\n",
|
||||
"_Note: the MMR search method is not (yet) supported for vector stores built with Astra Vectorize._"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "76381ce8-780a-4e3b-97b1-056d6782d7d5",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"results = vstore.max_marginal_relevance_search(\n",
|
||||
" \"Our life is what we make of it\",\n",
|
||||
" k=3,\n",
|
||||
" filter={\"author\": \"aristotle\"},\n",
|
||||
")\n",
|
||||
"for res in results:\n",
|
||||
" print(f\"* {res.page_content} [{res.metadata}]\")"
|
||||
@@ -373,95 +420,133 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ce112165",
|
||||
"id": "60fda5df-14e4-4fb0-bd17-65a393fab8a9",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Similarity search with score\n",
|
||||
"### Async\n",
|
||||
"\n",
|
||||
"You can also search with score:"
|
||||
"Note that the Astra DB vector store supports all fully async methods (`asimilarity_search`, `afrom_texts`, `adelete` and so on) natively, i.e. without thread wrapping involved."
|
||||
]
|
||||
},
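A short sketch of that async API, reusing the same `vstore`; in a Jupyter notebook the cell can `await` directly because an event loop is already running:

```python
# The async methods mirror their sync counterparts one-to-one.
async_ids = await vstore.aadd_texts(
    texts=["Entities should not be multiplied beyond necessity."],
    metadatas=[{"author": "ockham"}],
)

async_results = await vstore.asimilarity_search("What did Ockham say about simplicity?", k=1)
for doc in async_results:
    print(doc.page_content, doc.metadata)
```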
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1cc86edd-692b-4495-906c-ccfd13b03c23",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Deleting stored documents"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"id": "5924309a",
|
||||
"execution_count": null,
|
||||
"id": "38a70ec4-b522-4d32-9ead-c642864fca37",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"* [SIM=0.776585] The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees. [{'source': 'news'}]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"results = vector_store.similarity_search_with_score(\n",
|
||||
" \"Will it be hot tomorrow?\", k=1, filter={\"source\": \"news\"}\n",
|
||||
")\n",
|
||||
"for res, score in results:\n",
|
||||
" print(f\"* [SIM={score:3f}] {res.page_content} [{res.metadata}]\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "fead7af5",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Other search methods\n",
|
||||
"\n",
|
||||
"There are a variety of other search methods that are not covered in this notebook, such as MMR search or searching by vector. For a full list of the search abilities available for `AstraDBVectorStore` check out the [API reference](https://api.python.langchain.com/en/latest/vectorstores/langchain_astradb.vectorstores.AstraDBVectorStore.html)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "7e40f714",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Query by turning into retriever\n",
|
||||
"\n",
|
||||
"You can also transform the vector store into a retriever for easier usage in your chains. \n",
|
||||
"\n",
|
||||
"Here is how to transform your vector store into a retriever and then invoke the retreiever with a simple query and filter."
|
||||
"delete_1 = vstore.delete(inserted_ids[:3])\n",
|
||||
"print(f\"all_succeed={delete_1}\") # True, all documents deleted"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"id": "dcee50e6",
|
||||
"execution_count": null,
|
||||
"id": "d4cf49ed-9d29-4ed9-bdab-51a308c41b8e",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(metadata={'source': 'news'}, page_content='Robbers broke into the city bank and stole $1 million in cash.')]"
|
||||
]
|
||||
},
|
||||
"execution_count": 17,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"retriever = vector_store.as_retriever(\n",
|
||||
" search_type=\"similarity_score_threshold\",\n",
|
||||
" search_kwargs={\"k\": 1, \"score_threshold\": 0.5},\n",
|
||||
")\n",
|
||||
"retriever.invoke(\"Stealing from the bank is a crime\", filter={\"source\": \"news\"})"
|
||||
"delete_2 = vstore.delete(inserted_ids[2:5])\n",
|
||||
"print(f\"some_succeeds={delete_2}\") # True, though some IDs were gone already"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "734e683a",
|
||||
"id": "847181ba-77d1-4a17-b7f9-9e2c3d8efd13",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Usage for retrieval-augmented generation\n",
|
||||
"## A minimal RAG chain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "cd64b844-846f-43c5-a7dd-c26b9ed417d0",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The next cells will implement a simple RAG pipeline:\n",
|
||||
"- download a sample PDF file and load it onto the store;\n",
|
||||
"- create a RAG chain with LCEL (LangChain Expression Language), with the vector store at its heart;\n",
|
||||
"- run the question-answering chain."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "5cbc4dba-0d5e-4038-8fc5-de6cadd1c2a9",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!curl -L \\\n",
|
||||
" \"https://github.com/awesome-astra/datasets/blob/main/demo-resources/what-is-philosophy/what-is-philosophy.pdf?raw=true\" \\\n",
|
||||
" -o \"what-is-philosophy.pdf\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "459385be-5e9c-47ff-ba53-2b7ae6166b09",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"pdf_loader = PyPDFLoader(\"what-is-philosophy.pdf\")\n",
|
||||
"splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=64)\n",
|
||||
"docs_from_pdf = pdf_loader.load_and_split(text_splitter=splitter)\n",
|
||||
"\n",
|
||||
"For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:\n",
|
||||
"print(f\"Documents from PDF: {len(docs_from_pdf)}.\")\n",
|
||||
"inserted_ids_from_pdf = vstore.add_documents(docs_from_pdf)\n",
|
||||
"print(f\"Inserted {len(inserted_ids_from_pdf)} documents.\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "5010a66c-4298-4e32-82b5-2da0d36a5c70",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"retriever = vstore.as_retriever(search_kwargs={\"k\": 3})\n",
|
||||
"\n",
|
||||
"- [Tutorials: working with external knowledge](https://python.langchain.com/v0.2/docs/tutorials/#working-with-external-knowledge)\n",
|
||||
"- [How-to: Question and answer with RAG](https://python.langchain.com/v0.2/docs/how_to/#qa-with-rag)\n",
|
||||
"- [Retrieval conceptual docs](https://python.langchain.com/v0.2/docs/concepts/#retrieval)"
|
||||
"philo_template = \"\"\"\n",
|
||||
"You are a philosopher that draws inspiration from great thinkers of the past\n",
|
||||
"to craft well-thought answers to user questions. Use the provided context as the basis\n",
|
||||
"for your answers and do not make up new reasoning paths - just mix-and-match what you are given.\n",
|
||||
"Your answers must be concise and to the point, and refrain from answering about other topics than philosophy.\n",
|
||||
"\n",
|
||||
"CONTEXT:\n",
|
||||
"{context}\n",
|
||||
"\n",
|
||||
"QUESTION: {question}\n",
|
||||
"\n",
|
||||
"YOUR ANSWER:\"\"\"\n",
|
||||
"\n",
|
||||
"philo_prompt = ChatPromptTemplate.from_template(philo_template)\n",
|
||||
"\n",
|
||||
"llm = ChatOpenAI()\n",
|
||||
"\n",
|
||||
"chain = (\n",
|
||||
" {\"context\": retriever, \"question\": RunnablePassthrough()}\n",
|
||||
" | philo_prompt\n",
|
||||
" | llm\n",
|
||||
" | StrOutputParser()\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "fcbc1296-6c7c-478b-b55b-533ba4e54ddb",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain.invoke(\"How does Russel elaborate on Peirce's idea of the security blanket?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -477,7 +562,7 @@
|
||||
"id": "177610c7-50d0-4b7b-8634-b03338054c8e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Cleanup vector store"
|
||||
"## Cleanup"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -497,17 +582,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"vector_store.delete_collection()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "a14c34be",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## API reference\n",
|
||||
"\n",
|
||||
"For detailed documentation of all `AstraDBVectorStore` features and configurations head to the API reference:https://api.python.langchain.com/en/latest/vectorstores/langchain_astradb.vectorstores.AstraDBVectorStore.html"
|
||||
"vstore.delete_collection()"
|
||||
]
|
||||
}
|
||||
],
|
||||
@@ -527,7 +602,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.9"
|
||||
"version": "3.11.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
@@ -7,23 +7,30 @@
|
||||
"source": [
|
||||
"# Chroma\n",
|
||||
"\n",
|
||||
"This notebook covers how to get started with the `Chroma` vector store.\n",
|
||||
">[Chroma](https://docs.trychroma.com/getting-started) is a AI-native open-source vector database focused on developer productivity and happiness. Chroma is licensed under Apache 2.0.\n",
|
||||
"\n",
|
||||
">[Chroma](https://docs.trychroma.com/getting-started) is a AI-native open-source vector database focused on developer productivity and happiness. Chroma is licensed under Apache 2.0. View the full docs of `Chroma` at [this page](https://docs.trychroma.com/reference/py-collection), and find the API reference for the LangChain integration at [this page](https://api.python.langchain.com/en/latest/vectorstores/langchain_chroma.vectorstores.Chroma.html).\n",
|
||||
"\n",
|
||||
"## Setup\n",
|
||||
"Install Chroma with:\n",
|
||||
"\n",
|
||||
"To access `Chroma` vector stores you'll need to install the `langchain-chroma` integration package."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "83a43688",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"pip install -qU \"langchain-chroma>=0.1.2\""
|
||||
"```sh\n",
|
||||
"pip install langchain-chroma\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Chroma runs in various modes. See below for examples of each integrated with LangChain.\n",
|
||||
"- `in-memory` - in a python script or jupyter notebook\n",
|
||||
"- `in-memory with persistance` - in a script or notebook and save/load to disk\n",
|
||||
"- `in a docker container` - as a server running your local machine or in the cloud\n",
|
||||
"\n",
|
||||
"Like any other database, you can: \n",
|
||||
"- `.add` \n",
|
||||
"- `.get` \n",
|
||||
"- `.update`\n",
|
||||
"- `.upsert`\n",
|
||||
"- `.delete`\n",
|
||||
"- `.peek`\n",
|
||||
"- and `.query` runs the similarity search.\n",
|
||||
"\n",
|
||||
"View full docs at [docs](https://docs.trychroma.com/reference/py-collection). To access these methods directly, you can do `._collection.method()`\n"
|
||||
]
|
||||
},
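For illustration, a minimal sketch of that `._collection` escape hatch. It builds a throwaway in-memory store with `FakeEmbeddings` (any embeddings class works; the fake one just keeps the sketch dependency-free) and then calls the native Chroma collection methods directly:

```python
from langchain_chroma import Chroma
from langchain_community.embeddings import FakeEmbeddings

# Throwaway in-memory store, only to demonstrate the escape hatch.
demo_store = Chroma(
    collection_name="escape_hatch_demo",
    embedding_function=FakeEmbeddings(size=32),
)
demo_store.add_texts(["alpha", "beta", "gamma"], ids=["1", "2", "3"])

# ._collection is the underlying chromadb Collection, so its native
# methods (.count, .peek, .get, .delete, ...) are available directly.
print(demo_store._collection.count())  # 3
print(demo_store._collection.peek(2))  # raw records for two entries
```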
|
||||
{
|
||||
@@ -31,94 +38,149 @@
|
||||
"id": "2b5ffbf8",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Credentials\n",
|
||||
"## Basic Example\n",
|
||||
"\n",
|
||||
"You can use the `Chroma` vector store without any credentials, simply installing the package above is enough!"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "cd17cfed",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"If you want to get best in-class automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "dd7e1243",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
|
||||
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f47f73f4",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Initialization\n",
|
||||
"\n",
|
||||
"### Basic Initialization \n",
|
||||
"\n",
|
||||
"Below is a basic initialization, including the use of a directory to save the data locally.\n",
|
||||
"\n",
|
||||
"```{=mdx}\n",
|
||||
"import EmbeddingTabs from \"@theme/EmbeddingTabs\";\n",
|
||||
"\n",
|
||||
"<EmbeddingTabs/>\n",
|
||||
"```"
|
||||
"In this basic example, we take the most recent State of the Union Address, split it into chunks, embed it using an open-source embedding model, load it into Chroma, and then query it."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "d3ed0a9a",
|
||||
"id": "ae9fcf3e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# | output: false\n",
|
||||
"# | echo: false\n",
|
||||
"from langchain_openai import OpenAIEmbeddings\n",
|
||||
"\n",
|
||||
"embeddings = OpenAIEmbeddings(model=\"text-embedding-3-large\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"id": "3ea11a7b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
|
||||
"\n",
|
||||
"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
|
||||
"\n",
|
||||
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
|
||||
"\n",
|
||||
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# import\n",
|
||||
"from langchain_chroma import Chroma\n",
|
||||
"from langchain_community.document_loaders import TextLoader\n",
|
||||
"from langchain_community.embeddings.sentence_transformer import (\n",
|
||||
" SentenceTransformerEmbeddings,\n",
|
||||
")\n",
|
||||
"from langchain_text_splitters import CharacterTextSplitter\n",
|
||||
"\n",
|
||||
"vector_store = Chroma(\n",
|
||||
" collection_name=\"example_collection\",\n",
|
||||
" embedding_function=embeddings,\n",
|
||||
" persist_directory=\"./chroma_langchain_db\", # Where to save data locally, remove if not neccesary\n",
|
||||
")"
|
||||
"# load the document and split it into chunks\n",
|
||||
"loader = TextLoader(\"../../how_to/state_of_the_union.txt\")\n",
|
||||
"documents = loader.load()\n",
|
||||
"\n",
|
||||
"# split it into chunks\n",
|
||||
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
|
||||
"docs = text_splitter.split_documents(documents)\n",
|
||||
"\n",
|
||||
"# create the open-source embedding function\n",
|
||||
"embedding_function = SentenceTransformerEmbeddings(model_name=\"all-MiniLM-L6-v2\")\n",
|
||||
"\n",
|
||||
"# load it into Chroma\n",
|
||||
"db = Chroma.from_documents(docs, embedding_function)\n",
|
||||
"\n",
|
||||
"# query it\n",
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"docs = db.similarity_search(query)\n",
|
||||
"\n",
|
||||
"# print results\n",
|
||||
"print(docs[0].page_content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ccb62a8c",
|
||||
"id": "5c9a11cc",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Initialization from client\n",
|
||||
"## Basic Example (including saving to disk)\n",
|
||||
"\n",
|
||||
"You can also initialize from a `Chroma` client, which is particularly useful if you want easier access to the underlying database."
|
||||
"Extending the previous example, if you want to save to disk, simply initialize the Chroma client and pass the directory where you want the data to be saved to. \n",
|
||||
"\n",
|
||||
"`Caution`: Chroma makes a best-effort to automatically save data to disk, however multiple in-memory clients can stop each other's work. As a best practice, only have one client per path running at any given time."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "49f9bd49",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
|
||||
"\n",
|
||||
"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
|
||||
"\n",
|
||||
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
|
||||
"\n",
|
||||
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# save to disk\n",
|
||||
"db2 = Chroma.from_documents(docs, embedding_function, persist_directory=\"./chroma_db\")\n",
|
||||
"docs = db2.similarity_search(query)\n",
|
||||
"\n",
|
||||
"# load from disk\n",
|
||||
"db3 = Chroma(persist_directory=\"./chroma_db\", embedding_function=embedding_function)\n",
|
||||
"docs = db3.similarity_search(query)\n",
|
||||
"print(docs[0].page_content)"
|
||||
]
|
||||
},
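One way to respect the one-client-per-path advice is to create a single persistent client and share it across every `Chroma` wrapper that needs the same directory. A sketch, assuming the `embedding_function` defined earlier in this notebook:

```python
import chromadb
from langchain_chroma import Chroma

# One persistent client per path, shared by all wrappers over that path.
shared_client = chromadb.PersistentClient(path="./chroma_db")

db_writer = Chroma(
    client=shared_client,
    collection_name="sotu",
    embedding_function=embedding_function,
)
db_reader = Chroma(
    client=shared_client,
    collection_name="sotu",
    embedding_function=embedding_function,
)
# Both wrappers talk to the same on-disk collection through one client.
```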
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "63318cc9",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Passing a Chroma Client into Langchain\n",
|
||||
"\n",
|
||||
"You can also create a Chroma Client and pass it to LangChain. This is particularly useful if you want easier access to the underlying database.\n",
|
||||
"\n",
|
||||
"You can also specify the collection name that you want LangChain to use."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "3fe4457f",
|
||||
"id": "22f4a0ce",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Add of existing embedding ID: 1\n",
|
||||
"Add of existing embedding ID: 2\n",
|
||||
"Add of existing embedding ID: 3\n",
|
||||
"Add of existing embedding ID: 1\n",
|
||||
"Add of existing embedding ID: 2\n",
|
||||
"Add of existing embedding ID: 3\n",
|
||||
"Add of existing embedding ID: 1\n",
|
||||
"Insert of existing embedding ID: 1\n",
|
||||
"Add of existing embedding ID: 2\n",
|
||||
"Insert of existing embedding ID: 2\n",
|
||||
"Add of existing embedding ID: 3\n",
|
||||
"Insert of existing embedding ID: 3\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"There are 3 in the collection\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import chromadb\n",
|
||||
"\n",
|
||||
@@ -126,320 +188,320 @@
|
||||
"collection = persistent_client.get_or_create_collection(\"collection_name\")\n",
|
||||
"collection.add(ids=[\"1\", \"2\", \"3\"], documents=[\"a\", \"b\", \"c\"])\n",
|
||||
"\n",
|
||||
"vector_store_from_client = Chroma(\n",
|
||||
"langchain_chroma = Chroma(\n",
|
||||
" client=persistent_client,\n",
|
||||
" collection_name=\"collection_name\",\n",
|
||||
" embedding_function=embeddings,\n",
|
||||
")"
|
||||
" embedding_function=embedding_function,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"print(\"There are\", langchain_chroma._collection.count(), \"in the collection\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "9d037340",
|
||||
"id": "e9cf6d70",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Manage vector store\n",
|
||||
"## Basic Example (using the Docker Container)\n",
|
||||
"\n",
|
||||
"Once you have created your vector store, we can interact with it by adding and deleting different items.\n",
|
||||
"You can also run the Chroma Server in a Docker container separately, create a Client to connect to it, and then pass that to LangChain. \n",
|
||||
"\n",
|
||||
"### Add items to vector store\n",
|
||||
"Chroma has the ability to handle multiple `Collections` of documents, but the LangChain interface expects one, so we need to specify the collection name. The default collection name used by LangChain is \"langchain\".\n",
|
||||
"\n",
|
||||
"We can add items to our vector store by using the `add_documents` function."
|
||||
"Here is how to clone, build, and run the Docker Image:\n",
|
||||
"```sh\n",
|
||||
"git clone git@github.com:chroma-core/chroma.git\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Edit the `docker-compose.yml` file and add `ALLOW_RESET=TRUE` under `environment`\n",
|
||||
"```yaml\n",
|
||||
" ...\n",
|
||||
" command: uvicorn chromadb.app:app --reload --workers 1 --host 0.0.0.0 --port 8000 --log-config log_config.yml\n",
|
||||
" environment:\n",
|
||||
" - IS_PERSISTENT=TRUE\n",
|
||||
" - ALLOW_RESET=TRUE\n",
|
||||
" ports:\n",
|
||||
" - 8000:8000\n",
|
||||
" ...\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Then run `docker-compose up -d --build`"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"id": "da279339",
|
||||
"execution_count": 4,
|
||||
"id": "74aee70e",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"['f22ed484-6db3-4b76-adb1-18a777426cd6',\n",
|
||||
" 'e0d5bab4-6453-4511-9a37-023d9d288faa',\n",
|
||||
" '877d76b8-3580-4d9e-a13f-eed0fa3d134a',\n",
|
||||
" '26eaccab-81ce-4c0a-8e76-bf542647df18',\n",
|
||||
" 'bcaa8239-7986-4050-bf40-e14fb7dab997',\n",
|
||||
" 'cdc44b38-a83f-4e49-b249-7765b334e09d',\n",
|
||||
" 'a7a35354-2687-4bc2-8242-3849a4d18d34',\n",
|
||||
" '8780caf1-d946-4f27-a707-67d037e9e1d8',\n",
|
||||
" 'dec6af2a-7326-408f-893d-7d7d717dfda9',\n",
|
||||
" '3b18e210-bb59-47a0-8e17-c8e51176ea5e']"
|
||||
]
|
||||
},
|
||||
"execution_count": 17,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
|
||||
"\n",
|
||||
"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
|
||||
"\n",
|
||||
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
|
||||
"\n",
|
||||
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from uuid import uuid4\n",
|
||||
"# create the chroma client\n",
|
||||
"import uuid\n",
|
||||
"\n",
|
||||
"from langchain_core.documents import Document\n",
|
||||
"import chromadb\n",
|
||||
"from chromadb.config import Settings\n",
|
||||
"\n",
|
||||
"document_1 = Document(\n",
|
||||
" page_content=\"I had chocalate chip pancakes and scrambled eggs for breakfast this morning.\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
" id=1,\n",
|
||||
"client = chromadb.HttpClient(settings=Settings(allow_reset=True))\n",
|
||||
"client.reset() # resets the database\n",
|
||||
"collection = client.create_collection(\"my_collection\")\n",
|
||||
"for doc in docs:\n",
|
||||
" collection.add(\n",
|
||||
" ids=[str(uuid.uuid1())], metadatas=doc.metadata, documents=doc.page_content\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
"# tell LangChain to use our client and collection name\n",
|
||||
"db4 = Chroma(\n",
|
||||
" client=client,\n",
|
||||
" collection_name=\"my_collection\",\n",
|
||||
" embedding_function=embedding_function,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_2 = Document(\n",
|
||||
" page_content=\"The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.\",\n",
|
||||
" metadata={\"source\": \"news\"},\n",
|
||||
" id=2,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_3 = Document(\n",
|
||||
" page_content=\"Building an exciting new project with LangChain - come check it out!\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
" id=3,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_4 = Document(\n",
|
||||
" page_content=\"Robbers broke into the city bank and stole $1 million in cash.\",\n",
|
||||
" metadata={\"source\": \"news\"},\n",
|
||||
" id=4,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_5 = Document(\n",
|
||||
" page_content=\"Wow! That was an amazing movie. I can't wait to see it again.\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
" id=5,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_6 = Document(\n",
|
||||
" page_content=\"Is the new iPhone worth the price? Read this review to find out.\",\n",
|
||||
" metadata={\"source\": \"website\"},\n",
|
||||
" id=6,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_7 = Document(\n",
|
||||
" page_content=\"The top 10 soccer players in the world right now.\",\n",
|
||||
" metadata={\"source\": \"website\"},\n",
|
||||
" id=7,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_8 = Document(\n",
|
||||
" page_content=\"LangGraph is the best framework for building stateful, agentic applications!\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
" id=8,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_9 = Document(\n",
|
||||
" page_content=\"The stock market is down 500 points today due to fears of a recession.\",\n",
|
||||
" metadata={\"source\": \"news\"},\n",
|
||||
" id=9,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_10 = Document(\n",
|
||||
" page_content=\"I have a bad feeling I am going to get deleted :(\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
" id=10,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"documents = [\n",
|
||||
" document_1,\n",
|
||||
" document_2,\n",
|
||||
" document_3,\n",
|
||||
" document_4,\n",
|
||||
" document_5,\n",
|
||||
" document_6,\n",
|
||||
" document_7,\n",
|
||||
" document_8,\n",
|
||||
" document_9,\n",
|
||||
" document_10,\n",
|
||||
"]\n",
|
||||
"uuids = [str(uuid4()) for _ in range(len(documents))]\n",
|
||||
"\n",
|
||||
"vector_store.add_documents(documents=documents, ids=uuids)"
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"docs = db4.similarity_search(query)\n",
|
||||
"print(docs[0].page_content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "7add6366",
|
||||
"id": "9ed3ec50",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Update items in vector store\n",
|
||||
"## Update and Delete\n",
|
||||
"\n",
|
||||
"Now that we have added documents to our vector store, we can update existing documents by using the `update_documents` function. "
|
||||
"While building toward a real application, you want to go beyond adding data, and also update and delete data. \n",
|
||||
"\n",
|
||||
"Chroma has users provide `ids` to simplify the bookkeeping here. `ids` can be the name of the file, or a combined has like `filename_paragraphNumber`, etc.\n",
|
||||
"\n",
|
||||
"Chroma supports all these operations - though some of them are still being integrated all the way through the LangChain interface. Additional workflow improvements will be added soon.\n",
|
||||
"\n",
|
||||
"Here is a basic example showing how to do various operations:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "ef5dbd1e",
|
||||
"id": "81a02810",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"{'source': '../../../state_of_the_union.txt'}\n",
|
||||
"{'ids': ['1'], 'embeddings': None, 'metadatas': [{'new_value': 'hello world', 'source': '../../../state_of_the_union.txt'}], 'documents': ['Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.']}\n",
|
||||
"count before 46\n",
|
||||
"count after 45\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"updated_document_1 = Document(\n",
|
||||
" page_content=\"I had chocalate chip pancakes and fried eggs for breakfast this morning.\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
" id=1,\n",
|
||||
")\n",
|
||||
"# create simple ids\n",
|
||||
"ids = [str(i) for i in range(1, len(docs) + 1)]\n",
|
||||
"\n",
|
||||
"updated_document_2 = Document(\n",
|
||||
" page_content=\"The weather forecast for tomorrow is sunny and warm, with a high of 82 degrees.\",\n",
|
||||
" metadata={\"source\": \"news\"},\n",
|
||||
" id=2,\n",
|
||||
")\n",
|
||||
"# add data\n",
|
||||
"example_db = Chroma.from_documents(docs, embedding_function, ids=ids)\n",
|
||||
"docs = example_db.similarity_search(query)\n",
|
||||
"print(docs[0].metadata)\n",
|
||||
"\n",
|
||||
"vector_store.update_document(document_id=uuids[0], document=updated_document_1)\n",
|
||||
"# You can also update multiple documents at once\n",
|
||||
"vector_store.update_documents(\n",
|
||||
" ids=uuids[:2], documents=[updated_document_1, updated_document_1]\n",
|
||||
")"
|
||||
"# update the metadata for a document\n",
|
||||
"docs[0].metadata = {\n",
|
||||
" \"source\": \"../../how_to/state_of_the_union.txt\",\n",
|
||||
" \"new_value\": \"hello world\",\n",
|
||||
"}\n",
|
||||
"example_db.update_document(ids[0], docs[0])\n",
|
||||
"print(example_db._collection.get(ids=[ids[0]]))\n",
|
||||
"\n",
|
||||
"# delete the last document\n",
|
||||
"print(\"count before\", example_db._collection.count())\n",
|
||||
"example_db._collection.delete(ids=[ids[-1]])\n",
|
||||
"print(\"count after\", example_db._collection.count())"
|
||||
]
|
||||
},
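One straightforward way to build such stable `ids` is to combine the source filename with the chunk index, or to hash the chunk content; both sketches below reuse the `docs` list from the cells above and introduce no new dependencies:

```python
from hashlib import sha1

# Human-readable ids: source filename plus chunk index.
stable_ids = [
    f"{doc.metadata.get('source', 'unknown')}_{i}" for i, doc in enumerate(docs)
]

# Opaque, fixed-length ids: hash of the chunk content.
hashed_ids = [sha1(doc.page_content.encode("utf-8")).hexdigest() for doc in docs]

print(stable_ids[0])
print(hashed_ids[0])
```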
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "74b9a13a",
|
||||
"id": "ac6bc71a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Delete items from vector store\n",
|
||||
"## Use OpenAI Embeddings\n",
|
||||
"\n",
|
||||
"We can also delete items from our vector store as follows:"
|
||||
"Many people like to use OpenAIEmbeddings, here is how to set that up."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "56f17791",
|
||||
"metadata": {},
|
||||
"id": "42080f37-8fd1-4cec-acd9-15d2b03b2f4d",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"vector_store.delete(ids=uuids[-1])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "213acf08",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Query vector store\n",
|
||||
"# get a token: https://platform.openai.com/account/api-keys\n",
|
||||
"\n",
|
||||
"Once your vector store has been created and the relevant documents have been added you will most likely wish to query it during the running of your chain or agent. \n",
|
||||
"from getpass import getpass\n",
|
||||
"\n",
|
||||
"### Query directly\n",
|
||||
"from langchain_openai import OpenAIEmbeddings\n",
|
||||
"\n",
|
||||
"#### Similarity search\n",
|
||||
"\n",
|
||||
"Performing a simple similarity search can be done as follows:"
|
||||
"OPENAI_API_KEY = getpass()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "e2b96fcf",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"* Building an exciting new project with LangChain - come check it out! [{'source': 'tweet'}]\n",
|
||||
"* LangGraph is the best framework for building stateful, agentic applications! [{'source': 'tweet'}]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"id": "c7a94d6c-b4d4-4498-9bdd-eb50c92b85c5",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"results = vector_store.similarity_search(\n",
|
||||
" \"LangChain provides abstractions to make working with LLMs easy\",\n",
|
||||
" k=2,\n",
|
||||
" filter={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"for res in results:\n",
|
||||
" print(f\"* {res.page_content} [{res.metadata}]\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "cdd117ea",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Similarity search with score\n",
|
||||
"import os\n",
|
||||
"\n",
|
||||
"If you want to execute a similarity search and receive the corresponding scores you can run:"
|
||||
"os.environ[\"OPENAI_API_KEY\"] = OPENAI_API_KEY"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "2768a331",
|
||||
"metadata": {},
|
||||
"id": "5eabdb75",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"* [SIM=1.726390] The stock market is down 500 points today due to fears of a recession. [{'source': 'news'}]\n"
|
||||
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
|
||||
"\n",
|
||||
"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
|
||||
"\n",
|
||||
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
|
||||
"\n",
|
||||
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"results = vector_store.similarity_search_with_score(\n",
|
||||
" \"Will it be hot tomorrow?\", k=1, filter={\"source\": \"news\"}\n",
|
||||
"embeddings = OpenAIEmbeddings()\n",
|
||||
"new_client = chromadb.EphemeralClient()\n",
|
||||
"openai_lc_client = Chroma.from_documents(\n",
|
||||
" docs, embeddings, client=new_client, collection_name=\"openai_collection\"\n",
|
||||
")\n",
|
||||
"for res, score in results:\n",
|
||||
" print(f\"* [SIM={score:3f}] {res.page_content} [{res.metadata}]\")"
|
||||
"\n",
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"docs = openai_lc_client.similarity_search(query)\n",
|
||||
"print(docs[0].page_content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "92b436c8",
|
||||
"id": "6d9c28ad",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Search by vector\n",
|
||||
"***\n",
|
||||
"\n",
|
||||
"You can also search by vector:"
|
||||
"## Other Information"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "18152965",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Similarity search with score"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "346347d7",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The returned distance score is cosine distance. Therefore, a lower score is better."
|
||||
]
|
||||
},
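As a self-contained illustration of what "cosine distance, lower is better" means (plain Python, not tied to Chroma's internals):

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: 0.0 for identical directions, up to 2.0 for opposite ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)


print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # 0.0 -> best possible match
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0 -> orthogonal, much worse
```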
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "8ea434a5",
|
||||
"metadata": {},
|
||||
"id": "72aaa9c8",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs = db.similarity_search_with_score(query)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "d88e958e",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"* I had chocalate chip pancakes and fried eggs for breakfast this morning. [{'source': 'tweet'}]\n"
|
||||
]
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"(Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': '../../../state_of_the_union.txt'}),\n",
|
||||
" 1.1972057819366455)"
|
||||
]
|
||||
},
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"results = vector_store.similarity_search_by_vector(\n",
|
||||
" embedding=embeddings.embed_query(\"I love green eggs and ham!\"), k=1\n",
|
||||
")\n",
|
||||
"for doc in results:\n",
|
||||
" print(f\"* {doc.page_content} [{doc.metadata}]\")"
|
||||
"docs[0]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "9c1c1e6f",
|
||||
"id": "794a7552",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Other search methods\n",
|
||||
"### Retriever options\n",
|
||||
"\n",
|
||||
"There are a variety of other search methods that are not covered in this notebook, such as MMR search or searching by vector. For a full list of the search abilities available for `AstraDBVectorStore` check out the [API reference](https://api.python.langchain.com/en/latest/vectorstores/langchain_astradb.vectorstores.AstraDBVectorStore.html).\n",
|
||||
"This section goes over different options for how to use Chroma as a retriever.\n",
|
||||
"\n",
|
||||
"### Query by turning into retriever\n",
|
||||
"#### MMR\n",
|
||||
"\n",
|
||||
"You can also transform the vector store into a retriever for easier usage in your chains. For more information on the different search types and kwargs you can pass, please visit the API reference [here](https://api.python.langchain.com/en/latest/vectorstores/langchain_chroma.vectorstores.Chroma.html#langchain_chroma.vectorstores.Chroma.as_retriever)."
|
||||
"In addition to using similarity search in the retriever object, you can also use `mmr`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "96ff911a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"retriever = db.as_retriever(search_type=\"mmr\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"id": "7b6f7867",
|
||||
"id": "f00be6d0",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(metadata={'source': 'news'}, page_content='Robbers broke into the city bank and stole $1 million in cash.')]"
|
||||
"Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': '../../../state_of_the_union.txt'})"
|
||||
]
|
||||
},
|
||||
"execution_count": 12,
|
||||
@@ -448,34 +510,41 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"retriever = vector_store.as_retriever(\n",
|
||||
" search_type=\"mmr\", search_kwargs={\"k\": 1, \"fetch_k\": 5}\n",
|
||||
")\n",
|
||||
"retriever.invoke(\"Stealing from the bank is a crime\", filter={\"source\": \"news\"})"
|
||||
"retriever.invoke(query)[0]"
|
||||
]
|
||||
},
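MMR re-ranking can usually be tuned through `search_kwargs`; the `fetch_k` and `lambda_mult` names below are assumptions to verify against the `as_retriever` API reference (`db` and `query` are the objects defined earlier in this notebook):

```python
# fetch_k: candidates fetched before re-ranking.
# lambda_mult: 1.0 favours pure relevance, 0.0 favours maximum diversity.
# Both parameter names are assumptions; check the as_retriever reference.
mmr_retriever = db.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 3, "fetch_k": 20, "lambda_mult": 0.5},
)
mmr_docs = mmr_retriever.invoke(query)
print(len(mmr_docs))
```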
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "a2b7b73c",
|
||||
"id": "275dbd0a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Usage for retrieval-augmented generation\n",
|
||||
"### Filtering on metadata\n",
|
||||
"\n",
|
||||
"For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:\n",
|
||||
"It can be helpful to narrow down the collection before working with it.\n",
|
||||
"\n",
|
||||
"- [Tutorials: working with external knowledge](https://python.langchain.com/v0.2/docs/tutorials/#working-with-external-knowledge)\n",
|
||||
"- [How-to: Question and answer with RAG](https://python.langchain.com/v0.2/docs/how_to/#qa-with-rag)\n",
|
||||
"- [Retrieval conceptual docs](https://python.langchain.com/v0.2/docs/concepts/#retrieval)"
|
||||
"For example, collections can be filtered on metadata using the get method."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "fed28359",
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"id": "81600dc1",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'ids': [], 'embeddings': None, 'metadatas': [], 'documents': []}"
|
||||
]
|
||||
},
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"## API reference\n",
|
||||
"\n",
|
||||
"For detailed documentation of all `Chroma` vector store features and configurations head to the API reference: https://api.python.langchain.com/en/latest/vectorstores/langchain_chroma.vectorstores.Chroma.html"
|
||||
"# filter collection for updated source\n",
|
||||
"example_db.get(where={\"source\": \"some_other_source\"})"
|
||||
]
|
||||
}
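Beyond exact matches, Chroma `where` clauses generally accept comparison operators; the `$eq`/`$ne` syntax below is an assumption to check against the Chroma documentation. The same style of clause can typically be passed to `similarity_search` via `filter`:

```python
# Operator-style metadata filters (assumed syntax; see the Chroma docs).
example_db.get(where={"source": {"$eq": "../../how_to/state_of_the_union.txt"}})

# The filter argument of similarity_search takes the same kind of clause.
example_db.similarity_search(query, k=2, filter={"new_value": {"$ne": "hello world"}})
```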
|
||||
],
|
||||
@@ -495,7 +564,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.9"
|
||||
"version": "3.10.10"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
@@ -9,18 +9,37 @@
|
||||
"\n",
|
||||
"> [ClickHouse](https://clickhouse.com/) is the fastest and most resource efficient open-source database for real-time apps and analytics with full SQL support and a wide range of functions to assist users in writing analytical queries. Lately added data structures and distance search functions (like `L2Distance`) as well as [approximate nearest neighbor search indexes](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/annindexes) enable ClickHouse to be used as a high performance and scalable vector database to store and search vectors with SQL.\n",
|
||||
"\n",
|
||||
"This notebook shows how to use functionality related to the `ClickHouse` vector store.\n",
|
||||
"You'll need to install `langchain-community` with `pip install -qU langchain-community` to use this integration\n",
|
||||
"\n",
|
||||
"## Setup\n",
|
||||
"\n",
|
||||
"First set up a local clickhouse server with docker:"
|
||||
"This notebook shows how to use functionality related to the `ClickHouse` vector search."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "43ead5d5-2c1f-4dce-a69a-cb00e4f9d6f0",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Setting up environments"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b2c434bc",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Setting up local clickhouse server with docker (optional)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "8c4d2e16",
|
||||
"metadata": {},
|
||||
"id": "249a7751",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-06-03T08:43:43.035606Z",
|
||||
"start_time": "2023-06-03T08:43:42.618531Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"! docker run -d -p 8123:8123 -p9000:9000 --name langchain-clickhouse-server --ulimit nofile=262144:262144 clickhouse/clickhouse-server:23.4.2.11"
|
||||
@@ -28,82 +47,52 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0acb2a8d",
|
||||
"id": "7bd3c1c0",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You'll need to install `langchain-community` and `clickhouse-connect` to use this integration"
|
||||
"Setup up clickhouse client driver"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "d454fb7c",
|
||||
"id": "9d614bf8",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"pip install -qU langchain-community clickhouse-connect"
|
||||
"%pip install --upgrade --quiet clickhouse-connect"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "3df5501b",
|
||||
"id": "15a1d477-9cdb-4d82-b019-96951ecb2b72",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Credentials\n",
|
||||
"\n",
|
||||
"There are no credentials for this notebook, just make sure you have installed the packages as shown above."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "54d5276f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"If you want to get best in-class automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
|
||||
"We want to use OpenAIEmbeddings so we have to get the OpenAI API Key."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "f6fd5b03",
|
||||
"metadata": {},
|
||||
"execution_count": 1,
|
||||
"id": "91003ea5-0c8c-436c-a5de-aaeaeef2f458",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-06-03T08:49:35.383673Z",
|
||||
"start_time": "2023-06-03T08:49:33.984547Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
|
||||
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2b87fe34",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Instantiation\n",
|
||||
"import getpass\n",
|
||||
"import os\n",
|
||||
"\n",
|
||||
"```{=mdx}\n",
|
||||
"import EmbeddingTabs from \"@theme/EmbeddingTabs\";\n",
|
||||
"\n",
|
||||
"<EmbeddingTabs/>\n",
|
||||
"```"
|
||||
"if not os.environ[\"OPENAI_API_KEY\"]:\n",
|
||||
" os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "60276097",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# | output: false\n",
|
||||
"# | echo: false\n",
|
||||
"from langchain_openai import OpenAIEmbeddings\n",
|
||||
"\n",
|
||||
"embeddings = OpenAIEmbeddings(model=\"text-embedding-3-large\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 2,
|
||||
"id": "aac9563e",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
@@ -115,178 +104,176 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.vectorstores import Clickhouse, ClickhouseSettings\n",
|
||||
"\n",
|
||||
"settings = ClickhouseSettings(table=\"clickhouse_example\")\n",
|
||||
"vector_store = Clickhouse(embeddings, config=settings)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "32dd3f67",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Manage vector store\n",
|
||||
"\n",
|
||||
"Once you have created your vector store, we can interact with it by adding and deleting different items.\n",
|
||||
"\n",
|
||||
"### Add items to vector store\n",
|
||||
"\n",
|
||||
"We can add items to our vector store by using the `add_documents` function."
|
||||
"from langchain_openai import OpenAIEmbeddings\n",
|
||||
"from langchain_text_splitters import CharacterTextSplitter"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "944743ee",
|
||||
"metadata": {},
|
||||
"execution_count": 3,
|
||||
"id": "a3c3999a",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-06-03T08:33:32.527387Z",
|
||||
"start_time": "2023-06-03T08:33:32.501312Z"
|
||||
},
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from uuid import uuid4\n",
|
||||
"from langchain_community.document_loaders import TextLoader\n",
|
||||
"\n",
|
||||
"from langchain_core.documents import Document\n",
|
||||
"loader = TextLoader(\"../../how_to/state_of_the_union.txt\")\n",
|
||||
"documents = loader.load()\n",
|
||||
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
|
||||
"docs = text_splitter.split_documents(documents)\n",
|
||||
"\n",
|
||||
"document_1 = Document(\n",
|
||||
" page_content=\"I had chocalate chip pancakes and scrambled eggs for breakfast this morning.\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_2 = Document(\n",
|
||||
" page_content=\"The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.\",\n",
|
||||
" metadata={\"source\": \"news\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_3 = Document(\n",
|
||||
" page_content=\"Building an exciting new project with LangChain - come check it out!\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_4 = Document(\n",
|
||||
" page_content=\"Robbers broke into the city bank and stole $1 million in cash.\",\n",
|
||||
" metadata={\"source\": \"news\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_5 = Document(\n",
|
||||
" page_content=\"Wow! That was an amazing movie. I can't wait to see it again.\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_6 = Document(\n",
|
||||
" page_content=\"Is the new iPhone worth the price? Read this review to find out.\",\n",
|
||||
" metadata={\"source\": \"website\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_7 = Document(\n",
|
||||
" page_content=\"The top 10 soccer players in the world right now.\",\n",
|
||||
" metadata={\"source\": \"website\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_8 = Document(\n",
|
||||
" page_content=\"LangGraph is the best framework for building stateful, agentic applications!\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_9 = Document(\n",
|
||||
" page_content=\"The stock market is down 500 points today due to fears of a recession.\",\n",
|
||||
" metadata={\"source\": \"news\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_10 = Document(\n",
|
||||
" page_content=\"I have a bad feeling I am going to get deleted :(\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"documents = [\n",
|
||||
" document_1,\n",
|
||||
" document_2,\n",
|
||||
" document_3,\n",
|
||||
" document_4,\n",
|
||||
" document_5,\n",
|
||||
" document_6,\n",
|
||||
" document_7,\n",
|
||||
" document_8,\n",
|
||||
" document_9,\n",
|
||||
" document_10,\n",
|
||||
"]\n",
|
||||
"uuids = [str(uuid4()) for _ in range(len(documents))]\n",
|
||||
"\n",
|
||||
"vector_store.add_documents(documents=documents, ids=uuids)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "18af81cc",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Delete items from vector store\n",
|
||||
"\n",
|
||||
"We can delete items from our vector store by ID by using the `delete` function."
|
||||
"embeddings = OpenAIEmbeddings()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "12b32762",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"execution_count": 4,
|
||||
"id": "6e104aee",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-06-03T08:33:35.503823Z",
|
||||
"start_time": "2023-06-03T08:33:33.745832Z"
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Inserting data...: 100%|██████████| 42/42 [00:00<00:00, 2801.49it/s]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"vector_store.delete(ids=uuids[-1])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ada27577",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Query vector store\n",
|
||||
"for d in docs:\n",
|
||||
" d.metadata = {\"some\": \"metadata\"}\n",
|
||||
"settings = ClickhouseSettings(table=\"clickhouse_vector_search_example\")\n",
|
||||
"docsearch = Clickhouse.from_documents(docs, embeddings, config=settings)\n",
|
||||
"\n",
|
||||
"Once your vector store has been created and the relevant documents have been added you will most likely wish to query it during the running of your chain or agent. \n",
|
||||
"\n",
|
||||
"### Query directly\n",
|
||||
"\n",
|
||||
"#### Similarity search\n",
|
||||
"\n",
|
||||
"Performing a simple similarity search can be done as follows:"
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"docs = docsearch.similarity_search(query)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "015831a3",
|
||||
"execution_count": 5,
|
||||
"id": "9c608226",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
|
||||
"\n",
|
||||
"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
|
||||
"\n",
|
||||
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
|
||||
"\n",
|
||||
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"results = vector_store.similarity_search(\n",
|
||||
" \"LangChain provides abstractions to make working with LLMs easy\", k=2\n",
|
||||
")\n",
|
||||
"for res in results:\n",
|
||||
" print(f\"* {res.page_content} [{res.metadata}]\")"
|
||||
"print(docs[0].page_content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "623d3b9d",
|
||||
"id": "e3a8b105",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Similarity search with score\n",
|
||||
"\n",
|
||||
"You can also search with score:"
|
||||
"## Get connection info and data schema"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "e7d43430",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"execution_count": 6,
|
||||
"id": "69996818",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-06-03T08:28:58.252991Z",
|
||||
"start_time": "2023-06-03T08:28:58.197560Z"
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\u001b[92m\u001b[1mdefault.clickhouse_vector_search_example @ localhost:8123\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1musername: None\u001b[0m\n",
|
||||
"\n",
|
||||
"Table Schema:\n",
|
||||
"---------------------------------------------------\n",
|
||||
"|\u001b[94mid \u001b[0m|\u001b[96mNullable(String) \u001b[0m|\n",
|
||||
"|\u001b[94mdocument \u001b[0m|\u001b[96mNullable(String) \u001b[0m|\n",
|
||||
"|\u001b[94membedding \u001b[0m|\u001b[96mArray(Float32) \u001b[0m|\n",
|
||||
"|\u001b[94mmetadata \u001b[0m|\u001b[96mObject('json') \u001b[0m|\n",
|
||||
"|\u001b[94muuid \u001b[0m|\u001b[96mUUID \u001b[0m|\n",
|
||||
"---------------------------------------------------\n",
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"results = vector_store.similarity_search_with_score(\"Will it be hot tomorrow?\", k=1)\n",
|
||||
"for res, score in results:\n",
|
||||
" print(f\"* [SIM={score:3f}] {res.page_content} [{res.metadata}]\")"
|
||||
"print(str(docsearch))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f5a90c12",
|
||||
"id": "324ac147",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Clickhouse table schema"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b5bd7c5b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"> Clickhouse table will be automatically created if not exist by default. Advanced users could pre-create the table with optimized settings. For distributed Clickhouse cluster with sharding, table engine should be configured as `Distributed`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "54f4f561",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Clickhouse Table DDL:\n",
|
||||
"\n",
|
||||
"CREATE TABLE IF NOT EXISTS default.clickhouse_vector_search_example(\n",
|
||||
" id Nullable(String),\n",
|
||||
" document Nullable(String),\n",
|
||||
" embedding Array(Float32),\n",
|
||||
" metadata JSON,\n",
|
||||
" uuid UUID DEFAULT generateUUIDv4(),\n",
|
||||
" CONSTRAINT cons_vec_len CHECK length(embedding) = 1536,\n",
|
||||
" INDEX vec_idx embedding TYPE annoy(100,'L2Distance') GRANULARITY 1000\n",
|
||||
") ENGINE = MergeTree ORDER BY uuid SETTINGS index_granularity = 8192\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(f\"Clickhouse Table DDL:\\n\\n{docsearch.schema}\")"
|
||||
]
|
||||
},
|
||||
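As a rough, hedged sketch of the pre-creation note above (not part of the original notebook): the statement below mirrors the auto-generated DDL printed by the cell above, and the commented `Distributed` variant only applies to a sharded cluster. The database, table, and cluster names are assumptions and must match your `ClickhouseSettings`.

```python
# Hypothetical sketch: pre-create the table before instantiating the vector store.
# The schema mirrors the auto-generated DDL shown above; adjust names, the
# embedding dimension (1536 here), and index settings to your deployment.
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost", port=8123)
client.command(
    """
    CREATE TABLE IF NOT EXISTS default.clickhouse_vector_search_example (
        id Nullable(String),
        document Nullable(String),
        embedding Array(Float32),
        metadata JSON,
        uuid UUID DEFAULT generateUUIDv4(),
        CONSTRAINT cons_vec_len CHECK length(embedding) = 1536
    ) ENGINE = MergeTree ORDER BY uuid
    """
)

# For a sharded cluster you would instead create a local table on every shard and
# point a Distributed table at it (assumed cluster name `my_cluster`), e.g.:
# CREATE TABLE default.clickhouse_vector_search_example ON CLUSTER my_cluster
#     AS default.clickhouse_vector_search_example_local
#     ENGINE = Distributed(my_cluster, default, clickhouse_vector_search_example_local, rand())
```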
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f59360c0",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Filtering\n",
|
||||
@@ -300,87 +287,94 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "169d01d1",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"execution_count": 9,
|
||||
"id": "232055f6",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-06-03T08:29:36.680805Z",
|
||||
"start_time": "2023-06-03T08:29:34.963676Z"
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Inserting data...: 100%|██████████| 42/42 [00:00<00:00, 6939.56it/s]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"meta = vector_store.metadata_column\n",
|
||||
"results = vector_store.similarity_search_with_relevance_scores(\n",
|
||||
" \"What did I eat for breakfast?\",\n",
|
||||
" k=4,\n",
|
||||
" where_str=f\"{meta}.source = 'tweet'\",\n",
|
||||
")\n",
|
||||
"for res in results:\n",
|
||||
" print(f\"* {res.page_content} [{res.metadata}]\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "d86fa4bf",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Other search methods\n",
|
||||
"from langchain_community.document_loaders import TextLoader\n",
|
||||
"from langchain_community.vectorstores import Clickhouse, ClickhouseSettings\n",
|
||||
"\n",
|
||||
"There are a variety of other search methods that are not covered in this notebook, such as MMR search or searching by vector. For a full list of the search abilities available for `Clickhouse` vector store check out the [API reference](https://api.python.langchain.com/en/latest/vectorstores/langchain_community.vectorstores.clickhouse.Clickhouse.html)."
|
||||
]
|
||||
},
|
||||
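As a brief, hedged sketch of searching by a raw embedding vector (assuming the `vector_store` and `embeddings` objects created earlier in this notebook):

```python
# Sketch: search with a pre-computed embedding instead of query text.
query_vector = embeddings.embed_query(
    "LangChain provides abstractions to make working with LLMs easy"
)
results = vector_store.similarity_search_by_vector(query_vector, k=2)
for res in results:
    print(f"* {res.page_content} [{res.metadata}]")
```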
{
|
||||
"cell_type": "markdown",
|
||||
"id": "afacfd4e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Query by turning into retriever\n",
|
||||
"loader = TextLoader(\"../../how_to/state_of_the_union.txt\")\n",
|
||||
"documents = loader.load()\n",
|
||||
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
|
||||
"docs = text_splitter.split_documents(documents)\n",
|
||||
"\n",
|
||||
"You can also transform the vector store into a retriever for easier usage in your chains. \n",
|
||||
"embeddings = OpenAIEmbeddings()\n",
|
||||
"\n",
|
||||
"Here is how to transform your vector store into a retriever and then invoke the retreiever with a simple query and filter."
|
||||
"for i, d in enumerate(docs):\n",
|
||||
" d.metadata = {\"doc_id\": i}\n",
|
||||
"\n",
|
||||
"docsearch = Clickhouse.from_documents(docs, embeddings)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "97187188",
|
||||
"execution_count": 10,
|
||||
"id": "ddbcee77",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-06-03T08:29:43.487436Z",
|
||||
"start_time": "2023-06-03T08:29:43.040831Z"
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"0.6779101415357189 {'doc_id': 0} Madam Speaker, Madam...\n",
|
||||
"0.6997970363474885 {'doc_id': 8} And so many families...\n",
|
||||
"0.7044504914336727 {'doc_id': 1} Groups of citizens b...\n",
|
||||
"0.7053558702165094 {'doc_id': 6} And I’m taking robus...\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"meta = docsearch.metadata_column\n",
|
||||
"output = docsearch.similarity_search_with_relevance_scores(\n",
|
||||
" \"What did the president say about Ketanji Brown Jackson?\",\n",
|
||||
" k=4,\n",
|
||||
" where_str=f\"{meta}.doc_id<10\",\n",
|
||||
")\n",
|
||||
"for d, dist in output:\n",
|
||||
" print(dist, d.metadata, d.page_content[:20] + \"...\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "a359ed74",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Deleting your data"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "fb6a9d36",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-06-03T08:30:24.822384Z",
|
||||
"start_time": "2023-06-03T08:30:24.798571Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"retriever = vector_store.as_retriever(\n",
|
||||
" search_type=\"similarity_score_threshold\",\n",
|
||||
" search_kwargs={\"k\": 1, \"score_threshold\": 0.5},\n",
|
||||
")\n",
|
||||
"retriever.invoke(\"Stealing from the bank is a crime\", filter={\"source\": \"news\"})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "57fade30",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Usage for retrieval-augmented generation\n",
|
||||
"\n",
|
||||
"For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:\n",
|
||||
"\n",
|
||||
"- [Tutorials: working with external knowledge](https://python.langchain.com/v0.2/docs/tutorials/#working-with-external-knowledge)\n",
|
||||
"- [How-to: Question and answer with RAG](https://python.langchain.com/v0.2/docs/how_to/#qa-with-rag)\n",
|
||||
"- [Retrieval conceptual docs](https://python.langchain.com/v0.2/docs/concepts/#retrieval)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "db24787c",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"For more, check out a complete RAG template using Astra DB [here](https://github.com/langchain-ai/langchain/tree/master/templates/rag-astradb)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "02452d34",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## API reference\n",
|
||||
"\n",
|
||||
"For detailed documentation of all `AstraDBVectorStore` features and configurations head to the API reference:https://api.python.langchain.com/en/latest/vectorstores/langchain_community.vectorstores.clickhouse.Clickhouse.html"
|
||||
"docsearch.drop()"
|
||||
]
|
||||
}
|
||||
],
|
||||
@@ -400,7 +394,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.9"
|
||||
"version": "3.10.12"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -10,7 +10,7 @@
|
||||
"\n",
|
||||
"Vector Search is a part of the [Full Text Search Service](https://docs.couchbase.com/server/current/learn/services-and-indexes/services/search-service.html) (Search Service) in Couchbase.\n",
|
||||
"\n",
|
||||
"This tutorial explains how to use Vector Search in Couchbase. You can work with either [Couchbase Capella](https://www.couchbase.com/products/capella/) and your self-managed Couchbase Server."
|
||||
"This tutorial explains how to use Vector Search in Couchbase. You can work with both [Couchbase Capella](https://www.couchbase.com/products/capella/) and your self-managed Couchbase Server."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -18,64 +18,30 @@
|
||||
"id": "43326be4-4433-4de2-ad42-6eb91a722bad",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Setup\n",
|
||||
"\n",
|
||||
"To access the `CouchbaseVectorStore` you first need to install the `langchain-couchbase` partner package:"
|
||||
"## Installation"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 1,
|
||||
"id": "bec8d532-fec7-4dc7-9be3-020aa7bdb01f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"pip install -qU langchain-couchbase"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "30d6861e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Credentials\n",
|
||||
"\n",
|
||||
"Head over to the Couchbase [website](https://cloud.couchbase.com) and create a new connection, making sure to save your database username and password:"
|
||||
"%pip install --upgrade --quiet langchain langchain-openai langchain-couchbase"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "d98e3baa",
|
||||
"execution_count": 2,
|
||||
"id": "4a972cbc-bf59-46eb-9b50-e5dc3a69dcf0",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import getpass\n",
|
||||
"import os\n",
|
||||
"\n",
|
||||
"COUCHBASE_CONNECTION_STRING = getpass.getpass(\n",
|
||||
" \"Enter the connection string for the Couchbase cluster: \"\n",
|
||||
")\n",
|
||||
"DB_USERNAME = getpass.getpass(\"Enter the username for the Couchbase cluster: \")\n",
|
||||
"DB_PASSWORD = getpass.getpass(\"Enter the password for the Couchbase cluster: \")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "23ac2c64",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"If you want to get best in-class automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "9c25ec38",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n",
|
||||
"# os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass()"
|
||||
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -83,9 +49,18 @@
|
||||
"id": "acf1b168-622f-465c-a9a5-d27a6d7e7a8f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Initialization\n",
|
||||
"\n",
|
||||
"Before instantiating we need to create a connection."
|
||||
"## Import the Vector Store and Embeddings"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "23ce45ab-bfd2-42e1-b681-514a550f0232",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_couchbase.vectorstores import CouchbaseVectorStore\n",
|
||||
"from langchain_openai import OpenAIEmbeddings"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -93,18 +68,31 @@
|
||||
"id": "3144ba02-1eaa-4449-853e-f034ca5706bf",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Create Couchbase Connection Object\n",
|
||||
"\n",
|
||||
"## Create Couchbase Connection Object\n",
|
||||
"We create a connection to the Couchbase cluster initially and then pass the cluster object to the Vector Store. \n",
|
||||
"\n",
|
||||
"Here, we are connecting using the username and password from above. You can also connect using any other supported way to your cluster. \n",
|
||||
"Here, we are connecting using the username and password. You can also connect using any other supported way to your cluster. \n",
|
||||
"\n",
|
||||
"For more information on connecting to the Couchbase cluster, please check the [documentation](https://docs.couchbase.com/python-sdk/current/hello-world/start-using-sdk.html#connect)."
|
||||
"For more information on connecting to the Couchbase cluster, please check the [Python SDK documentation](https://docs.couchbase.com/python-sdk/current/hello-world/start-using-sdk.html#connect)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 4,
|
||||
"id": "52fe583a-12db-4dc2-9281-1174bf1d4e5c",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"COUCHBASE_CONNECTION_STRING = (\n",
|
||||
" \"couchbase://localhost\" # or \"couchbases://localhost\" if using TLS\n",
|
||||
")\n",
|
||||
"DB_USERNAME = \"Administrator\"\n",
|
||||
"DB_PASSWORD = \"Password\""
|
||||
]
|
||||
},
|
||||
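The connection cell itself is not shown here; a minimal sketch of what it typically looks like with the Couchbase Python SDK is below. The class and option names come from the SDK, but treat the exact call as an assumption rather than the notebook's own code.

```python
# Minimal sketch: connect to the cluster and wait until it is ready.
# Assumes COUCHBASE_CONNECTION_STRING, DB_USERNAME and DB_PASSWORD from above.
from datetime import timedelta

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions

auth = PasswordAuthenticator(DB_USERNAME, DB_PASSWORD)
cluster = Cluster(COUCHBASE_CONNECTION_STRING, ClusterOptions(auth))

# Wait until the cluster is ready for use.
cluster.wait_until_ready(timedelta(seconds=5))
```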
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "9986c6b9",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -135,15 +123,145 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 6,
|
||||
"id": "1b1d0a26-e9d4-4823-9800-9549d24d3d16",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"BUCKET_NAME = \"langchain_bucket\"\n",
|
||||
"BUCKET_NAME = \"testing\"\n",
|
||||
"SCOPE_NAME = \"_default\"\n",
|
||||
"COLLECTION_NAME = \"default\"\n",
|
||||
"SEARCH_INDEX_NAME = \"langchain-test-index\""
|
||||
"COLLECTION_NAME = \"_default\"\n",
|
||||
"SEARCH_INDEX_NAME = \"vector-index\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "efbac6ff-c2ac-4443-9250-7cc88061346b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"For this tutorial, we will use OpenAI embeddings"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "87625579-86d7-4de4-8a4d-cee674a6b676",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"embeddings = OpenAIEmbeddings()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "3677b4b0-3711-419c-89ff-32ef4d3e3022",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Create the Search Index\n",
|
||||
"Currently, the Search index needs to be created from the Couchbase Capella or Server UI or using the REST interface. \n",
|
||||
"\n",
|
||||
"Let us define a Search index with the name `vector-index` on the testing bucket\n",
|
||||
"\n",
|
||||
"For this example, let us use the Import Index feature on the Search Service on the UI. \n",
|
||||
"\n",
|
||||
"We are defining an index on the `testing` bucket's `_default` scope on the `_default` collection with the vector field set to `embedding` with 1536 dimensions and the text field set to `text`. We are also indexing and storing all the fields under `metadata` in the document as a dynamic mapping to account for varying document structures. The similarity metric is set to `dot_product`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "655117ae-9b1f-4139-b437-ca7685975a54",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### How to Import an Index to the Full Text Search service?\n",
|
||||
" - [Couchbase Server](https://docs.couchbase.com/server/current/search/import-search-index.html)\n",
|
||||
" - Click on Search -> Add Index -> Import\n",
|
||||
" - Copy the following Index definition in the Import screen\n",
|
||||
" - Click on Create Index to create the index.\n",
|
||||
" - [Couchbase Capella](https://docs.couchbase.com/cloud/search/import-search-index.html)\n",
|
||||
" - Copy the index definition to a new file `index.json`\n",
|
||||
" - Import the file in Capella using the instructions in the documentation.\n",
|
||||
" - Click on Create Index to create the index.\n",
|
||||
" \n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f85bc468-d9b8-487d-999a-3b5d2fb78e41",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Index Definition\n",
|
||||
"```\n",
|
||||
"{\n",
|
||||
" \"name\": \"vector-index\",\n",
|
||||
" \"type\": \"fulltext-index\",\n",
|
||||
" \"params\": {\n",
|
||||
" \"doc_config\": {\n",
|
||||
" \"docid_prefix_delim\": \"\",\n",
|
||||
" \"docid_regexp\": \"\",\n",
|
||||
" \"mode\": \"type_field\",\n",
|
||||
" \"type_field\": \"type\"\n",
|
||||
" },\n",
|
||||
" \"mapping\": {\n",
|
||||
" \"default_analyzer\": \"standard\",\n",
|
||||
" \"default_datetime_parser\": \"dateTimeOptional\",\n",
|
||||
" \"default_field\": \"_all\",\n",
|
||||
" \"default_mapping\": {\n",
|
||||
" \"dynamic\": true,\n",
|
||||
" \"enabled\": true,\n",
|
||||
" \"properties\": {\n",
|
||||
" \"metadata\": {\n",
|
||||
" \"dynamic\": true,\n",
|
||||
" \"enabled\": true\n",
|
||||
" },\n",
|
||||
" \"embedding\": {\n",
|
||||
" \"enabled\": true,\n",
|
||||
" \"dynamic\": false,\n",
|
||||
" \"fields\": [\n",
|
||||
" {\n",
|
||||
" \"dims\": 1536,\n",
|
||||
" \"index\": true,\n",
|
||||
" \"name\": \"embedding\",\n",
|
||||
" \"similarity\": \"dot_product\",\n",
|
||||
" \"type\": \"vector\",\n",
|
||||
" \"vector_index_optimized_for\": \"recall\"\n",
|
||||
" }\n",
|
||||
" ]\n",
|
||||
" },\n",
|
||||
" \"text\": {\n",
|
||||
" \"enabled\": true,\n",
|
||||
" \"dynamic\": false,\n",
|
||||
" \"fields\": [\n",
|
||||
" {\n",
|
||||
" \"index\": true,\n",
|
||||
" \"name\": \"text\",\n",
|
||||
" \"store\": true,\n",
|
||||
" \"type\": \"text\"\n",
|
||||
" }\n",
|
||||
" ]\n",
|
||||
" }\n",
|
||||
" }\n",
|
||||
" },\n",
|
||||
" \"default_type\": \"_default\",\n",
|
||||
" \"docvalues_dynamic\": false,\n",
|
||||
" \"index_dynamic\": true,\n",
|
||||
" \"store_dynamic\": true,\n",
|
||||
" \"type_field\": \"_type\"\n",
|
||||
" },\n",
|
||||
" \"store\": {\n",
|
||||
" \"indexType\": \"scorch\",\n",
|
||||
" \"segmentVersion\": 16\n",
|
||||
" }\n",
|
||||
" },\n",
|
||||
" \"sourceType\": \"gocbcore\",\n",
|
||||
" \"sourceName\": \"testing\",\n",
|
||||
" \"sourceParams\": {},\n",
|
||||
" \"planParams\": {\n",
|
||||
" \"maxPartitionsPerPIndex\": 103,\n",
|
||||
" \"indexPartitions\": 10,\n",
|
||||
" \"numReplicas\": 0\n",
|
||||
" }\n",
|
||||
"}\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
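If you would rather script the index creation than use the UI import, a hypothetical sketch using the Search REST API is shown below. The endpoint and port (8094 on a self-managed Couchbase Server) are assumptions; check your deployment's Search REST API documentation before relying on them.

```python
# Hypothetical sketch: create the Search index via the REST API.
# Assumes the index definition above was saved to index.json and that the
# Search service is reachable on port 8094.
import json

import requests

with open("index.json") as f:
    index_definition = json.load(f)

response = requests.put(
    "http://localhost:8094/api/index/vector-index",
    auth=(DB_USERNAME, DB_PASSWORD),
    headers={"Content-Type": "application/json"},
    json=index_definition,
)
response.raise_for_status()
print(response.json())
```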
{
|
||||
@@ -151,7 +269,7 @@
|
||||
"id": "556dc68c-9089-4390-8dc9-b77051e7fc34",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"For details on how to create a Search index with support for Vector fields, please refer to the documentation.\n",
|
||||
"For more details on how to create a Search index with support for Vector fields, please refer to the documentation.\n",
|
||||
"\n",
|
||||
"- [Couchbase Capella](https://docs.couchbase.com/cloud/vector-search/create-vector-search-index-ui.html)\n",
|
||||
" \n",
|
||||
@@ -163,40 +281,17 @@
|
||||
"id": "75f4037d-e509-4de7-a8d1-63a05de24e9d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Simple Instantiation\n",
|
||||
"\n",
|
||||
"Below, we create the vector store object with the cluster information and the search index name. \n",
|
||||
"\n",
|
||||
"```{=mdx}\n",
|
||||
"import EmbeddingTabs from \"@theme/EmbeddingTabs\";\n",
|
||||
"\n",
|
||||
"<EmbeddingTabs/>\n",
|
||||
"```"
|
||||
"## Create Vector Store\n",
|
||||
"We create the vector store object with the cluster information and the search index name."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "6706efdd",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# | output: false\n",
|
||||
"# | echo: false\n",
|
||||
"from langchain_openai import OpenAIEmbeddings\n",
|
||||
"\n",
|
||||
"embeddings = OpenAIEmbeddings(model=\"text-embedding-3-large\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 8,
|
||||
"id": "33db4670-76c5-49ba-94d6-a8fa35583058",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_couchbase.vectorstores import CouchbaseVectorStore\n",
|
||||
"\n",
|
||||
"vector_store = CouchbaseVectorStore(\n",
|
||||
" cluster=cluster,\n",
|
||||
" bucket_name=BUCKET_NAME,\n",
|
||||
@@ -213,18 +308,9 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Specify the Text & Embeddings Field\n",
|
||||
"\n",
|
||||
"You can optionally specify the text & embeddings field for the document using the `text_key` and `embedding_key` fields."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "49c38634",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"vector_store_specific = CouchbaseVectorStore(\n",
|
||||
"You can optionally specify the text & embeddings field for the document using the `text_key` and `embedding_key` fields.\n",
|
||||
"```\n",
|
||||
"vector_store = CouchbaseVectorStore(\n",
|
||||
" cluster=cluster,\n",
|
||||
" bucket_name=BUCKET_NAME,\n",
|
||||
" scope_name=SCOPE_NAME,\n",
|
||||
@@ -233,148 +319,73 @@
|
||||
" index_name=SEARCH_INDEX_NAME,\n",
|
||||
" text_key=\"text\",\n",
|
||||
" embedding_key=\"embedding\",\n",
|
||||
")\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "790dc1ac-0ab8-4cb5-989d-31ca7c241068",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Basic Vector Search Example\n",
|
||||
"For this example, we are going to load the \"state_of_the_union.txt\" file via the TextLoader, chunk the text into 500 character chunks with no overlaps and index all these chunks into Couchbase.\n",
|
||||
"\n",
|
||||
"After the data is indexed, we perform a simple query to find the top 4 chunks that are similar to the query \"What did president say about Ketanji Brown Jackson\".\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "440350df-cbc6-48f7-8009-2e783be18306",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.document_loaders import TextLoader\n",
|
||||
"from langchain_text_splitters import CharacterTextSplitter\n",
|
||||
"\n",
|
||||
"loader = TextLoader(\"../../how_to/state_of_the_union.txt\")\n",
|
||||
"documents = loader.load()\n",
|
||||
"text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=0)\n",
|
||||
"docs = text_splitter.split_documents(documents)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "9d3b4c7c-abd6-4dfa-ad63-470f16661319",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"vector_store = CouchbaseVectorStore.from_documents(\n",
|
||||
" documents=docs,\n",
|
||||
" embedding=embeddings,\n",
|
||||
" cluster=cluster,\n",
|
||||
" bucket_name=BUCKET_NAME,\n",
|
||||
" scope_name=SCOPE_NAME,\n",
|
||||
" collection_name=COLLECTION_NAME,\n",
|
||||
" index_name=SEARCH_INDEX_NAME,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "50e95fa6",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Manage vector store\n",
|
||||
"\n",
|
||||
"Once you have created your vector store, we can interact with it by adding and deleting different items.\n",
|
||||
"\n",
|
||||
"### Add items to vector store\n",
|
||||
"\n",
|
||||
"We can add items to our vector store by using the `add_documents` function."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "65a35f00",
|
||||
"execution_count": 11,
|
||||
"id": "91fdce6c-8f7c-4060-865a-2fd742846664",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"page_content='One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.' metadata={'source': '../../how_to/state_of_the_union.txt'}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from uuid import uuid4\n",
|
||||
"\n",
|
||||
"from langchain_core.documents import Document\n",
|
||||
"\n",
|
||||
"document_1 = Document(\n",
|
||||
" page_content=\"I had chocalate chip pancakes and scrambled eggs for breakfast this morning.\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_2 = Document(\n",
|
||||
" page_content=\"The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.\",\n",
|
||||
" metadata={\"source\": \"news\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_3 = Document(\n",
|
||||
" page_content=\"Building an exciting new project with LangChain - come check it out!\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_4 = Document(\n",
|
||||
" page_content=\"Robbers broke into the city bank and stole $1 million in cash.\",\n",
|
||||
" metadata={\"source\": \"news\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_5 = Document(\n",
|
||||
" page_content=\"Wow! That was an amazing movie. I can't wait to see it again.\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_6 = Document(\n",
|
||||
" page_content=\"Is the new iPhone worth the price? Read this review to find out.\",\n",
|
||||
" metadata={\"source\": \"website\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_7 = Document(\n",
|
||||
" page_content=\"The top 10 soccer players in the world right now.\",\n",
|
||||
" metadata={\"source\": \"website\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_8 = Document(\n",
|
||||
" page_content=\"LangGraph is the best framework for building stateful, agentic applications!\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_9 = Document(\n",
|
||||
" page_content=\"The stock market is down 500 points today due to fears of a recession.\",\n",
|
||||
" metadata={\"source\": \"news\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_10 = Document(\n",
|
||||
" page_content=\"I have a bad feeling I am going to get deleted :(\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"documents = [\n",
|
||||
" document_1,\n",
|
||||
" document_2,\n",
|
||||
" document_3,\n",
|
||||
" document_4,\n",
|
||||
" document_5,\n",
|
||||
" document_6,\n",
|
||||
" document_7,\n",
|
||||
" document_8,\n",
|
||||
" document_9,\n",
|
||||
" document_10,\n",
|
||||
"]\n",
|
||||
"uuids = [str(uuid4()) for _ in range(len(documents))]\n",
|
||||
"\n",
|
||||
"vector_store.add_documents(documents=documents, ids=uuids)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "dd33b030",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Delete items from vector store"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "3a05f294",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"vector_store.delete(ids=[uuids[-1]])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "d2cc4126",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Query vector store\n",
|
||||
"\n",
|
||||
"Once your vector store has been created and the relevant documents have been added you will most likely wish to query it during the running of your chain or agent.\n",
|
||||
"\n",
|
||||
"### Query directly\n",
|
||||
"\n",
|
||||
"#### Similarity search\n",
|
||||
"\n",
|
||||
"Performing a simple similarity search can be done as follows:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "8e00bb23",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"results = vector_store.similarity_search(\n",
|
||||
" \"LangChain provides abstractions to make working with LLMs easy\",\n",
|
||||
" k=2,\n",
|
||||
")\n",
|
||||
"for res in results:\n",
|
||||
" print(f\"* {res.page_content} [{res.metadata}]\")"
|
||||
"query = \"What did president say about Ketanji Brown Jackson\"\n",
|
||||
"results = vector_store.similarity_search(query)\n",
|
||||
"print(results[0])"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -382,21 +393,31 @@
|
||||
"id": "d9b46c93-65f6-4e4f-87a2-5cebea3b7a6b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Similarity search with Score\n",
|
||||
"\n",
|
||||
"You can also fetch the scores for the results by calling the `similarity_search_with_score` method."
|
||||
"## Similarity Search with Score\n",
|
||||
"You can fetch the scores for the results by calling the `similarity_search_with_score` method."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 12,
|
||||
"id": "24b146b2-55a2-4fe8-8659-3649032f5dc7",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"page_content='One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.' metadata={'source': '../../how_to/state_of_the_union.txt'}\n",
|
||||
"Score: 0.8211871385574341\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"results = vector_store.similarity_search_with_score(\"Will it be hot tomorrow?\", k=1)\n",
|
||||
"for res, score in results:\n",
|
||||
" print(f\"* [SIM={score:3f}] {res.page_content} [{res.metadata}]\")"
|
||||
"query = \"What did president say about Ketanji Brown Jackson\"\n",
|
||||
"results = vector_store.similarity_search_with_score(query)\n",
|
||||
"document, score = results[0]\n",
|
||||
"print(document)\n",
|
||||
"print(f\"Score: {score}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -404,8 +425,7 @@
|
||||
"id": "9983e83d-efd0-4b75-80db-150e0694e822",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Specifying Fields to Return\n",
|
||||
"\n",
|
||||
"## Specifying Fields to Return\n",
|
||||
"You can specify the fields to return from the document using `fields` parameter in the searches. These fields are returned as part of the `metadata` object in the returned Document. You can fetch any field that is stored in the Search index. The `text_key` of the document is returned as part of the document's `page_content`.\n",
|
||||
"\n",
|
||||
"If you do not specify any fields to be fetched, all the fields stored in the index are returned.\n",
|
||||
@@ -417,12 +437,20 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 13,
|
||||
"id": "ffa743dc-4e89-405b-ad71-7390338889e6",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"page_content='One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.' metadata={'source': '../../how_to/state_of_the_union.txt'}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"query = \"What did I eat for breakfast today?\"\n",
|
||||
"query = \"What did president say about Ketanji Brown Jackson\"\n",
|
||||
"results = vector_store.similarity_search(query, fields=[\"metadata.source\"])\n",
|
||||
"print(results[0])"
|
||||
]
|
||||
@@ -432,8 +460,7 @@
|
||||
"id": "a5e45eb2-aa97-45df-bcc5-410e9626e506",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Hybrid Queries\n",
|
||||
"\n",
|
||||
"## Hybrid Search\n",
|
||||
"Couchbase allows you to do hybrid searches by combining Vector Search results with searches on non-vector fields of the document like the `metadata` object. \n",
|
||||
"\n",
|
||||
"The results will be based on the combination of the results from both Vector Search and the searches supported by Search Service. The scores of each of the component searches are added up to get the total score of the result.\n",
|
||||
@@ -447,26 +474,26 @@
|
||||
"id": "a5db3685-1918-4c63-8148-0bb3a71ea677",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Create Diverse Metadata for Hybrid Search\n",
|
||||
"### Create Diverse Metadata for Hybrid Search\n",
|
||||
"In order to simulate hybrid search, let us create some random metadata from the existing documents. \n",
|
||||
"We uniformly add three fields to the metadata, `date` between 2010 & 2020, `rating` between 1 & 5 and `author` set to either John Doe or Jane Doe. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 14,
|
||||
"id": "7d2e607d-6bbc-4cef-83e3-b6a28bb269ea",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"{'author': 'John Doe', 'date': '2016-01-01', 'rating': 2, 'source': '../../how_to/state_of_the_union.txt'}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain_community.document_loaders import TextLoader\n",
|
||||
"from langchain_text_splitters import CharacterTextSplitter\n",
|
||||
"\n",
|
||||
"loader = TextLoader(\"../../how_to/state_of_the_union.txt\")\n",
|
||||
"documents = loader.load()\n",
|
||||
"text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=0)\n",
|
||||
"docs = text_splitter.split_documents(documents)\n",
|
||||
"\n",
|
||||
"# Adding metadata to documents\n",
|
||||
"for i, doc in enumerate(docs):\n",
|
||||
" doc.metadata[\"date\"] = f\"{range(2010, 2020)[i % 10]}-01-01\"\n",
|
||||
@@ -485,16 +512,24 @@
|
||||
"id": "6cad893b-3977-4556-ab1d-d12bce68b306",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Query by Exact Value\n",
|
||||
"### Example: Search by Exact Value\n",
|
||||
"We can search for exact matches on a textual field like the author in the `metadata` object."
|
||||
]
|
||||
},
|
||||
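The full `search_options` argument is not shown here, so below is a hedged sketch of what an exact-match filter on `metadata.author` typically looks like. The query JSON follows the Couchbase Search query syntax; treat the exact shape as an assumption.

```python
# Sketch: combine vector search with an exact match on a metadata field.
query = "What did the president say about Ketanji Brown Jackson"
results = vector_store.similarity_search(
    query,
    search_options={"query": {"field": "metadata.author", "match": "John Doe"}},
    fields=["metadata.author"],
)
print(results[0])
```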
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 15,
|
||||
"id": "dc06ba4a-8a6b-4c55-bb69-95cd92db273f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"page_content='This is personal to me and Jill, to Kamala, and to so many of you. \\n\\nCancer is the #2 cause of death in America–second only to heart disease. \\n\\nLast month, I announced our plan to supercharge \\nthe Cancer Moonshot that President Obama asked me to lead six years ago. \\n\\nOur goal is to cut the cancer death rate by at least 50% over the next 25 years, turn more cancers from death sentences into treatable diseases. \\n\\nMore support for patients and families.' metadata={'author': 'John Doe'}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"results = vector_store.similarity_search(\n",
|
||||
@@ -510,7 +545,7 @@
|
||||
"id": "9106b594-b41e-4329-b98c-9b9f8a34d6f7",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Query by Partial Match\n",
|
||||
"### Example: Search by Partial Match\n",
|
||||
"We can search for partial matches by specifying a fuzziness for the search. This is useful when you want to search for slight variations or misspellings of a search query.\n",
|
||||
"\n",
|
||||
"Here, \"Jae\" is close (fuzziness of 1) to \"Jane\"."
|
||||
@@ -518,10 +553,18 @@
|
||||
},
|
||||
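Again, the `search_options` dictionary itself is not shown here; a hedged sketch of a fuzzy match (assumed Couchbase Search query syntax) is:

```python
# Sketch: fuzzy match on a metadata field ("Jae" ~ "Jane" with fuzziness 1).
query = "What did the president say about Ketanji Brown Jackson"
results = vector_store.similarity_search(
    query,
    search_options={
        "query": {"field": "metadata.author", "match": "Jae", "fuzziness": 1}
    },
    fields=["metadata.author"],
)
print(results[0])
```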
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 16,
|
||||
"id": "fd4749e6-ef4f-4cb5-95ff-37c4fa8283d8",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"page_content='A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \\n\\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system.' metadata={'author': 'Jane Doe'}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"results = vector_store.similarity_search(\n",
|
||||
@@ -539,16 +582,24 @@
|
||||
"id": "1bbf9449-6e30-4bd1-9eeb-f3b60952fcab",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Query by Date Range Query\n",
|
||||
"### Example: Search by Date Range Query\n",
|
||||
"We can search for documents that are within a date range query on a date field like `metadata.date`."
|
||||
]
|
||||
},
|
||||
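A hedged sketch of a date-range filter follows; the `start`/`end`/`inclusive_*` keys are assumptions based on the Couchbase Search date range query syntax.

```python
# Sketch: restrict results to documents whose metadata.date falls in a range.
query = "Any mention about independence?"
results = vector_store.similarity_search(
    query,
    search_options={
        "query": {
            "field": "metadata.date",
            "start": "2016-12-31",
            "end": "2017-01-02",
            "inclusive_start": True,
            "inclusive_end": False,
        }
    },
)
print(results[0])
```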
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 17,
|
||||
"id": "b7b47e7d-c32f-4999-bce9-3c3c3cebffd0",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"page_content='He will never extinguish their love of freedom. He will never weaken the resolve of the free world. \\n\\nWe meet tonight in an America that has lived through two of the hardest years this nation has ever faced. \\n\\nThe pandemic has been punishing. \\n\\nAnd so many families are living paycheck to paycheck, struggling to keep up with the rising cost of food, gas, housing, and so much more. \\n\\nI understand.' metadata={'author': 'Jane Doe', 'date': '2017-01-01', 'rating': 3, 'source': '../../how_to/state_of_the_union.txt'}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"query = \"Any mention about independence?\"\n",
|
||||
"results = vector_store.similarity_search(\n",
|
||||
@@ -571,16 +622,24 @@
|
||||
"id": "a18d4ea2-bfab-4f15-9839-674faf1c6f0d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Query by Numeric Range Query\n",
|
||||
"### Example: Search by Numeric Range Query\n",
|
||||
"We can search for documents that are within a range for a numeric field like `metadata.rating`."
|
||||
]
|
||||
},
|
||||
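Similarly, a hedged sketch of a numeric-range filter (assumed Couchbase Search numeric range query syntax):

```python
# Sketch: restrict results to documents whose metadata.rating is in [3, 5).
query = "Any mention about independence?"
results = vector_store.similarity_search_with_score(
    query,
    search_options={
        "query": {
            "field": "metadata.rating",
            "min": 3,
            "max": 5,
            "inclusive_min": True,
            "inclusive_max": False,
        }
    },
)
print(results[0])
```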
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 18,
|
||||
"id": "7e8bf7c5-07d1-4c3f-86d7-1fa3a454dc7f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"(Document(page_content='He will never extinguish their love of freedom. He will never weaken the resolve of the free world. \\n\\nWe meet tonight in an America that has lived through two of the hardest years this nation has ever faced. \\n\\nThe pandemic has been punishing. \\n\\nAnd so many families are living paycheck to paycheck, struggling to keep up with the rising cost of food, gas, housing, and so much more. \\n\\nI understand.', metadata={'author': 'Jane Doe', 'date': '2017-01-01', 'rating': 3, 'source': '../../how_to/state_of_the_union.txt'}), 0.9000703597577832)\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"query = \"Any mention about independence?\"\n",
|
||||
"results = vector_store.similarity_search_with_score(\n",
|
||||
@@ -603,7 +662,7 @@
|
||||
"id": "0f16bf86-f01c-4a77-8406-275f7313f493",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Combining Multiple Search Queries\n",
|
||||
"### Example: Combining Multiple Search Queries\n",
|
||||
"Different search queries can be combined using AND (conjuncts) or OR (disjuncts) operators.\n",
|
||||
"\n",
|
||||
"In this example, we are checking for documents with a rating between 3 & 4 and dated between 2015 & 2018."
|
||||
@@ -611,10 +670,18 @@
|
||||
},
|
||||
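A hedged sketch of combining two of the above queries with a conjunction (`conjuncts`); a disjunction would use `disjuncts` instead. The exact JSON shape is an assumption based on the Couchbase Search query syntax.

```python
# Sketch: documents with rating in [3, 4] AND date between 2015 and 2018.
query = "Any mention about independence?"
results = vector_store.similarity_search_with_score(
    query,
    search_options={
        "query": {
            "conjuncts": [
                {"field": "metadata.rating", "min": 3, "max": 4},
                {"field": "metadata.date", "start": "2015-01-01", "end": "2018-12-31"},
            ]
        }
    },
)
print(results[0])
```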
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 19,
|
||||
"id": "dd0fe7f1-aa40-4c6f-889b-99ad5efcd88b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"(Document(page_content='He will never extinguish their love of freedom. He will never weaken the resolve of the free world. \\n\\nWe meet tonight in an America that has lived through two of the hardest years this nation has ever faced. \\n\\nThe pandemic has been punishing. \\n\\nAnd so many families are living paycheck to paycheck, struggling to keep up with the rising cost of food, gas, housing, and so much more. \\n\\nI understand.', metadata={'author': 'Jane Doe', 'date': '2017-01-01', 'rating': 3, 'source': '../../how_to/state_of_the_union.txt'}), 1.3598770370389914)\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"query = \"Any mention about independence?\"\n",
|
||||
"results = vector_store.similarity_search_with_score(\n",
|
||||
@@ -643,46 +710,6 @@
|
||||
"- [Couchbase Server](https://docs.couchbase.com/server/current/search/search-request-params.html#query-object)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "db0a1d74",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Query by turning into retriever\n",
|
||||
"\n",
|
||||
"You can also transform the vector store into a retriever for easier usage in your chains. \n",
|
||||
"\n",
|
||||
"Here is how to transform your vector store into a retriever and then invoke the retreiever with a simple query and filter."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "3666265a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"retriever = vector_store.as_retriever(\n",
|
||||
" search_type=\"similarity_score_threshold\",\n",
|
||||
" search_kwargs={\"k\": 1, \"score_threshold\": 0.5},\n",
|
||||
")\n",
|
||||
"retriever.invoke(\"Stealing from the bank is a crime\", filter={\"source\": \"news\"})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "28ab35ec",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Usage for retrieval-augmented generation\n",
|
||||
"\n",
|
||||
"For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:\n",
|
||||
"\n",
|
||||
"- [Tutorials: working with external knowledge](https://python.langchain.com/v0.2/docs/tutorials/#working-with-external-knowledge)\n",
|
||||
"- [How-to: Question and answer with RAG](https://python.langchain.com/v0.2/docs/how_to/#qa-with-rag)\n",
|
||||
"- [Retrieval conceptual docs](https://python.langchain.com/v0.2/docs/concepts/#retrieval)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "80958c2b-6a67-45e6-b7f0-fd2461d75e0f",
|
||||
@@ -734,16 +761,6 @@
|
||||
"* [Couchbase Capella](https://docs.couchbase.com/cloud/search/create-child-mapping.html)\n",
|
||||
"* [Couchbase Server](https://docs.couchbase.com/server/current/search/create-child-mapping.html)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "d876b769",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## API reference\n",
|
||||
"\n",
|
||||
"For detailed documentation of all `CouchbaseVectorStore` features and configurations head to the API reference: https://api.python.langchain.com/en/latest/vectorstores/langchain_couchbase.vectorstores.CouchbaseVectorStore.html"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
@@ -762,7 +779,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.9"
|
||||
"version": "3.10.13"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
File diff suppressed because it is too large
Load Diff
@@ -7,9 +7,11 @@
|
||||
"source": [
|
||||
"# Faiss\n",
|
||||
"\n",
|
||||
">[Facebook AI Similarity Search (FAISS)](https://engineering.fb.com/2017/03/29/data-infrastructure/faiss-a-library-for-efficient-similarity-search/) is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning.\n",
|
||||
">[Facebook AI Similarity Search (Faiss)](https://engineering.fb.com/2017/03/29/data-infrastructure/faiss-a-library-for-efficient-similarity-search/) is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning.\n",
|
||||
"\n",
|
||||
"You can find the FAISS documentation at [this page](https://faiss.ai/).\n",
|
||||
"[Faiss documentation](https://faiss.ai/).\n",
|
||||
"\n",
|
||||
"You'll need to install `langchain-community` with `pip install -qU langchain-community` to use this integration\n",
|
||||
"\n",
|
||||
"This notebook shows how to use functionality related to the `FAISS` vector database. It will show functionality specific to this integration. After going through, it may be useful to explore [relevant use-case pages](/docs/how_to#qa-with-rag) to learn how to use this vectorstore as part of a larger chain."
|
||||
]
|
||||
@@ -23,19 +25,28 @@
|
||||
"source": [
|
||||
"## Setup\n",
|
||||
"\n",
|
||||
"The integration lives in the `langchain-community` package. We also need to install the `faiss` package itself. We can install these with:\n",
|
||||
"The integration lives in the `langchain-community` package. We also need to install the `faiss` package itself. We will also be using OpenAI for embeddings, so we need to install those requirements. We can install these with:\n",
|
||||
"\n",
|
||||
"Note that you can also install `faiss-gpu` if you want to use the GPU enabled version"
|
||||
"```bash\n",
|
||||
"pip install -U langchain-community faiss-cpu langchain-openai tiktoken\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Note that you can also install `faiss-gpu` if you want to use the GPU enabled version\n",
|
||||
"\n",
|
||||
"Since we are using OpenAI, you will need an OpenAI API Key."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "08165d56",
|
||||
"id": "23984e60-c29a-461a-be2b-219108ac37ee",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"pip install -qU langchain-community faiss-cpu"
|
||||
"import getpass\n",
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass()"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -43,7 +54,7 @@
|
||||
"id": "408be78f-7b0e-44d4-8d48-56a6cb9b3fb9",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"If you want to get best in-class automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
|
||||
"It's also helpful (but not needed) to set up [LangSmith](https://smith.langchain.com/) for best-in-class observability"
|
||||
]
|
||||
},
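A minimal sketch of what the LangSmith setup looks like, using the same environment-variable pattern shown later in these notebooks (treat the variable names as an assumption if your LangSmith account is configured differently):

```python
import getpass
import os

# Hedged sketch: uncomment to enable LangSmith tracing for this notebook.
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"
```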
|
||||
{
|
||||
@@ -62,311 +73,200 @@
|
||||
"id": "78dde98a-584f-4f2a-98d5-e776fd9558fa",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Initialization\n",
|
||||
"## Ingestion\n",
|
||||
"\n",
|
||||
"```{=mdx}\n",
|
||||
"import EmbeddingTabs from \"@theme/EmbeddingTabs\";\n",
|
||||
"\n",
|
||||
"<EmbeddingTabs/>\n",
|
||||
"```"
|
||||
"Here, we ingest documents into the vectorstore"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "5b394da3",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"id": "dc37144c-208d-4ab3-9f3a-0407a69fe052",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"42"
|
||||
]
|
||||
},
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# | output: false\n",
|
||||
"# | echo: false\n",
|
||||
"from langchain_openai import OpenAIEmbeddings\n",
|
||||
"# Uncomment the following line if you need to initialize FAISS with no AVX2 optimization\n",
|
||||
"# os.environ['FAISS_NO_AVX2'] = '1'\n",
|
||||
"\n",
|
||||
"embeddings = OpenAIEmbeddings(model=\"text-embedding-3-large\")"
|
||||
"from langchain_community.document_loaders import TextLoader\n",
|
||||
"from langchain_community.vectorstores import FAISS\n",
|
||||
"from langchain_openai import OpenAIEmbeddings\n",
|
||||
"from langchain_text_splitters import CharacterTextSplitter\n",
|
||||
"\n",
|
||||
"loader = TextLoader(\"../../how_to/state_of_the_union.txt\")\n",
|
||||
"documents = loader.load()\n",
|
||||
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
|
||||
"docs = text_splitter.split_documents(documents)\n",
|
||||
"embeddings = OpenAIEmbeddings()\n",
|
||||
"db = FAISS.from_documents(docs, embeddings)\n",
|
||||
"print(db.index.ntotal)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ecdd7a65-f310-4b36-bc1e-2a39dfd58d5f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Querying\n",
|
||||
"\n",
|
||||
"Now, we can query the vectorstore. There a few methods to do this. The most standard is to use `similarity_search`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "dc37144c-208d-4ab3-9f3a-0407a69fe052",
|
||||
"id": "5eabdb75",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import faiss\n",
|
||||
"from langchain_community.docstore.in_memory import InMemoryDocstore\n",
|
||||
"from langchain_community.vectorstores import FAISS\n",
|
||||
"\n",
|
||||
"index = faiss.IndexFlatL2(len(embeddings.embed_query(\"hello world\")))\n",
|
||||
"\n",
|
||||
"vector_store = FAISS(\n",
|
||||
" embedding_function=embeddings,\n",
|
||||
" index=index,\n",
|
||||
" docstore=InMemoryDocstore(),\n",
|
||||
" index_to_docstore_id={},\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "d8761614",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Manage vector store\n",
|
||||
"\n",
|
||||
"### Add items to vector store"
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"docs = db.similarity_search(query)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "3867e154",
|
||||
"metadata": {},
|
||||
"id": "4b172de8",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"['22f5ce99-cd6f-4e0c-8dab-664128307c72',\n",
|
||||
" 'dc3f061b-5f88-4fa1-a966-413550c51891',\n",
|
||||
" 'd33d890b-baad-47f7-b7c1-175f5f7b4e59',\n",
|
||||
" '6e6c01d2-6020-4a7b-95da-ef43d43f01b5',\n",
|
||||
" 'e677223d-ad75-4c1a-bef6-b5912bd1de03',\n",
|
||||
" '47e2a168-6462-4ed2-b1d9-d9edfd7391d6',\n",
|
||||
" '1e4d66d6-e155-4891-9212-f7be97f36c6a',\n",
|
||||
" 'c0663096-e1a5-4665-b245-1c2e6c4fb653',\n",
|
||||
" '8297474a-7f7c-4006-9865-398c1781b1bc',\n",
|
||||
" '44e4be03-0a8d-4316-b3c4-f35f4bb2b532']"
|
||||
]
|
||||
},
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
|
||||
"\n",
|
||||
"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
|
||||
"\n",
|
||||
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
|
||||
"\n",
|
||||
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from uuid import uuid4\n",
|
||||
"\n",
|
||||
"from langchain_core.documents import Document\n",
|
||||
"\n",
|
||||
"document_1 = Document(\n",
|
||||
" page_content=\"I had chocalate chip pancakes and scrambled eggs for breakfast this morning.\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_2 = Document(\n",
|
||||
" page_content=\"The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.\",\n",
|
||||
" metadata={\"source\": \"news\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_3 = Document(\n",
|
||||
" page_content=\"Building an exciting new project with LangChain - come check it out!\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_4 = Document(\n",
|
||||
" page_content=\"Robbers broke into the city bank and stole $1 million in cash.\",\n",
|
||||
" metadata={\"source\": \"news\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_5 = Document(\n",
|
||||
" page_content=\"Wow! That was an amazing movie. I can't wait to see it again.\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_6 = Document(\n",
|
||||
" page_content=\"Is the new iPhone worth the price? Read this review to find out.\",\n",
|
||||
" metadata={\"source\": \"website\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_7 = Document(\n",
|
||||
" page_content=\"The top 10 soccer players in the world right now.\",\n",
|
||||
" metadata={\"source\": \"website\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_8 = Document(\n",
|
||||
" page_content=\"LangGraph is the best framework for building stateful, agentic applications!\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_9 = Document(\n",
|
||||
" page_content=\"The stock market is down 500 points today due to fears of a recession.\",\n",
|
||||
" metadata={\"source\": \"news\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_10 = Document(\n",
|
||||
" page_content=\"I have a bad feeling I am going to get deleted :(\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"documents = [\n",
|
||||
" document_1,\n",
|
||||
" document_2,\n",
|
||||
" document_3,\n",
|
||||
" document_4,\n",
|
||||
" document_5,\n",
|
||||
" document_6,\n",
|
||||
" document_7,\n",
|
||||
" document_8,\n",
|
||||
" document_9,\n",
|
||||
" document_10,\n",
|
||||
"]\n",
|
||||
"uuids = [str(uuid4()) for _ in range(len(documents))]\n",
|
||||
"\n",
|
||||
"vector_store.add_documents(documents=documents, ids=uuids)"
|
||||
"print(docs[0].page_content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "a410a2dc",
|
||||
"id": "6d9286c2-0802-4f02-8f9a-9f7fae7c79b0",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Delete items from vector store"
|
||||
"## As a Retriever\n",
|
||||
"\n",
|
||||
"We can also convert the vectorstore into a [Retriever](/docs/how_to#retrievers) class. This allows us to easily use it in other LangChain methods, which largely work with retrievers"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "c3db04bd",
|
||||
"id": "6e91b475-3878-44e0-8720-98d903754b46",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"True"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"vector_store.delete(ids=[uuids[-1]])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "77de24ff",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Query vector store\n",
|
||||
"\n",
|
||||
"Once your vector store has been created and the relevant documents have been added you will most likely wish to query it during the running of your chain or agent. \n",
|
||||
"\n",
|
||||
"### Query directly\n",
|
||||
"\n",
|
||||
"#### Similarity search\n",
|
||||
"\n",
|
||||
"Performing a simple similarity search with filtering on metadata can be done as follows:"
|
||||
"retriever = db.as_retriever()\n",
|
||||
"docs = retriever.invoke(query)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "53d95d3f",
|
||||
"id": "046739d2-91fe-4101-8b72-c0bcdd9e02b9",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"* Building an exciting new project with LangChain - come check it out! [{'source': 'tweet'}]\n",
|
||||
"* LangGraph is the best framework for building stateful, agentic applications! [{'source': 'tweet'}]\n"
|
||||
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
|
||||
"\n",
|
||||
"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
|
||||
"\n",
|
||||
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
|
||||
"\n",
|
||||
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"results = vector_store.similarity_search(\n",
|
||||
" \"LangChain provides abstractions to make working with LLMs easy\",\n",
|
||||
" k=2,\n",
|
||||
" filter={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"for res in results:\n",
|
||||
" print(f\"* {res.page_content} [{res.metadata}]\")"
|
||||
"print(docs[0].page_content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "5ae35069",
|
||||
"id": "f13473b5",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Similarity search with score\n",
|
||||
"\n",
|
||||
"You can also search with score:"
|
||||
"## Similarity Search with score\n",
|
||||
"There are some FAISS specific methods. One of them is `similarity_search_with_score`, which allows you to return not only the documents but also the distance score of the query to them. The returned distance score is L2 distance. Therefore, a lower score is better."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "a9078ce9",
|
||||
"id": "186ee1d8",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"* [SIM=0.893688] The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees. [{'source': 'news'}]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"results = vector_store.similarity_search_with_score(\n",
|
||||
" \"Will it be hot tomorrow?\", k=1, filter={\"source\": \"news\"}\n",
|
||||
")\n",
|
||||
"for res, score in results:\n",
|
||||
" print(f\"* [SIM={score:3f}] {res.page_content} [{res.metadata}]\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e9091b1f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Other search methods\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"There are a variety of other ways to search a FAISS vector store. For a complete list of those methods, please refer to the [API Reference](https://api.python.langchain.com/en/latest/vectorstores/langchain_community.vectorstores.faiss.FAISS.html)\n",
|
||||
"\n",
|
||||
"### Query by turning into retriever\n",
|
||||
"\n",
|
||||
"You can also transform the vector store into a retriever for easier usage in your chains. "
|
||||
"docs_and_scores = db.similarity_search_with_score(query)"
|
||||
]
|
||||
},
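Beyond `similarity_search`, one of the "other search methods" referenced above is maximal marginal relevance (MMR) search. A minimal sketch, assuming the FAISS store built earlier in this notebook (named `db` in the older revision, `vector_store` in the newer one):

```python
# Hedged sketch: MMR trades off relevance to the query against diversity among results.
# `k` is how many documents are returned; `fetch_k` is how many are fetched first.
results = db.max_marginal_relevance_search(
    "What did the president say about Ketanji Brown Jackson",
    k=2,
    fetch_k=10,
)
for doc in results:
    print(doc.page_content[:80])
```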
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "10da64fa",
|
||||
"id": "284e04b5",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(metadata={'source': 'news'}, page_content='Robbers broke into the city bank and stole $1 million in cash.')]"
|
||||
"(Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': '../../how_to/state_of_the_union.txt'}),\n",
|
||||
" 0.36913747)"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"retriever = vector_store.as_retriever(search_type=\"mmr\", search_kwargs={\"k\": 1})\n",
|
||||
"retriever.invoke(\"Stealing from the bank is a crime\", filter={\"source\": \"news\"})"
|
||||
"docs_and_scores[0]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "5edd1909",
|
||||
"id": "f34420cf",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Usage for retrieval-augmented generation\n",
|
||||
"\n",
|
||||
"For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:\n",
|
||||
"\n",
|
||||
"- [Tutorials: working with external knowledge](https://python.langchain.com/v0.2/docs/tutorials/#working-with-external-knowledge)\n",
|
||||
"- [How-to: Question and answer with RAG](https://python.langchain.com/v0.2/docs/how_to/#qa-with-rag)\n",
|
||||
"- [Retrieval conceptual docs](https://python.langchain.com/v0.2/docs/concepts/#retrieval)"
|
||||
"It is also possible to do a search for documents similar to a given embedding vector using `similarity_search_by_vector` which accepts an embedding vector as a parameter instead of a string."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "b558ebb7",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"embedding_vector = embeddings.embed_query(query)\n",
|
||||
"docs_and_scores = db.similarity_search_by_vector(embedding_vector)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -380,33 +280,31 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"execution_count": null,
|
||||
"id": "1b31fe27-e0b3-42c6-b17c-8270b517ee1f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"vector_store.save_local(\"faiss_index\")\n",
|
||||
"db.save_local(\"faiss_index\")\n",
|
||||
"\n",
|
||||
"new_vector_store = FAISS.load_local(\n",
|
||||
" \"faiss_index\", embeddings, allow_dangerous_deserialization=True\n",
|
||||
")\n",
|
||||
"new_db = FAISS.load_local(\"faiss_index\", embeddings)\n",
|
||||
"\n",
|
||||
"docs = new_vector_store.similarity_search(\"qux\")"
|
||||
"docs = new_db.similarity_search(query)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"execution_count": 10,
|
||||
"id": "98378c4e",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"Document(metadata={'source': 'tweet'}, page_content='Building an exciting new project with LangChain - come check it out!')"
|
||||
"Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': '../../../state_of_the_union.txt'})"
|
||||
]
|
||||
},
|
||||
"execution_count": 12,
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -415,6 +313,33 @@
|
||||
"docs[0]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "30c8f57b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Serializing and De-Serializing to bytes\n",
|
||||
"\n",
|
||||
"you can pickle the FAISS Index by these functions. If you use embeddings model which is of 90 mb (sentence-transformers/all-MiniLM-L6-v2 or any other model), the resultant pickle size would be more than 90 mb. the size of the model is also included in the overall size. To overcome this, use the below functions. These functions only serializes FAISS index and size would be much lesser. this can be helpful if you wish to store the index in database like sql."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "d8faead5",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_huggingface import HuggingFaceEmbeddings\n",
|
||||
"\n",
|
||||
"pkl = db.serialize_to_bytes() # serializes the faiss\n",
|
||||
"embeddings = HuggingFaceEmbeddings(model_name=\"all-MiniLM-L6-v2\")\n",
|
||||
"\n",
|
||||
"db = FAISS.deserialize_from_bytes(\n",
|
||||
" embeddings=embeddings, serialized=pkl\n",
|
||||
") # Load the index"
|
||||
]
|
||||
},
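Since the note above mentions storing the serialized index in a SQL database, here is a minimal sketch using the standard-library `sqlite3` module; the table and column names are hypothetical and chosen only for illustration:

```python
import sqlite3

# Hypothetical table "faiss_store": persist the serialized FAISS index bytes as a BLOB.
conn = sqlite3.connect("indexes.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS faiss_store (name TEXT PRIMARY KEY, data BLOB)"
)
conn.execute("INSERT OR REPLACE INTO faiss_store VALUES (?, ?)", ("my_index", pkl))
conn.commit()

# Later: read the bytes back and rebuild the vector store with the same embeddings.
(blob,) = conn.execute(
    "SELECT data FROM faiss_store WHERE name = ?", ("my_index",)
).fetchone()
db = FAISS.deserialize_from_bytes(embeddings=embeddings, serialized=blob)
```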
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "57da60d4",
|
||||
@@ -426,21 +351,10 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"execution_count": null,
|
||||
"id": "9b8f5e31-3f40-4e94-8d97-5883125efba7",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'b752e805-350e-4cf5-ba54-0883d46a3a44': Document(page_content='foo')}"
|
||||
]
|
||||
},
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"db1 = FAISS.from_texts([\"foo\"], embeddings)\n",
|
||||
"db2 = FAISS.from_texts([\"bar\"], embeddings)\n",
|
||||
@@ -450,17 +364,17 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"execution_count": 11,
|
||||
"id": "83392605",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'08192d92-746d-4cd1-b681-bdfba411f459': Document(page_content='bar')}"
|
||||
"{'807e0c63-13f6-4070-9774-5c6f0fbb9866': Document(page_content='bar', metadata={})}"
|
||||
]
|
||||
},
|
||||
"execution_count": 14,
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -471,7 +385,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"execution_count": 12,
|
||||
"id": "a3fcc1c7",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -481,18 +395,18 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"execution_count": 13,
|
||||
"id": "41c51f89",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'b752e805-350e-4cf5-ba54-0883d46a3a44': Document(page_content='foo'),\n",
|
||||
" '08192d92-746d-4cd1-b681-bdfba411f459': Document(page_content='bar')}"
|
||||
"{'068c473b-d420-487a-806b-fb0ccea7f711': Document(page_content='foo', metadata={}),\n",
|
||||
" '807e0c63-13f6-4070-9774-5c6f0fbb9866': Document(page_content='bar', metadata={})}"
|
||||
]
|
||||
},
|
||||
"execution_count": 16,
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -503,12 +417,169 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "65654d80",
|
||||
"id": "f4294b96",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## API reference\n",
|
||||
"## Similarity Search with filtering\n",
|
||||
"FAISS vectorstore can also support filtering, since the FAISS does not natively support filtering we have to do it manually. This is done by first fetching more results than `k` and then filtering them. This filter is either a callble that takes as input a metadata dict and returns a bool, or a metadata dict where each missing key is ignored and each present k must be in a list of values. You can also set the `fetch_k` parameter when calling any search method to set how many documents you want to fetch before filtering. Here is a small example:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"id": "d5bf812c",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Content: foo, Metadata: {'page': 1}, Score: 5.159960813797904e-15\n",
|
||||
"Content: foo, Metadata: {'page': 2}, Score: 5.159960813797904e-15\n",
|
||||
"Content: foo, Metadata: {'page': 3}, Score: 5.159960813797904e-15\n",
|
||||
"Content: foo, Metadata: {'page': 4}, Score: 5.159960813797904e-15\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain_core.documents import Document\n",
|
||||
"\n",
|
||||
"For detailed documentation of all `FAISS` vector store features and configurations head to the API reference: https://api.python.langchain.com/en/latest/vectorstores/langchain_community.vectorstores.faiss.FAISS.html"
|
||||
"list_of_documents = [\n",
|
||||
" Document(page_content=\"foo\", metadata=dict(page=1)),\n",
|
||||
" Document(page_content=\"bar\", metadata=dict(page=1)),\n",
|
||||
" Document(page_content=\"foo\", metadata=dict(page=2)),\n",
|
||||
" Document(page_content=\"barbar\", metadata=dict(page=2)),\n",
|
||||
" Document(page_content=\"foo\", metadata=dict(page=3)),\n",
|
||||
" Document(page_content=\"bar burr\", metadata=dict(page=3)),\n",
|
||||
" Document(page_content=\"foo\", metadata=dict(page=4)),\n",
|
||||
" Document(page_content=\"bar bruh\", metadata=dict(page=4)),\n",
|
||||
"]\n",
|
||||
"db = FAISS.from_documents(list_of_documents, embeddings)\n",
|
||||
"results_with_scores = db.similarity_search_with_score(\"foo\")\n",
|
||||
"for doc, score in results_with_scores:\n",
|
||||
" print(f\"Content: {doc.page_content}, Metadata: {doc.metadata}, Score: {score}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "3d33c126",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now we make the same query call but we filter for only `page = 1` "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"id": "83159330",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Content: foo, Metadata: {'page': 1}, Score: 5.159960813797904e-15\n",
|
||||
"Content: bar, Metadata: {'page': 1}, Score: 0.3131446838378906\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"results_with_scores = db.similarity_search_with_score(\"foo\", filter=dict(page=1))\n",
|
||||
"# Or with a callable:\n",
|
||||
"# results_with_scores = db.similarity_search_with_score(\"foo\", filter=lambda d: d[\"page\"] == 1)\n",
|
||||
"for doc, score in results_with_scores:\n",
|
||||
" print(f\"Content: {doc.page_content}, Metadata: {doc.metadata}, Score: {score}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0be136e0",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Same thing can be done with the `max_marginal_relevance_search` as well."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"id": "432c6980",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Content: foo, Metadata: {'page': 1}\n",
|
||||
"Content: bar, Metadata: {'page': 1}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"results = db.max_marginal_relevance_search(\"foo\", filter=dict(page=1))\n",
|
||||
"for doc in results:\n",
|
||||
" print(f\"Content: {doc.page_content}, Metadata: {doc.metadata}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1b4ecd86",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Here is an example of how to set `fetch_k` parameter when calling `similarity_search`. Usually you would want the `fetch_k` parameter >> `k` parameter. This is because the `fetch_k` parameter is the number of documents that will be fetched before filtering. If you set `fetch_k` to a low number, you might not get enough documents to filter from."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"id": "1fd60fd1",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Content: foo, Metadata: {'page': 1}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"results = db.similarity_search(\"foo\", filter=dict(page=1), k=1, fetch_k=4)\n",
|
||||
"for doc in results:\n",
|
||||
" print(f\"Content: {doc.page_content}, Metadata: {doc.metadata}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1becca53",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Delete\n",
|
||||
"\n",
|
||||
"You can also delete records from vectorstore. In the example below `db.index_to_docstore_id` represents a dictionary with elements of the FAISS index."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"id": "1408b870",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"count before: 8\n",
|
||||
"count after: 7"
|
||||
]
|
||||
},
|
||||
"execution_count": 18,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(\"count before:\", db.index.ntotal)\n",
|
||||
"db.delete([db.index_to_docstore_id[0]])\n",
|
||||
"print(\"count after:\", db.index.ntotal)"
|
||||
]
|
||||
}
|
||||
],
|
||||
@@ -528,7 +599,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.9"
|
||||
"version": "3.11.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
BIN
docs/docs/integrations/vectorstores/faiss_index/index.faiss
Normal file
Binary file not shown.
@@ -1,29 +0,0 @@
|
||||
---
|
||||
sidebar_position: 0
|
||||
sidebar_class_name: hidden
|
||||
keywords: [compatibility]
|
||||
custom_edit_url:
|
||||
---
|
||||
|
||||
# Vectorstores
|
||||
|
||||
## Features
|
||||
|
||||
The table below lists the features for some of our most popular vector stores.
|
||||
|
||||
Vectorstore|Delete by ID|Filtering|Search by Vector|Search with score|Async|Passes Standard Tests|Multi Tenancy|Local/Cloud|IDs in add Documents
|
||||
:-|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:
|
||||
AstraDBVectorStore|✅|✅|✅|✅|✅|❌|❌|❌|Local|✅
|
||||
Chroma|✅|✅|✅|✅|✅|❌|❌|❌|Local|✅
|
||||
Clickhouse|✅|✅|❌|✅|❌|❌|❌|❌|Local|✅
|
||||
CouchbaseVectorStore|✅|✅|❌|✅|✅|❌|❌|❌|Local|✅
|
||||
ElasticsearchStore|✅|✅|✅|❌|✅|❌|❌|❌|Local|✅
|
||||
FAISS|✅|✅|✅|✅|✅|❌|❌|❌|Local|✅
|
||||
InMemoryVectorStore|✅|✅|❌|✅|✅|❌|❌|❌|Local|✅
|
||||
Milvus|✅|✅|❌|✅|✅|❌|❌|❌|Local|✅
|
||||
MongoDBAtlasVectorSearch|✅|✅|❌|❌|✅|❌|❌|❌|Local|✅
|
||||
PGVector|✅|✅|✅|✅|✅|❌|❌|❌|Local|✅
|
||||
PineconeVectorStore|✅|✅|✅|❌|✅|❌|❌|❌|Local|✅
|
||||
QdrantVectorStore|✅|✅|✅|✅|✅|❌|❌|❌|Local|✅
|
||||
Redis|✅|✅|✅|✅|✅|❌|❌|❌|Local|✅
|
||||
|
||||
@@ -11,9 +11,7 @@
|
||||
"\n",
|
||||
"This notebook shows how to use functionality related to the Milvus vector database.\n",
|
||||
"\n",
|
||||
"## Setup\n",
|
||||
"\n",
|
||||
"You'll need to install `langchain-milvus` with `pip install -qU langchain-milvus` to use this integration.\n"
|
||||
"You'll need to install `langchain-milvus` with `pip install -qU langchain-milvus` to use this integration\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -25,7 +23,7 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%pip install -qU langchain_milvus"
|
||||
"%pip install --upgrade --quiet langchain_milvus"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -33,59 +31,119 @@
|
||||
"id": "633addc3",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The latest version of pymilvus comes with a local vector database Milvus Lite, good for prototyping. If you have large scale of data such as more than a million docs, we recommend setting up a more performant Milvus server on [docker or kubernetes](https://milvus.io/docs/install_standalone-docker.md#Start-Milvus).\n",
|
||||
"\n",
|
||||
"### Credentials\n",
|
||||
"\n",
|
||||
"No credentials are needed to use the `Milvus` vector store.\n",
|
||||
"\n",
|
||||
"## Initialization\n",
|
||||
"\n",
|
||||
"```{=mdx}\n",
|
||||
"import EmbeddingTabs from \"@theme/EmbeddingTabs\";\n",
|
||||
"\n",
|
||||
"<EmbeddingTabs/>\n",
|
||||
"```"
|
||||
"The latest version of pymilvus comes with a local vector database Milvus Lite, good for prototyping. If you have large scale of data such as more than a million docs, we recommend setting up a more performant Milvus server on [docker or kubernetes](https://milvus.io/docs/install_standalone-docker.md#Start-Milvus)."
|
||||
]
|
||||
},
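If you do stand up a Milvus server on Docker or Kubernetes instead of using Milvus Lite, the only change on the LangChain side is the connection URI. A minimal sketch, assuming a default standalone deployment reachable at `http://localhost:19530` (the address used as an example later in this notebook) and the `embeddings` object created below:

```python
from langchain_milvus import Milvus

# Hedged sketch: point `uri` at a running Milvus server instead of a local Milvus Lite file.
vector_store = Milvus(
    embedding_function=embeddings,
    connection_args={"uri": "http://localhost:19530"},
    collection_name="langchain_example",
)
```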
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "7a0f9e02-8eb0-4aef-b11f-8861360472ee",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We want to use OpenAIEmbeddings so we have to get the OpenAI API Key."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 25,
|
||||
"id": "a7dd253f",
|
||||
"metadata": {},
|
||||
"execution_count": 1,
|
||||
"id": "8b6ed9cd-81b9-46e5-9c20-5aafca2844d0",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# | output: false\n",
|
||||
"# | echo: false\n",
|
||||
"from langchain_openai import OpenAIEmbeddings\n",
|
||||
"import getpass\n",
|
||||
"import os\n",
|
||||
"\n",
|
||||
"embeddings = OpenAIEmbeddings(model=\"text-embedding-3-large\")"
|
||||
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 28,
|
||||
"execution_count": 2,
|
||||
"id": "aac9563e",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.document_loaders import TextLoader\n",
|
||||
"from langchain_milvus.vectorstores import Milvus\n",
|
||||
"from langchain_openai import OpenAIEmbeddings\n",
|
||||
"from langchain_text_splitters import CharacterTextSplitter"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "a3c3999a",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = TextLoader(\"../../how_to/state_of_the_union.txt\")\n",
|
||||
"documents = loader.load()\n",
|
||||
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
|
||||
"docs = text_splitter.split_documents(documents)\n",
|
||||
"\n",
|
||||
"embeddings = OpenAIEmbeddings()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "dcf88bdf",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_milvus import Milvus\n",
|
||||
"\n",
|
||||
"# The easiest way is to use Milvus Lite where everything is stored in a local file.\n",
|
||||
"# If you have a Milvus server you can use the server URI such as \"http://localhost:19530\".\n",
|
||||
"URI = \"./milvus_example.db\"\n",
|
||||
"URI = \"./milvus_demo.db\"\n",
|
||||
"\n",
|
||||
"vector_store = Milvus(\n",
|
||||
" embedding_function=embeddings,\n",
|
||||
"vector_db = Milvus.from_documents(\n",
|
||||
" docs,\n",
|
||||
" embeddings,\n",
|
||||
" connection_args={\"uri\": URI},\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "a8c513ab",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"docs = vector_db.similarity_search(query)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "fc516993",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"docs[0].page_content"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "cae1a7d5",
|
||||
"id": "e40d558b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Compartmentalize the data with Milvus Collections\n",
|
||||
@@ -95,7 +153,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c07cd24b",
|
||||
"id": "82c00f6e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Here's how you can create a new collection"
|
||||
@@ -103,24 +161,22 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 29,
|
||||
"id": "c6f4973d",
|
||||
"execution_count": null,
|
||||
"id": "f7ff38ab",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_core.documents import Document\n",
|
||||
"\n",
|
||||
"vector_store_saved = Milvus.from_documents(\n",
|
||||
" [Document(page_content=\"foo!\")],\n",
|
||||
"vector_db = Milvus.from_documents(\n",
|
||||
" docs,\n",
|
||||
" embeddings,\n",
|
||||
" collection_name=\"langchain_example\",\n",
|
||||
" collection_name=\"collection_1\",\n",
|
||||
" connection_args={\"uri\": URI},\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "3b12df8c",
|
||||
"id": "891cec1f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"And here is how you retrieve that stored collection"
|
||||
@@ -128,278 +184,24 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 30,
|
||||
"id": "12817d16",
|
||||
"execution_count": null,
|
||||
"id": "e9e873e9",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"vector_store_loaded = Milvus(\n",
|
||||
"vector_db = Milvus(\n",
|
||||
" embeddings,\n",
|
||||
" connection_args={\"uri\": URI},\n",
|
||||
" collection_name=\"langchain_example\",\n",
|
||||
" collection_name=\"collection_1\",\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f1fc3818",
|
||||
"id": "9cc65535",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Manage vector store\n",
|
||||
"\n",
|
||||
"Once you have created your vector store, we can interact with it by adding and deleting different items.\n",
|
||||
"\n",
|
||||
"### Add items to vector store\n",
|
||||
"\n",
|
||||
"We can add items to our vector store by using the `add_documents` function."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 31,
|
||||
"id": "3ced24f6",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"['b0248595-2a41-4f6b-9c25-3a24c1278bb3',\n",
|
||||
" 'fa642726-5329-4495-a072-187e948dd71f',\n",
|
||||
" '9905001c-a4a3-455e-ab94-72d0ed11b476',\n",
|
||||
" 'eacc7256-d7fa-4036-b1f7-83d7a4bee0c5',\n",
|
||||
" '7508f7ff-c0c9-49ea-8189-634f8a0244d8',\n",
|
||||
" '2e179609-3ff7-4c6a-9e05-08978903fe26',\n",
|
||||
" 'fab1f2ac-43e1-45f9-b81b-fc5d334c6508',\n",
|
||||
" '1206d237-ee3a-484f-baf2-b5ac38eeb314',\n",
|
||||
" 'd43cbf9a-a772-4c40-993b-9439065fec01',\n",
|
||||
" '25e667bb-6f09-4574-a368-661069301906']"
|
||||
]
|
||||
},
|
||||
"execution_count": 31,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from uuid import uuid4\n",
|
||||
"\n",
|
||||
"from langchain_core.documents import Document\n",
|
||||
"\n",
|
||||
"document_1 = Document(\n",
|
||||
" page_content=\"I had chocalate chip pancakes and scrambled eggs for breakfast this morning.\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_2 = Document(\n",
|
||||
" page_content=\"The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.\",\n",
|
||||
" metadata={\"source\": \"news\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_3 = Document(\n",
|
||||
" page_content=\"Building an exciting new project with LangChain - come check it out!\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_4 = Document(\n",
|
||||
" page_content=\"Robbers broke into the city bank and stole $1 million in cash.\",\n",
|
||||
" metadata={\"source\": \"news\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_5 = Document(\n",
|
||||
" page_content=\"Wow! That was an amazing movie. I can't wait to see it again.\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_6 = Document(\n",
|
||||
" page_content=\"Is the new iPhone worth the price? Read this review to find out.\",\n",
|
||||
" metadata={\"source\": \"website\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_7 = Document(\n",
|
||||
" page_content=\"The top 10 soccer players in the world right now.\",\n",
|
||||
" metadata={\"source\": \"website\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_8 = Document(\n",
|
||||
" page_content=\"LangGraph is the best framework for building stateful, agentic applications!\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_9 = Document(\n",
|
||||
" page_content=\"The stock market is down 500 points today due to fears of a recession.\",\n",
|
||||
" metadata={\"source\": \"news\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_10 = Document(\n",
|
||||
" page_content=\"I have a bad feeling I am going to get deleted :(\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"documents = [\n",
|
||||
" document_1,\n",
|
||||
" document_2,\n",
|
||||
" document_3,\n",
|
||||
" document_4,\n",
|
||||
" document_5,\n",
|
||||
" document_6,\n",
|
||||
" document_7,\n",
|
||||
" document_8,\n",
|
||||
" document_9,\n",
|
||||
" document_10,\n",
|
||||
"]\n",
|
||||
"uuids = [str(uuid4()) for _ in range(len(documents))]\n",
|
||||
"\n",
|
||||
"vector_store.add_documents(documents=documents, ids=uuids)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e23c22d8",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Delete items from vector store"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 32,
|
||||
"id": "1f387fa8",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"(insert count: 0, delete count: 1, upsert count: 0, timestamp: 0, success count: 0, err count: 0, cost: 0)"
|
||||
]
|
||||
},
|
||||
"execution_count": 32,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"vector_store.delete(ids=[uuids[-1]])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "fb12fa75",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Query vector store\n",
|
||||
"\n",
|
||||
"Once your vector store has been created and the relevant documents have been added you will most likely wish to query it during the running of your chain or agent. \n",
|
||||
"\n",
|
||||
"### Query directly\n",
|
||||
"\n",
|
||||
"#### Similarity search\n",
|
||||
"\n",
|
||||
"Performing a simple similarity search with filtering on metadata can be done as follows:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 33,
|
||||
"id": "35801a55",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"* Building an exciting new project with LangChain - come check it out! [{'pk': '9905001c-a4a3-455e-ab94-72d0ed11b476', 'source': 'tweet'}]\n",
|
||||
"* LangGraph is the best framework for building stateful, agentic applications! [{'pk': '1206d237-ee3a-484f-baf2-b5ac38eeb314', 'source': 'tweet'}]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"results = vector_store.similarity_search(\n",
|
||||
" \"LangChain provides abstractions to make working with LLMs easy\",\n",
|
||||
" k=2,\n",
|
||||
" filter={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"for res in results:\n",
|
||||
" print(f\"* {res.page_content} [{res.metadata}]\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "35574409",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Similarity search with score\n",
|
||||
"\n",
|
||||
"You can also search with score:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 22,
|
||||
"id": "c360af3d",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"* [SIM=21192.628906] bar [{'pk': '2', 'source': 'https://example.com'}]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"results = vector_store.similarity_search_with_score(\n",
|
||||
" \"Will it be hot tomorrow?\", k=1, filter={\"source\": \"news\"}\n",
|
||||
")\n",
|
||||
"for res, score in results:\n",
|
||||
" print(f\"* [SIM={score:3f}] {res.page_content} [{res.metadata}]\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "14db337f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"For a full list of all the search options available when using the `Milvus` vector store, you can visit the [API reference](https://api.python.langchain.com/en/latest/vectorstores/langchain_milvus.vectorstores.milvus.Milvus.html).\n",
|
||||
"\n",
|
||||
"### Query by turning into retriever\n",
|
||||
"\n",
|
||||
"You can also transform the vector store into a retriever for easier usage in your chains. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 34,
|
||||
"id": "f6d9357c",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(metadata={'pk': 'eacc7256-d7fa-4036-b1f7-83d7a4bee0c5', 'source': 'news'}, page_content='Robbers broke into the city bank and stole $1 million in cash.')]"
|
||||
]
|
||||
},
|
||||
"execution_count": 34,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"retriever = vector_store.as_retriever(search_type=\"mmr\", search_kwargs={\"k\": 1})\n",
|
||||
"retriever.invoke(\"Stealing from the bank is a crime\", filter={\"source\": \"news\"})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "8ac953f1",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Usage for retrieval-augmented generation\n",
|
||||
"\n",
|
||||
"For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:\n",
|
||||
"\n",
|
||||
"- [Tutorials: working with external knowledge](https://python.langchain.com/v0.2/docs/tutorials/#working-with-external-knowledge)\n",
|
||||
"- [How-to: Question and answer with RAG](https://python.langchain.com/v0.2/docs/how_to/#qa-with-rag)\n",
|
||||
"- [Retrieval conceptual docs](https://python.langchain.com/v0.2/docs/concepts/#retrieval)"
|
||||
"After retrieval you can go on querying it as usual."
|
||||
]
|
||||
},
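A minimal sketch of querying the reloaded collection, reusing the query from earlier in this notebook:

```python
# Hedged sketch: the retrieved collection is queried like a freshly built one.
query = "What did the president say about Ketanji Brown Jackson"
docs = vector_db.similarity_search(query)
print(docs[0].page_content)
```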
|
||||
{
|
||||
@@ -523,12 +325,47 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f1a873c5",
|
||||
"id": "89756e9e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## API reference\n",
|
||||
"### To delete or upsert (update/insert) one or more entities"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "21c4edcf",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_core.documents import Document\n",
|
||||
"\n",
|
||||
"For detailed documentation of all __ModuleName__VectorStore features and configurations head to the API reference: https://api.python.langchain.com/en/latest/vectorstores/langchain_milvus.vectorstores.milvus.Milvus.html"
|
||||
"# Insert data sample\n",
|
||||
"docs = [\n",
|
||||
" Document(page_content=\"foo\", metadata={\"id\": 1}),\n",
|
||||
" Document(page_content=\"bar\", metadata={\"id\": 2}),\n",
|
||||
" Document(page_content=\"baz\", metadata={\"id\": 3}),\n",
|
||||
"]\n",
|
||||
"vector_db = Milvus.from_documents(\n",
|
||||
" docs,\n",
|
||||
" embeddings,\n",
|
||||
" connection_args={\"uri\": URI},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# Search pks (primary keys) using expression\n",
|
||||
"expr = \"id in [1,2]\"\n",
|
||||
"pks = vector_db.get_pks(expr)\n",
|
||||
"\n",
|
||||
"# Delete entities by pks\n",
|
||||
"result = vector_db.delete(pks)\n",
|
||||
"\n",
|
||||
"# Upsert (Update/Insert)\n",
|
||||
"new_docs = [\n",
|
||||
" Document(page_content=\"new_foo\", metadata={\"id\": 1}),\n",
|
||||
" Document(page_content=\"new_bar\", metadata={\"id\": 2}),\n",
|
||||
" Document(page_content=\"upserted_bak\", metadata={\"id\": 3}),\n",
|
||||
"]\n",
|
||||
"upserted_pks = vector_db.upsert(pks, new_docs)"
|
||||
]
|
||||
}
|
||||
],
|
||||
@@ -548,7 +385,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.9"
|
||||
"version": "3.9.18"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -19,40 +19,74 @@
|
||||
"id": "359b8e9b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Setup\n",
|
||||
"\n",
|
||||
"## Prerequisites\n",
|
||||
">*An Atlas cluster running MongoDB version 6.0.11, 7.0.2, or later (including RCs).\n",
|
||||
"\n",
|
||||
"To use MongoDB Atlas, you must first deploy a cluster. We have a Forever-Free tier of clusters available. To get started head over to Atlas here: [quick start](https://www.mongodb.com/docs/atlas/getting-started/).\n",
|
||||
">*An OpenAI API Key. You must have a paid OpenAI account with credits available for API requests.\n",
|
||||
"\n",
|
||||
"You'll need to install `langchain-mongodb` and `pymongo` to use this integration."
|
||||
"You'll need to install `langchain-mongodb` to use this integration"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "d899e588",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Setting up MongoDB Atlas Cluster\n",
|
||||
"To use MongoDB Atlas, you must first deploy a cluster. We have a Forever-Free tier of clusters available. To get started head over to Atlas here: [quick start](https://www.mongodb.com/docs/atlas/getting-started/)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1b5ce18d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Usage\n",
|
||||
"In the notebook we will demonstrate how to perform `Retrieval Augmented Generation` (RAG) using MongoDB Atlas, OpenAI and Langchain. We will be performing Similarity Search, Similarity Search with Metadata Pre-Filtering, and Question Answering over the PDF document for [GPT 4 technical report](https://arxiv.org/pdf/2303.08774.pdf) that came out in March 2023 and hence is not part of the OpenAI's Large Language Model(LLM)'s parametric memory, which had a knowledge cutoff of September 2021."
|
||||
]
|
||||
},
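A minimal sketch of the ingestion step this paragraph describes, loading that PDF and splitting it into chunks; the loader is the one imported later in this notebook, while the splitter choice and chunk sizes are illustrative assumptions:

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Hedged sketch: load the GPT-4 technical report and split it into chunks for embedding.
loader = PyPDFLoader("https://arxiv.org/pdf/2303.08774.pdf")
data = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = text_splitter.split_documents(data)
```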
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "457ace44-1d95-4001-9dd5-78811ab208ad",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We want to use `OpenAIEmbeddings` so we need to set up our OpenAI API Key. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "73cf7c9f",
|
||||
"id": "2d8f240d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"pip install -qU langchain-mongodb pymongo"
|
||||
"import getpass\n",
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "a61832ea",
|
||||
"id": "70482cd8",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Credentials\n",
|
||||
"\n",
|
||||
"For this notebook you will need to find your MongoDB cluster URI.\n",
|
||||
"\n",
|
||||
"For information on finding your cluster URI read through [this guide](https://www.mongodb.com/docs/manual/reference/connection-string/)."
|
||||
"Now we will setup the environment variables for the MongoDB Atlas cluster"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 33,
|
||||
"execution_count": null,
|
||||
"id": "4d7788cf",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%pip install --upgrade --quiet langchain langchain-mongodb pypdf pymongo langchain-openai tiktoken"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "7ef41b37",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -62,78 +96,76 @@
|
||||
"MONGODB_ATLAS_CLUSTER_URI = getpass.getpass(\"MongoDB Atlas Cluster URI:\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1f23de23",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"If you want to get best in-class automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "908e7772",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
|
||||
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "a53673ae",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Initialization\n",
|
||||
"\n",
|
||||
"```{=mdx}\n",
|
||||
"import EmbeddingTabs from \"@theme/EmbeddingTabs\";\n",
|
||||
"\n",
|
||||
"<EmbeddingTabs/>\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 54,
|
||||
"id": "f5fed614",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# | output: false\n",
|
||||
"# | echo: false\n",
|
||||
"from langchain_openai import OpenAIEmbeddings\n",
|
||||
"\n",
|
||||
"embeddings = OpenAIEmbeddings(model=\"text-embedding-3-small\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 56,
|
||||
"id": "00d78318",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_mongodb.vectorstores import MongoDBAtlasVectorSearch\n",
|
||||
"from pymongo import MongoClient\n",
|
||||
"\n",
|
||||
"# initialize MongoDB python client\n",
|
||||
"client = MongoClient(MONGODB_ATLAS_CLUSTER_URI)\n",
|
||||
"\n",
|
||||
"DB_NAME = \"langchain_test_db\"\n",
|
||||
"COLLECTION_NAME = \"langchain_test_vectorstores\"\n",
|
||||
"ATLAS_VECTOR_SEARCH_INDEX_NAME = \"langchain-test-index-vectorstores\"\n",
|
||||
"DB_NAME = \"langchain_db\"\n",
|
||||
"COLLECTION_NAME = \"test\"\n",
|
||||
"ATLAS_VECTOR_SEARCH_INDEX_NAME = \"index_name\"\n",
|
||||
"\n",
|
||||
"MONGODB_COLLECTION = client[DB_NAME][COLLECTION_NAME]\n",
|
||||
"MONGODB_COLLECTION = client[DB_NAME][COLLECTION_NAME]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "eb0cc10f-b84e-4e5e-b445-eb61f10bf085",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Create Vector Search Index"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1f3ecc42",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now, let's create a vector search index on your cluster. More detailed steps can be found at [Create Vector Search Index for LangChain](https://www.mongodb.com/docs/atlas/atlas-vector-search/ai-integrations/langchain/#create-the-atlas-vector-search-index) section.\n",
|
||||
"In the below example, `embedding` is the name of the field that contains the embedding vector. Please refer to the [documentation](https://www.mongodb.com/docs/atlas/atlas-vector-search/create-index/) to get more details on how to define an Atlas Vector Search index.\n",
|
||||
"You can name the index `{ATLAS_VECTOR_SEARCH_INDEX_NAME}` and create the index on the namespace `{DB_NAME}.{COLLECTION_NAME}`. Finally, write the following definition in the JSON editor on MongoDB Atlas:\n",
|
||||
"\n",
|
||||
"vector_store = MongoDBAtlasVectorSearch(\n",
|
||||
" collection=MONGODB_COLLECTION,\n",
|
||||
" embedding=embeddings,\n",
|
||||
" index_name=ATLAS_VECTOR_SEARCH_INDEX_NAME,\n",
|
||||
" relevance_score_fn=\"cosine\",\n",
|
||||
")"
|
||||
"```json\n",
|
||||
"{\n",
|
||||
" \"fields\":[\n",
|
||||
" {\n",
|
||||
" \"type\": \"vector\",\n",
|
||||
" \"path\": \"embedding\",\n",
|
||||
" \"numDimensions\": 1536,\n",
|
||||
" \"similarity\": \"cosine\"\n",
|
||||
" }\n",
|
||||
" ]\n",
|
||||
"}\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Additionally, if you are running a MongoDB M10 cluster with server version 6.0+, you can leverage the `MongoDBAtlasVectorSearch.create_index`. To add the above index its usage would look like this.\n",
|
||||
"\n",
|
||||
"```python\n",
|
||||
"from langchain_community.embeddings.openai import OpenAIEmbeddings\n",
|
||||
"from langchain_mongodb.vectorstores import MongoDBAtlasVectorSearch\n",
|
||||
"from pymongo import MongoClient\n",
|
||||
"\n",
|
||||
"mongo_client = MongoClient(\"<YOUR-CONNECTION-STRING>\")\n",
|
||||
"collection = mongo_client[\"<db_name>\"][\"<collection_name>\"]\n",
|
||||
"embeddings = OpenAIEmbeddings()\n",
|
||||
"\n",
|
||||
"vectorstore = MongoDBAtlasVectorSearch(\n",
|
||||
" collection=collection,\n",
|
||||
" embedding=embeddings,\n",
|
||||
" index_name=\"<ATLAS_VECTOR_SEARCH_INDEX_NAME>\",\n",
|
||||
" relevance_score_fn=\"cosine\",\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# Creates an index using the index_name provided and relevance_score_fn type\n",
|
||||
"vectorstore.create_index(dimensions=1536)\n",
|
||||
"```"
|
||||
]
|
||||
},
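If you prefer to create the index from code rather than pasting the JSON definition into the Atlas UI, here is a minimal sketch using PyMongo's search-index helpers. This is not from the original notebook; it assumes PyMongo 4.7+ (which exposes `Collection.create_search_index` and `SearchIndexModel` with a `vectorSearch` type) and reuses the names defined earlier in this notebook.

```python
# Minimal sketch (assumes PyMongo 4.7+): create the same vector search index
# programmatically instead of via the Atlas JSON editor.
from pymongo import MongoClient
from pymongo.operations import SearchIndexModel

client = MongoClient(MONGODB_ATLAS_CLUSTER_URI)  # connection string defined earlier
collection = client[DB_NAME][COLLECTION_NAME]

index_model = SearchIndexModel(
    definition={
        "fields": [
            {
                "type": "vector",
                "path": "embedding",
                "numDimensions": 1536,
                "similarity": "cosine",
            }
        ]
    },
    name=ATLAS_VECTOR_SEARCH_INDEX_NAME,
    type="vectorSearch",
)

# Returns the index name; the index may take a minute to become queryable.
collection.create_search_index(model=index_model)
```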
|
||||
{
|
||||
@@ -141,224 +173,126 @@
|
||||
"id": "42873e5a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Manage vector store\n",
|
||||
"\n",
|
||||
"Once you have created your vector store, you can interact with it by adding and deleting different items.\n",
|
||||
"\n",
|
||||
"### Add items to vector store\n",
|
||||
"\n",
|
||||
"We can add items to our vector store by using the `add_documents` function."
|
||||
"# Insert Data"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 57,
|
||||
"execution_count": null,
|
||||
"id": "aac9563e",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"['03ad81e8-32a0-46f0-b7d8-f5b977a6b52a',\n",
|
||||
" '8396a68d-f4a3-4176-a581-a1a8c303eea4',\n",
|
||||
" 'e7d95150-67f6-499f-b611-84367c50fa60',\n",
|
||||
" '8c31b84e-2636-48b6-8b99-9fccb47f7051',\n",
|
||||
" 'aa02e8a2-a811-446a-9785-8cea0faba7a9',\n",
|
||||
" '19bd72ff-9766-4c3b-b1fd-195c732c562b',\n",
|
||||
" '642d6f2f-3e34-4efa-a1ed-c4ba4ef0da8d',\n",
|
||||
" '7614bb54-4eb5-4b3b-990c-00e35cb31f99',\n",
|
||||
" '69e18c67-bf1b-43e5-8a6e-64fb3f240e52',\n",
|
||||
" '30d599a7-4a1a-47a9-bbf8-6ed393e2e33c']"
|
||||
]
|
||||
},
|
||||
"execution_count": 57,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from uuid import uuid4\n",
|
||||
"from langchain_community.document_loaders import PyPDFLoader\n",
|
||||
"\n",
|
||||
"from langchain_core.documents import Document\n",
|
||||
"\n",
|
||||
"document_1 = Document(\n",
|
||||
" page_content=\"I had chocolate chip pancakes and scrambled eggs for breakfast this morning.\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_2 = Document(\n",
|
||||
" page_content=\"The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.\",\n",
|
||||
" metadata={\"source\": \"news\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_3 = Document(\n",
|
||||
" page_content=\"Building an exciting new project with LangChain - come check it out!\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_4 = Document(\n",
|
||||
" page_content=\"Robbers broke into the city bank and stole $1 million in cash.\",\n",
|
||||
" metadata={\"source\": \"news\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_5 = Document(\n",
|
||||
" page_content=\"Wow! That was an amazing movie. I can't wait to see it again.\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_6 = Document(\n",
|
||||
" page_content=\"Is the new iPhone worth the price? Read this review to find out.\",\n",
|
||||
" metadata={\"source\": \"website\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_7 = Document(\n",
|
||||
" page_content=\"The top 10 soccer players in the world right now.\",\n",
|
||||
" metadata={\"source\": \"website\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_8 = Document(\n",
|
||||
" page_content=\"LangGraph is the best framework for building stateful, agentic applications!\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_9 = Document(\n",
|
||||
" page_content=\"The stock market is down 500 points today due to fears of a recession.\",\n",
|
||||
" metadata={\"source\": \"news\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_10 = Document(\n",
|
||||
" page_content=\"I have a bad feeling I am going to get deleted :(\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"documents = [\n",
|
||||
" document_1,\n",
|
||||
" document_2,\n",
|
||||
" document_3,\n",
|
||||
" document_4,\n",
|
||||
" document_5,\n",
|
||||
" document_6,\n",
|
||||
" document_7,\n",
|
||||
" document_8,\n",
|
||||
" document_9,\n",
|
||||
" document_10,\n",
|
||||
"]\n",
|
||||
"uuids = [str(uuid4()) for _ in range(len(documents))]\n",
|
||||
"\n",
|
||||
"vector_store.add_documents(documents=documents, ids=uuids)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "639f29da",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Delete items from vector store\n"
|
||||
"# Load the PDF\n",
|
||||
"loader = PyPDFLoader(\"https://arxiv.org/pdf/2303.08774.pdf\")\n",
|
||||
"data = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 58,
|
||||
"id": "bbb5fd5c",
|
||||
"execution_count": null,
|
||||
"id": "a5578113",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"True"
|
||||
]
|
||||
},
|
||||
"execution_count": 58,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"vector_store.delete(ids=[uuids[-1]])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "d6111eb6",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Query vector store\n",
|
||||
"from langchain_text_splitters import RecursiveCharacterTextSplitter\n",
|
||||
"\n",
|
||||
"Once your vector store has been created and the relevant documents have been added, you will most likely want to query it while running your chain or agent.\n",
|
||||
"\n",
|
||||
"### Query directly\n",
|
||||
"\n",
|
||||
"#### Similarity search\n",
|
||||
"\n",
|
||||
"Performing a simple similarity search can be done as follows:"
|
||||
"text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)\n",
|
||||
"docs = text_splitter.split_documents(data)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 62,
|
||||
"id": "19b60ac0",
|
||||
"execution_count": null,
|
||||
"id": "d378168f",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"* Building an exciting new project with LangChain - come check it out! [{'_id': 'e7d95150-67f6-499f-b611-84367c50fa60', 'source': 'tweet'}]\n",
|
||||
"* LangGraph is the best framework for building stateful, agentic applications! [{'_id': '7614bb54-4eb5-4b3b-990c-00e35cb31f99', 'source': 'tweet'}]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"results = vector_store.similarity_search(\n",
|
||||
" \"LangChain provides abstractions to make working with LLMs easy\", k=2\n",
|
||||
")\n",
|
||||
"for res in results:\n",
|
||||
" print(f\"* {res.page_content} [{res.metadata}]\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "6c624606",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Similarity search with score\n",
|
||||
"\n",
|
||||
"You can also search with score:"
|
||||
"print(docs[0])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 63,
|
||||
"id": "e919fa51",
|
||||
"execution_count": null,
|
||||
"id": "6e104aee",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"* [SIM=0.784560] The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees. [{'_id': '8396a68d-f4a3-4176-a581-a1a8c303eea4', 'source': 'news'}]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"results = vector_store.similarity_search_with_score(\"Will it be hot tomorrow?\", k=1)\n",
|
||||
"for res, score in results:\n",
|
||||
" print(f\"* [SIM={score:3f}] {res.page_content} [{res.metadata}]\")"
|
||||
"from langchain_community.vectorstores import MongoDBAtlasVectorSearch\n",
|
||||
"from langchain_openai import OpenAIEmbeddings\n",
|
||||
"\n",
|
||||
"# insert the documents in MongoDB Atlas with their embedding\n",
|
||||
"vector_search = MongoDBAtlasVectorSearch.from_documents(\n",
|
||||
" documents=docs,\n",
|
||||
" embedding=OpenAIEmbeddings(disallowed_special=()),\n",
|
||||
" collection=MONGODB_COLLECTION,\n",
|
||||
" index_name=ATLAS_VECTOR_SEARCH_INDEX_NAME,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "7bf6841e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Perform a similarity search between the embedding of the query and the embeddings of the documents\n",
|
||||
"query = \"What were the compute requirements for training GPT 4\"\n",
|
||||
"results = vector_search.similarity_search(query)\n",
|
||||
"\n",
|
||||
"print(results[0].page_content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "513a1416",
|
||||
"id": "9e58c2d8",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Pre-filtering with Similarity Search"
|
||||
"# Querying data"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ac58c6c7",
|
||||
"id": "851a2ec9-9390-49a4-8412-3e132c9f789d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We can also instantiate the vector store directly and execute a query as follows:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "985d28fe",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.vectorstores import MongoDBAtlasVectorSearch\n",
|
||||
"from langchain_openai import OpenAIEmbeddings\n",
|
||||
"\n",
|
||||
"vector_search = MongoDBAtlasVectorSearch.from_connection_string(\n",
|
||||
" MONGODB_ATLAS_CLUSTER_URI,\n",
|
||||
" DB_NAME + \".\" + COLLECTION_NAME,\n",
|
||||
" OpenAIEmbeddings(disallowed_special=()),\n",
|
||||
" index_name=ATLAS_VECTOR_SEARCH_INDEX_NAME,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "02aef29c-5da0-41b8-b4fc-98fd71b94abf",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Pre-filtering with Similarity Search"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f3b2d36d-d47a-482f-999d-85c23eb67eed",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Atlas Vector Search supports pre-filtering using MQL operators. Below is an example index and query on the same data loaded above that allows you to do metadata filtering on the \"page\" field. You can update your existing index with the filter defined and do pre-filtering with vector search."
|
||||
@@ -366,7 +300,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "dacac7b8",
|
||||
"id": "2b385a46-1e54-471f-95b2-202813d90bb2",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"```json\n",
|
||||
@@ -380,7 +314,7 @@
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"type\": \"filter\",\n",
|
||||
" \"path\": \"source\"\n",
|
||||
" \"path\": \"page\"\n",
|
||||
" }\n",
|
||||
" ]\n",
|
||||
"}\n",
|
||||
@@ -391,79 +325,128 @@
|
||||
"```python\n",
|
||||
"vectorstore.create_index(\n",
|
||||
" dimensions=1536,\n",
|
||||
" filters=[{\"type\":\"filter\", \"path\":\"source\"}],\n",
|
||||
" filters=[{\"type\":\"filter\", \"path\":\"page\"}],\n",
|
||||
" update=True\n",
|
||||
")\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"And then you can run a query with filter as follows:\n",
|
||||
"\n",
|
||||
"```python\n",
|
||||
"results = vector_store.similarity_search(query=\"foo\",k=1,pre_filter={\"source\": {\"$eq\": \"https://example.com\"}})\n",
|
||||
"for doc in results:\n",
|
||||
" print(f\"* {doc.page_content} [{doc.metadata}]\")\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "32b13a9b",
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "dfc8487d-14ec-42c9-9670-80fe02816196",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#### Other search methods\n",
|
||||
"query = \"What were the compute requirements for training GPT 4\"\n",
|
||||
"\n",
|
||||
"There are a variety of other search methods that are not covered in this notebook, such as MMR search or searching by vector. For a full list of the search abilities available for `AstraDBVectorStore` check out the [API reference](https://api.python.langchain.com/en/latest/vectorstores/langchain_astradb.vectorstores.AstraDBVectorStore.html)."
|
||||
"results = vector_search.similarity_search_with_score(\n",
|
||||
" query=query, k=5, pre_filter={\"page\": {\"$eq\": 1}}\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# Display results\n",
|
||||
"for result in results:\n",
|
||||
" print(result)"
|
||||
]
|
||||
},
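Pre-filters are ordinary MQL match expressions, so they can also combine operators. The following is a hedged sketch (not from the original notebook) that assumes the `vector_search` store from above and an index that declares `page` as a filter field, as shown earlier:

```python
# Hedged sketch: combine MQL operators in a pre-filter.
# Assumes `vector_search` from above and "page" registered as a filter field.
query = "What were the compute requirements for training GPT 4"

results = vector_search.similarity_search_with_score(
    query=query,
    k=5,
    pre_filter={"$and": [{"page": {"$gte": 1}}, {"page": {"$lte": 3}}]},
)

for doc, score in results:
    print(score, doc.metadata.get("page"))
```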
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "01316a42",
|
||||
"id": "6d9a2dbe",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Query by turning into retriever\n",
|
||||
"\n",
|
||||
"You can also transform the vector store into a retriever for easier usage in your chains. \n",
|
||||
"\n",
|
||||
"Here is how to transform your vector store into a retriever and then invoke the retriever with a simple query and filter."
|
||||
"## Similarity Search with Score"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 65,
|
||||
"id": "8f246301",
|
||||
"execution_count": null,
|
||||
"id": "497baffa",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(metadata={'_id': '8c31b84e-2636-48b6-8b99-9fccb47f7051', 'source': 'news'}, page_content='Robbers broke into the city bank and stole $1 million in cash.')]"
|
||||
]
|
||||
},
|
||||
"execution_count": 65,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"retriever = vector_store.as_retriever(\n",
|
||||
" search_type=\"similarity_score_threshold\",\n",
|
||||
" search_kwargs={\"k\": 1, \"score_threshold\": 0.2},\n",
|
||||
"query = \"What were the compute requirements for training GPT 4\"\n",
|
||||
"\n",
|
||||
"results = vector_search.similarity_search_with_score(\n",
|
||||
" query=query,\n",
|
||||
" k=5,\n",
|
||||
")\n",
|
||||
"retriever.invoke(\"Stealing from the bank is a crime\")"
|
||||
"\n",
|
||||
"# Display results\n",
|
||||
"for result in results:\n",
|
||||
" print(result)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "72312657",
|
||||
"id": "cbade5f0",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Usage for retrieval-augmented generation\n",
|
||||
"## Question Answering "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "bc6475f9",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"qa_retriever = vector_search.as_retriever(\n",
|
||||
" search_type=\"similarity\",\n",
|
||||
" search_kwargs={\"k\": 25},\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "8e13e96c",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_core.prompts import PromptTemplate\n",
|
||||
"\n",
|
||||
"For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:\n",
|
||||
"prompt_template = \"\"\"Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n",
|
||||
"\n",
|
||||
"- [Tutorials: working with external knowledge](https://python.langchain.com/v0.2/docs/tutorials/#working-with-external-knowledge)\n",
|
||||
"- [How-to: Question and answer with RAG](https://python.langchain.com/v0.2/docs/how_to/#qa-with-rag)\n",
|
||||
"- [Retrieval conceptual docs](https://python.langchain.com/v0.2/docs/concepts/#retrieval)"
|
||||
"{context}\n",
|
||||
"\n",
|
||||
"Question: {question}\n",
|
||||
"\"\"\"\n",
|
||||
"PROMPT = PromptTemplate(\n",
|
||||
" template=prompt_template, input_variables=[\"context\", \"question\"]\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "ff0edb02",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains import RetrievalQA\n",
|
||||
"from langchain_openai import OpenAI\n",
|
||||
"\n",
|
||||
"qa = RetrievalQA.from_chain_type(\n",
|
||||
" llm=OpenAI(),\n",
|
||||
" chain_type=\"stuff\",\n",
|
||||
" retriever=qa_retriever,\n",
|
||||
" return_source_documents=True,\n",
|
||||
" chain_type_kwargs={\"prompt\": PROMPT},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"docs = qa({\"query\": \"gpt-4 compute requirements\"})\n",
|
||||
"\n",
|
||||
"print(docs[\"result\"])\n",
|
||||
"print(docs[\"source_documents\"])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "61636bb2",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"GPT-4 requires significantly more compute than earlier GPT models. On a dataset derived from OpenAI's internal codebase, GPT-4 requires 100p (petaflops) of compute to reach the lowest loss, while the smaller models require 1-10n (nanoflops)."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -477,16 +460,6 @@
|
||||
">* The langchain version 0.0.305 ([release notes](https://github.com/langchain-ai/langchain/releases/tag/v0.0.305)) introduces the support for $vectorSearch MQL stage, which is available with MongoDB Atlas 6.0.11 and 7.0.2. Users utilizing earlier versions of MongoDB Atlas need to pin their LangChain version to <=0.0.304\n",
|
||||
"> "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "186ef502",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## API reference\n",
|
||||
"\n",
|
||||
"For detailed documentation of all `MongoDBAtlasVectorSearch` features and configurations head to the API reference: https://api.python.langchain.com/en/latest/mongodb_api_reference.html"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
@@ -505,7 +478,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.9"
|
||||
"version": "3.10.12"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -11,6 +11,12 @@
|
||||
"\n",
|
||||
"The code lives in an integration package called: [langchain_postgres](https://github.com/langchain-ai/langchain-postgres/).\n",
|
||||
"\n",
|
||||
"You can run the following command to spin up a postgres container with the `pgvector` extension:\n",
|
||||
"\n",
|
||||
"```shell\n",
|
||||
"docker run --name pgvector-container -e POSTGRES_USER=langchain -e POSTGRES_PASSWORD=langchain -e POSTGRES_DB=langchain -p 6024:5432 -d pgvector/pgvector:pg16\n",
|
||||
"```\n",
|
||||
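As a quick sanity check (not part of the original notebook), you can verify that the container is reachable with the matching SQLAlchemy URL before building the vector store. This sketch assumes the credentials and port from the `docker run` command above and that the `psycopg` (v3) driver is installed:

```python
# Hedged sketch: confirm the pgvector container from the docker command is reachable.
from sqlalchemy import create_engine, text

connection = "postgresql+psycopg://langchain:langchain@localhost:6024/langchain"
engine = create_engine(connection)

with engine.connect() as conn:
    # Prints 1 if the database accepts connections.
    print(conn.execute(text("SELECT 1")).scalar())
```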
"\n",
|
||||
"## Status\n",
|
||||
"\n",
|
||||
"This code has been ported over from `langchain_community` into a dedicated package called `langchain-postgres`. The following changes have been made:\n",
|
||||
@@ -21,39 +27,30 @@
|
||||
"\n",
|
||||
"\n",
|
||||
"Currently, there is **no mechanism** that supports easy data migration on schema changes. So any schema changes in the vectorstore will require the user to recreate the tables and re-add the documents.\n",
|
||||
"If this is a concern, please use a different vectorstore. If not, this implementation should be fine for your use case.\n",
|
||||
"\n",
|
||||
"## Setup\n",
|
||||
"\n",
|
||||
"First download the partner package:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "92df32f0",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"pip install -qU langchain_postgres"
|
||||
"If this is a concern, please use a different vectorstore. If not, this implementation should be fine for your use case."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0dd87fcc",
|
||||
"id": "342cd5e9-f349-42b4-9713-12e63779835b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can run the following command to spin up a postgres container with the `pgvector` extension:"
|
||||
"## Install dependencies\n",
|
||||
"\n",
|
||||
"Here, we're using `langchain_cohere` for embeddings, but you can use other embeddings providers."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "2acbaf9b",
|
||||
"metadata": {},
|
||||
"execution_count": 1,
|
||||
"id": "42d42297-11b8-44e3-bf21-7c3d1bce8277",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!docker run --name pgvector-container -e POSTGRES_USER=langchain -e POSTGRES_PASSWORD=langchain -e POSTGRES_DB=langchain -p 6024:5432 -d pgvector/pgvector:pg16"
|
||||
"!pip install --quiet -U langchain_cohere\n",
|
||||
"!pip install --quiet -U langchain_postgres"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -61,56 +58,7 @@
|
||||
"id": "eee31ce1-2c28-484d-82be-d22d9f9a31fd",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Credentials\n",
|
||||
"\n",
|
||||
"There are no credentials needed to run this notebook; just make sure you have downloaded the `langchain_postgres` package and correctly started the postgres container."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "fa4026f7",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"If you want to get best in-class automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "0f8e2f23",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
|
||||
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ec44dfcc",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Instantiation\n",
|
||||
"\n",
|
||||
"```{=mdx}\n",
|
||||
"import EmbeddingTabs from \"@theme/EmbeddingTabs\";\n",
|
||||
"\n",
|
||||
"<EmbeddingTabs/>\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "94f5c129",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# | output: false\n",
|
||||
"# | echo: false\n",
|
||||
"from langchain_openai import OpenAIEmbeddings\n",
|
||||
"\n",
|
||||
"embeddings = OpenAIEmbeddings(model=\"text-embedding-3-large\")"
|
||||
"## Initialize the vectorstore"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -122,6 +70,7 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_cohere import CohereEmbeddings\n",
|
||||
"from langchain_core.documents import Document\n",
|
||||
"from langchain_postgres import PGVector\n",
|
||||
"from langchain_postgres.vectorstores import PGVector\n",
|
||||
@@ -129,9 +78,9 @@
|
||||
"# See docker command above to launch a postgres instance with pgvector enabled.\n",
|
||||
"connection = \"postgresql+psycopg://langchain:langchain@localhost:6024/langchain\" # Uses psycopg3!\n",
|
||||
"collection_name = \"my_docs\"\n",
|
||||
"embeddings = CohereEmbeddings(model=\"embed-english-v3.0\")\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"vector_store = PGVector(\n",
|
||||
"vectorstore = PGVector(\n",
|
||||
" embeddings=embeddings,\n",
|
||||
" collection_name=collection_name,\n",
|
||||
" connection=connection,\n",
|
||||
@@ -139,37 +88,46 @@
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0fc32168-5a82-4629-a78d-158fe2615086",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Drop tables\n",
|
||||
"\n",
|
||||
"If you need to drop tables (e.g., updating the embedding to a different dimension or just updating the embedding provider): "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "5de5ef98-7dbb-4892-853f-47c7dc87b70e",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"```python\n",
|
||||
"vectorstore.drop_tables()\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "61a224a1-d70b-4daf-86ba-ab6e43c08b50",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Manage vector store\n",
|
||||
"## Add documents\n",
|
||||
"\n",
|
||||
"### Add items to vector store\n",
|
||||
"\n",
|
||||
"Note that adding documents by ID will overwrite any existing documents that match that ID."
|
||||
"Add documents to the vectorstore"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"execution_count": 5,
|
||||
"id": "88a288cc-ffd4-4800-b011-750c72b9fd10",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs = [\n",
|
||||
" Document(\n",
|
||||
@@ -212,29 +170,123 @@
|
||||
" page_content=\"a cooking class for beginners is offered at the community center\",\n",
|
||||
" metadata={\"id\": 10, \"location\": \"community center\", \"topic\": \"classes\"},\n",
|
||||
" ),\n",
|
||||
"]\n",
|
||||
"\n",
|
||||
"vector_store.add_documents(docs, ids=[doc.metadata[\"id\"] for doc in docs])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0c712fa3",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Delete items from vector store"
|
||||
"]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"execution_count": 6,
|
||||
"id": "73aa9124-9d49-4e10-8ed3-82255e7a4106",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"vectorstore.add_documents(docs, ids=[doc.metadata[\"id\"] for doc in docs])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "a5b2b71f-49eb-407d-b03a-dea4c0a517d6",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='there are cats in the pond', metadata={'id': 1, 'topic': 'animals', 'location': 'pond'}),\n",
|
||||
" Document(page_content='the book club meets at the library', metadata={'id': 8, 'topic': 'reading', 'location': 'library'}),\n",
|
||||
" Document(page_content='the library hosts a weekly story time for kids', metadata={'id': 9, 'topic': 'reading', 'location': 'library'}),\n",
|
||||
" Document(page_content='the new art exhibit is fascinating', metadata={'id': 5, 'topic': 'art', 'location': 'museum'}),\n",
|
||||
" Document(page_content='ducks are also found in the pond', metadata={'id': 2, 'topic': 'animals', 'location': 'pond'}),\n",
|
||||
" Document(page_content='the market also sells fresh oranges', metadata={'id': 4, 'topic': 'food', 'location': 'market'}),\n",
|
||||
" Document(page_content='a cooking class for beginners is offered at the community center', metadata={'id': 10, 'topic': 'classes', 'location': 'community center'}),\n",
|
||||
" Document(page_content='fresh apples are available at the market', metadata={'id': 3, 'topic': 'food', 'location': 'market'}),\n",
|
||||
" Document(page_content='a sculpture exhibit is also at the museum', metadata={'id': 6, 'topic': 'art', 'location': 'museum'}),\n",
|
||||
" Document(page_content='a new coffee shop opened on Main Street', metadata={'id': 7, 'topic': 'food', 'location': 'Main Street'})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"vectorstore.similarity_search(\"kitty\", k=10)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1d87a413-015a-4b46-a64e-332f30806524",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Adding documents by ID will overwrite any existing documents that match that ID."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "13c69357-aaee-4de0-bcc2-7ab4419c920e",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"vector_store.delete(ids=[\"3\"])"
|
||||
"docs = [\n",
|
||||
" Document(\n",
|
||||
" page_content=\"there are cats in the pond\",\n",
|
||||
" metadata={\"id\": 1, \"location\": \"pond\", \"topic\": \"animals\"},\n",
|
||||
" ),\n",
|
||||
" Document(\n",
|
||||
" page_content=\"ducks are also found in the pond\",\n",
|
||||
" metadata={\"id\": 2, \"location\": \"pond\", \"topic\": \"animals\"},\n",
|
||||
" ),\n",
|
||||
" Document(\n",
|
||||
" page_content=\"fresh apples are available at the market\",\n",
|
||||
" metadata={\"id\": 3, \"location\": \"market\", \"topic\": \"food\"},\n",
|
||||
" ),\n",
|
||||
" Document(\n",
|
||||
" page_content=\"the market also sells fresh oranges\",\n",
|
||||
" metadata={\"id\": 4, \"location\": \"market\", \"topic\": \"food\"},\n",
|
||||
" ),\n",
|
||||
" Document(\n",
|
||||
" page_content=\"the new art exhibit is fascinating\",\n",
|
||||
" metadata={\"id\": 5, \"location\": \"museum\", \"topic\": \"art\"},\n",
|
||||
" ),\n",
|
||||
" Document(\n",
|
||||
" page_content=\"a sculpture exhibit is also at the museum\",\n",
|
||||
" metadata={\"id\": 6, \"location\": \"museum\", \"topic\": \"art\"},\n",
|
||||
" ),\n",
|
||||
" Document(\n",
|
||||
" page_content=\"a new coffee shop opened on Main Street\",\n",
|
||||
" metadata={\"id\": 7, \"location\": \"Main Street\", \"topic\": \"food\"},\n",
|
||||
" ),\n",
|
||||
" Document(\n",
|
||||
" page_content=\"the book club meets at the library\",\n",
|
||||
" metadata={\"id\": 8, \"location\": \"library\", \"topic\": \"reading\"},\n",
|
||||
" ),\n",
|
||||
" Document(\n",
|
||||
" page_content=\"the library hosts a weekly story time for kids\",\n",
|
||||
" metadata={\"id\": 9, \"location\": \"library\", \"topic\": \"reading\"},\n",
|
||||
" ),\n",
|
||||
" Document(\n",
|
||||
" page_content=\"a cooking class for beginners is offered at the community center\",\n",
|
||||
" metadata={\"id\": 10, \"location\": \"community center\", \"topic\": \"classes\"},\n",
|
||||
" ),\n",
|
||||
"]"
|
||||
]
|
||||
},
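To see the overwrite behavior in action, here is a hedged sketch (not from the original notebook) that edits the first document and re-adds the whole batch with the same IDs, assuming the `vectorstore` and `docs` defined above:

```python
# Hedged sketch: re-adding a document with an existing ID replaces the stored copy.
docs[0].page_content = "there are cats and a heron in the pond"
vectorstore.add_documents(docs, ids=[doc.metadata["id"] for doc in docs])

# The updated text is what gets returned now.
vectorstore.similarity_search("heron", k=1)
```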
|
||||
{
|
||||
@@ -242,11 +294,7 @@
|
||||
"id": "59f82250-7903-4279-8300-062542c83416",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Query vector store\n",
|
||||
"\n",
|
||||
"Once your vector store has been created and the relevant documents have been added, you will most likely want to query it while running your chain or agent.\n",
|
||||
"\n",
|
||||
"### Filtering Support\n",
|
||||
"## Filtering Support\n",
|
||||
"\n",
|
||||
"The vectorstore supports a set of filters that can be applied against the metadata fields of the documents.\n",
|
||||
"\n",
|
||||
@@ -264,38 +312,33 @@
|
||||
"| \\$like | Text (like) |\n",
|
||||
"| \\$ilike | Text (case-insensitive like) |\n",
|
||||
"| \\$and | Logical (and) |\n",
|
||||
"| \\$or | Logical (or) |\n",
|
||||
"\n",
|
||||
"### Query directly\n",
|
||||
"\n",
|
||||
"Performing a simple similarity search can be done as follows:"
|
||||
"| \\$or | Logical (or) |"
|
||||
]
|
||||
},
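For illustration, the logical and text operators from the table can also be combined explicitly. This is a hedged sketch (not from the original notebook) that assumes the `vectorstore` and documents created earlier in this notebook:

```python
# Hedged sketch: explicit $and with an $ilike text match on a metadata field.
vectorstore.similarity_search(
    "ducks",
    k=10,
    filter={
        "$and": [
            {"topic": {"$eq": "animals"}},
            {"location": {"$ilike": "%pond%"}},
        ]
    },
)
```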
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"execution_count": 9,
|
||||
"id": "f15a2359-6dc3-4099-8214-785f167a9ca4",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"* there are cats in the pond [{'id': 1, 'topic': 'animals', 'location': 'pond'}]\n",
|
||||
"* the library hosts a weekly story time for kids [{'id': 9, 'topic': 'reading', 'location': 'library'}]\n",
|
||||
"* ducks are also found in the pond [{'id': 2, 'topic': 'animals', 'location': 'pond'}]\n",
|
||||
"* the new art exhibit is fascinating [{'id': 5, 'topic': 'art', 'location': 'museum'}]\n"
|
||||
]
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='there are cats in the pond', metadata={'id': 1, 'topic': 'animals', 'location': 'pond'}),\n",
|
||||
" Document(page_content='the library hosts a weekly story time for kids', metadata={'id': 9, 'topic': 'reading', 'location': 'library'}),\n",
|
||||
" Document(page_content='the new art exhibit is fascinating', metadata={'id': 5, 'topic': 'art', 'location': 'museum'}),\n",
|
||||
" Document(page_content='ducks are also found in the pond', metadata={'id': 2, 'topic': 'animals', 'location': 'pond'})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"results = vector_store.similarity_search(\n",
|
||||
" \"kitty\", k=10, filter={\"id\": {\"$in\": [1, 5, 2, 9]}}\n",
|
||||
")\n",
|
||||
"for doc in results:\n",
|
||||
" print(f\"* {doc.page_content} [{doc.metadata}]\")"
|
||||
"vectorstore.similarity_search(\"kitty\", k=10, filter={\"id\": {\"$in\": [1, 5, 2, 9]}})"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -308,7 +351,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"execution_count": 10,
|
||||
"id": "88f919e4-e4b0-4b5f-99b3-24c675c26d33",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
@@ -317,17 +360,17 @@
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(metadata={'id': 1, 'topic': 'animals', 'location': 'pond'}, page_content='there are cats in the pond'),\n",
|
||||
" Document(metadata={'id': 2, 'topic': 'animals', 'location': 'pond'}, page_content='ducks are also found in the pond')]"
|
||||
"[Document(page_content='ducks are also found in the pond', metadata={'id': 2, 'topic': 'animals', 'location': 'pond'}),\n",
|
||||
" Document(page_content='there are cats in the pond', metadata={'id': 1, 'topic': 'animals', 'location': 'pond'})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 16,
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"vector_store.similarity_search(\n",
|
||||
"vectorstore.similarity_search(\n",
|
||||
" \"ducks\",\n",
|
||||
" k=10,\n",
|
||||
" filter={\"id\": {\"$in\": [1, 5, 2, 9]}, \"location\": {\"$in\": [\"pond\", \"market\"]}},\n",
|
||||
@@ -336,7 +379,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"execution_count": 11,
|
||||
"id": "88f423a4-6575-4fb8-9be2-a3da01106591",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
@@ -345,17 +388,17 @@
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(metadata={'id': 1, 'topic': 'animals', 'location': 'pond'}, page_content='there are cats in the pond'),\n",
|
||||
" Document(metadata={'id': 2, 'topic': 'animals', 'location': 'pond'}, page_content='ducks are also found in the pond')]"
|
||||
"[Document(page_content='ducks are also found in the pond', metadata={'id': 2, 'topic': 'animals', 'location': 'pond'}),\n",
|
||||
" Document(page_content='there are cats in the pond', metadata={'id': 1, 'topic': 'animals', 'location': 'pond'})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 17,
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"vector_store.similarity_search(\n",
|
||||
"vectorstore.similarity_search(\n",
|
||||
" \"ducks\",\n",
|
||||
" k=10,\n",
|
||||
" filter={\n",
|
||||
@@ -367,90 +410,34 @@
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2e65adc1",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"If you want to execute a similarity search and receive the corresponding scores you can run:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"id": "7d92e7b3",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"* [SIM=0.763449] there are cats in the pond [{'id': 1, 'topic': 'animals', 'location': 'pond'}]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"results = vector_store.similarity_search_with_score(query=\"cats\", k=1)\n",
|
||||
"for doc, score in results:\n",
|
||||
" print(f\"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "8d40db8c",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"For a full list of the different searches you can execute on a `PGVector` vector store, please refer to the [API reference](https://api.python.langchain.com/en/latest/vectorstores/langchain_postgres.vectorstores.PGVector.html).\n",
|
||||
"\n",
|
||||
"### Query by turning into retriever\n",
|
||||
"\n",
|
||||
"You can also transform the vector store into a retriever for easier usage in your chains. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "7cd1fb75",
|
||||
"metadata": {},
|
||||
"execution_count": 12,
|
||||
"id": "65133340-2acd-4957-849e-029b6b5d60f0",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(metadata={'id': 1, 'topic': 'animals', 'location': 'pond'}, page_content='there are cats in the pond')]"
|
||||
"[Document(page_content='the book club meets at the library', metadata={'id': 8, 'topic': 'reading', 'location': 'library'}),\n",
|
||||
" Document(page_content='the new art exhibit is fascinating', metadata={'id': 5, 'topic': 'art', 'location': 'museum'}),\n",
|
||||
" Document(page_content='the library hosts a weekly story time for kids', metadata={'id': 9, 'topic': 'reading', 'location': 'library'}),\n",
|
||||
" Document(page_content='a sculpture exhibit is also at the museum', metadata={'id': 6, 'topic': 'art', 'location': 'museum'}),\n",
|
||||
" Document(page_content='the market also sells fresh oranges', metadata={'id': 4, 'topic': 'food', 'location': 'market'}),\n",
|
||||
" Document(page_content='a cooking class for beginners is offered at the community center', metadata={'id': 10, 'topic': 'classes', 'location': 'community center'}),\n",
|
||||
" Document(page_content='a new coffee shop opened on Main Street', metadata={'id': 7, 'topic': 'food', 'location': 'Main Street'}),\n",
|
||||
" Document(page_content='fresh apples are available at the market', metadata={'id': 3, 'topic': 'food', 'location': 'market'})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"retriever = vector_store.as_retriever(search_type=\"mmr\", search_kwargs={\"k\": 1})\n",
|
||||
"retriever.invoke(\"kitty\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "7ecd77a0",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Usage for retrieval-augmented generation\n",
|
||||
"\n",
|
||||
"For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:\n",
|
||||
"\n",
|
||||
"- [Tutorials: working with external knowledge](https://python.langchain.com/v0.2/docs/tutorials/#working-with-external-knowledge)\n",
|
||||
"- [How-to: Question and answer with RAG](https://python.langchain.com/v0.2/docs/how_to/#qa-with-rag)\n",
|
||||
"- [Retrieval conceptual docs](https://python.langchain.com/v0.2/docs/concepts/#retrieval)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f451f361",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## API reference\n",
|
||||
"\n",
|
||||
"For detailed documentation of all __ModuleName__VectorStore features and configurations head to the API reference: https://api.python.langchain.com/en/latest/vectorstores/langchain_postgres.vectorstores.PGVector.html"
|
||||
"vectorstore.similarity_search(\"bird\", k=10, filter={\"location\": {\"$ne\": \"pond\"}})"
|
||||
]
|
||||
}
|
||||
],
|
||||
@@ -470,7 +457,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.9"
|
||||
"version": "3.11.4"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -12,9 +12,8 @@
|
||||
"\n",
|
||||
"This notebook shows how to use functionality related to the `Pinecone` vector database.\n",
|
||||
"\n",
|
||||
"## Setup\n",
|
||||
"\n",
|
||||
"To use the `PineconeVectorStore` you first need to install the partner package, as well as the other packages used throughout this notebook."
|
||||
"Set the following environment variables to follow along in this doc:\n",
|
||||
"- `OPENAI_API_KEY`: Your OpenAI API key, for using `OpenAIEmbeddings`"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -26,7 +25,12 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%pip install -qU langchain-pinecone pinecone-notebooks"
|
||||
"%pip install --upgrade --quiet \\\n",
|
||||
" langchain-pinecone \\\n",
|
||||
" langchain-openai \\\n",
|
||||
" langchain \\\n",
|
||||
" langchain-community \\\n",
|
||||
" pinecone-notebooks"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -39,52 +43,76 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ef6dc4de",
|
||||
"id": "42f2ea67",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Credentials\n",
|
||||
"\n",
|
||||
"Create a new Pinecone account, or sign into your existing one, and create an API key to use in this notebook."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "eb554814",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import getpass\n",
|
||||
"import os\n",
|
||||
"import time\n",
|
||||
"\n",
|
||||
"from pinecone import Pinecone, ServerlessSpec\n",
|
||||
"\n",
|
||||
"if not os.getenv(\"PINECONE_API_KEY\"):\n",
|
||||
" os.environ[\"PINECONE_API_KEY\"] = getpass.getpass(\"Enter your Pinecone API key: \")\n",
|
||||
"\n",
|
||||
"pinecone_api_key = os.environ.get(\"PINECONE_API_KEY\")\n",
|
||||
"\n",
|
||||
"pc = Pinecone(api_key=pinecone_api_key)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "6ef1d828",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
|
||||
"First, let's split our state of the union document into chunked `docs`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "23b5ac5e",
|
||||
"id": "a3c3999a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
|
||||
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
|
||||
"from langchain_community.document_loaders import TextLoader\n",
|
||||
"from langchain_openai import OpenAIEmbeddings\n",
|
||||
"from langchain_text_splitters import CharacterTextSplitter\n",
|
||||
"\n",
|
||||
"loader = TextLoader(\"../../how_to/state_of_the_union.txt\")\n",
|
||||
"documents = loader.load()\n",
|
||||
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
|
||||
"docs = text_splitter.split_documents(documents)\n",
|
||||
"\n",
|
||||
"embeddings = OpenAIEmbeddings()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ef6dc4de",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now let's create a new Pinecone account, or sign into your existing one, and create an API key to use in this notebook."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "1fdc3c36",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from pinecone_notebooks.colab import Authenticate\n",
|
||||
"\n",
|
||||
"Authenticate()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "54da1a39",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The newly created API key has been stored in the `PINECONE_API_KEY` environment variable. We will use it to set up the Pinecone client."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "eb554814",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"pinecone_api_key = os.environ.get(\"PINECONE_API_KEY\")\n",
|
||||
"pinecone_api_key\n",
|
||||
"\n",
|
||||
"import time\n",
|
||||
"\n",
|
||||
"from pinecone import Pinecone, ServerlessSpec\n",
|
||||
"\n",
|
||||
"pc = Pinecone(api_key=pinecone_api_key)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -92,28 +120,26 @@
|
||||
"id": "658706a3",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Initialization\n",
|
||||
"\n",
|
||||
"Before initializing our vector store, let's connect to a Pinecone index. If one named `index_name` doesn't exist, it will be created."
|
||||
"Next, let's connect to your Pinecone index. If one named `index_name` doesn't exist, it will be created."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"execution_count": null,
|
||||
"id": "276a06dd",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import time\n",
|
||||
"\n",
|
||||
"index_name = \"langchain-test-index\" # change if desired\n",
|
||||
"index_name = \"langchain-index\" # change if desired\n",
|
||||
"\n",
|
||||
"existing_indexes = [index_info[\"name\"] for index_info in pc.list_indexes()]\n",
|
||||
"\n",
|
||||
"if index_name not in existing_indexes:\n",
|
||||
" pc.create_index(\n",
|
||||
" name=index_name,\n",
|
||||
" dimension=3072,\n",
|
||||
" dimension=1536,\n",
|
||||
" metric=\"cosine\",\n",
|
||||
" spec=ServerlessSpec(cloud=\"aws\", region=\"us-east-1\"),\n",
|
||||
" )\n",
|
||||
@@ -128,188 +154,24 @@
|
||||
"id": "3a4d377f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now that our Pinecone index is set up, we can initialize our vector store. \n",
|
||||
"\n",
|
||||
"```{=mdx}\n",
|
||||
"import EmbeddingTabs from \"@theme/EmbeddingTabs\";\n",
|
||||
"\n",
|
||||
"<EmbeddingTabs/>\n",
|
||||
"```"
|
||||
"Now that our Pinecone index is set up, we can upsert those chunked docs as contents with `PineconeVectorStore.from_documents`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"id": "1485db56",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# | output: false\n",
|
||||
"# | echo: false\n",
|
||||
"from langchain_openai import OpenAIEmbeddings\n",
|
||||
"\n",
|
||||
"embeddings = OpenAIEmbeddings(model=\"text-embedding-3-large\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"execution_count": 6,
|
||||
"id": "6e104aee",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_pinecone import PineconeVectorStore\n",
|
||||
"\n",
|
||||
"vector_store = PineconeVectorStore(index=index, embedding=embeddings)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "48721e29",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Manage vector store\n",
|
||||
"\n",
|
||||
"Once you have created your vector store, you can interact with it by adding and deleting different items.\n",
|
||||
"\n",
|
||||
"### Add items to vector store\n",
|
||||
"\n",
|
||||
"We can add items to our vector store by using the `add_documents` function."
|
||||
"docsearch = PineconeVectorStore.from_documents(docs, embeddings, index_name=index_name)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"id": "70e688f4",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"['167b8681-5974-467f-adcb-6e987a18df01',\n",
|
||||
" 'd16010fd-41f8-4d49-9c22-c66d5555a3fe',\n",
|
||||
" 'ffcacfb3-2bc2-44c3-a039-c2256a905c0e',\n",
|
||||
" 'cf3bfc9f-5dc7-4f5e-bb41-edb957394126',\n",
|
||||
" 'e99b07eb-fdff-4cb9-baa8-619fd8efeed3',\n",
|
||||
" '68c93033-a24f-40bd-8492-92fa26b631a4',\n",
|
||||
" 'b27a4ecb-b505-4c5d-89ff-526e3d103558',\n",
|
||||
" '4868a9e6-e6fb-4079-b400-4a1dfbf0d4c4',\n",
|
||||
" '921c0e9c-0550-4eb5-9a6c-ed44410788b2',\n",
|
||||
" 'c446fc23-64e8-47e7-8c19-ecf985e9411e']"
|
||||
]
|
||||
},
|
||||
"execution_count": 15,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from uuid import uuid4\n",
|
||||
"\n",
|
||||
"from langchain_core.documents import Document\n",
|
||||
"\n",
|
||||
"document_1 = Document(\n",
|
||||
" page_content=\"I had chocalate chip pancakes and scrambled eggs for breakfast this morning.\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_2 = Document(\n",
|
||||
" page_content=\"The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.\",\n",
|
||||
" metadata={\"source\": \"news\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_3 = Document(\n",
|
||||
" page_content=\"Building an exciting new project with LangChain - come check it out!\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_4 = Document(\n",
|
||||
" page_content=\"Robbers broke into the city bank and stole $1 million in cash.\",\n",
|
||||
" metadata={\"source\": \"news\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_5 = Document(\n",
|
||||
" page_content=\"Wow! That was an amazing movie. I can't wait to see it again.\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_6 = Document(\n",
|
||||
" page_content=\"Is the new iPhone worth the price? Read this review to find out.\",\n",
|
||||
" metadata={\"source\": \"website\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_7 = Document(\n",
|
||||
" page_content=\"The top 10 soccer players in the world right now.\",\n",
|
||||
" metadata={\"source\": \"website\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_8 = Document(\n",
|
||||
" page_content=\"LangGraph is the best framework for building stateful, agentic applications!\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_9 = Document(\n",
|
||||
" page_content=\"The stock market is down 500 points today due to fears of a recession.\",\n",
|
||||
" metadata={\"source\": \"news\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_10 = Document(\n",
|
||||
" page_content=\"I have a bad feeling I am going to get deleted :(\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"documents = [\n",
|
||||
" document_1,\n",
|
||||
" document_2,\n",
|
||||
" document_3,\n",
|
||||
" document_4,\n",
|
||||
" document_5,\n",
|
||||
" document_6,\n",
|
||||
" document_7,\n",
|
||||
" document_8,\n",
|
||||
" document_9,\n",
|
||||
" document_10,\n",
|
||||
"]\n",
|
||||
"uuids = [str(uuid4()) for _ in range(len(documents))]\n",
|
||||
"\n",
|
||||
"vector_store.add_documents(documents=documents, ids=uuids)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "120922b3",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Delete items from vector store"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"id": "5b8437cd",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"vector_store.delete(ids=[uuids[-1]])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "5ee21c89",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Query vector store\n",
|
||||
"\n",
|
||||
"Once your vector store has been created and the relevant documents have been added, you will most likely want to query it while running your chain or agent.\n",
|
||||
"\n",
|
||||
"### Query directly\n",
|
||||
"\n",
|
||||
"Performing a simple similarity search can be done as follows:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"execution_count": 7,
|
||||
"id": "ffbcb3fb",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -317,114 +179,214 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"* Building an exciting new project with LangChain - come check it out! [{'source': 'tweet'}]\n",
|
||||
"* LangGraph is the best framework for building stateful, agentic applications! [{'source': 'tweet'}]\n"
|
||||
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
|
||||
"\n",
|
||||
"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
|
||||
"\n",
|
||||
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
|
||||
"\n",
|
||||
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"results = vector_store.similarity_search(\n",
|
||||
" \"LangChain provides abstractions to make working with LLMs easy\",\n",
|
||||
" k=2,\n",
|
||||
" filter={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"for res in results:\n",
|
||||
" print(f\"* {res.page_content} [{res.metadata}]\")"
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"docs = docsearch.similarity_search(query)\n",
|
||||
"print(docs[0].page_content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "79f3494d",
|
||||
"id": "86a4b96b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Similarity search with score\n",
|
||||
"### Adding More Text to an Existing Index\n",
|
||||
"\n",
|
||||
"You can also search with score:"
|
||||
"More text can be embedded and upserted to an existing Pinecone index using the `add_texts` function.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"id": "5fb24583",
|
||||
"execution_count": 8,
|
||||
"id": "38a7a60e",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"['24631802-4bad-44a7-a4ba-fd71f00cc160']"
|
||||
]
|
||||
},
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"vectorstore = PineconeVectorStore(index_name=index_name, embedding=embeddings)\n",
|
||||
"\n",
|
||||
"vectorstore.add_texts([\"More text!\"])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "d46d1452",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Maximal Marginal Relevance Searches\n",
|
||||
"\n",
|
||||
"In addition to using similarity search in the retriever object, you can also use `mmr` as the retriever.\n"
|
||||
]
|
||||
},
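As a hedged sketch (not from the original notebook), using MMR through the retriever interface could look like the following, assuming the `docsearch` store created above:

```python
# Hedged sketch: Maximal Marginal Relevance search via the retriever interface.
retriever = docsearch.as_retriever(search_type="mmr", search_kwargs={"k": 4})
matched_docs = retriever.invoke("What did the president say about Ketanji Brown Jackson")

for i, doc in enumerate(matched_docs):
    print(f"\n## Document {i}\n\n{doc.page_content}")
```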
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "a359ed74",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"* [SIM=0.553187] The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees. [{'source': 'news'}]\n"
|
||||
"\n",
|
||||
"## Document 0\n",
|
||||
"\n",
|
||||
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
|
||||
"\n",
|
||||
"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
|
||||
"\n",
|
||||
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
|
||||
"\n",
|
||||
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n",
|
||||
"\n",
|
||||
"## Document 1\n",
|
||||
"\n",
|
||||
"And I’m taking robust action to make sure the pain of our sanctions is targeted at Russia’s economy. And I will use every tool at our disposal to protect American businesses and consumers. \n",
|
||||
"\n",
|
||||
"Tonight, I can announce that the United States has worked with 30 other countries to release 60 Million barrels of oil from reserves around the world. \n",
|
||||
"\n",
|
||||
"America will lead that effort, releasing 30 Million barrels from our own Strategic Petroleum Reserve. And we stand ready to do more if necessary, unified with our allies. \n",
|
||||
"\n",
|
||||
"These steps will help blunt gas prices here at home. And I know the news about what’s happening can seem alarming. \n",
|
||||
"\n",
|
||||
"But I want you to know that we are going to be okay. \n",
|
||||
"\n",
|
||||
"When the history of this era is written Putin’s war on Ukraine will have left Russia weaker and the rest of the world stronger. \n",
|
||||
"\n",
|
||||
"While it shouldn’t have taken something so terrible for people around the world to see what’s at stake now everyone sees it clearly.\n",
|
||||
"\n",
|
||||
"## Document 2\n",
|
||||
"\n",
|
||||
"We can’t change how divided we’ve been. But we can change how we move forward—on COVID-19 and other issues we must face together. \n",
|
||||
"\n",
|
||||
"I recently visited the New York City Police Department days after the funerals of Officer Wilbert Mora and his partner, Officer Jason Rivera. \n",
|
||||
"\n",
|
||||
"They were responding to a 9-1-1 call when a man shot and killed them with a stolen gun. \n",
|
||||
"\n",
|
||||
"Officer Mora was 27 years old. \n",
|
||||
"\n",
|
||||
"Officer Rivera was 22. \n",
|
||||
"\n",
|
||||
"Both Dominican Americans who’d grown up on the same streets they later chose to patrol as police officers. \n",
|
||||
"\n",
|
||||
"I spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves. \n",
|
||||
"\n",
|
||||
"I’ve worked on these issues a long time. \n",
|
||||
"\n",
|
||||
"I know what works: Investing in crime prevention and community police officers who’ll walk the beat, who’ll know the neighborhood, and who can restore trust and safety.\n",
|
||||
"\n",
|
||||
"## Document 3\n",
|
||||
"\n",
|
||||
"One was stationed at bases and breathing in toxic smoke from “burn pits” that incinerated wastes of war—medical and hazard material, jet fuel, and more. \n",
|
||||
"\n",
|
||||
"When they came home, many of the world’s fittest and best trained warriors were never the same. \n",
|
||||
"\n",
|
||||
"Headaches. Numbness. Dizziness. \n",
|
||||
"\n",
|
||||
"A cancer that would put them in a flag-draped coffin. \n",
|
||||
"\n",
|
||||
"I know. \n",
|
||||
"\n",
|
||||
"One of those soldiers was my son Major Beau Biden. \n",
|
||||
"\n",
|
||||
"We don’t know for sure if a burn pit was the cause of his brain cancer, or the diseases of so many of our troops. \n",
|
||||
"\n",
|
||||
"But I’m committed to finding out everything we can. \n",
|
||||
"\n",
|
||||
"Committed to military families like Danielle Robinson from Ohio. \n",
|
||||
"\n",
|
||||
"The widow of Sergeant First Class Heath Robinson. \n",
|
||||
"\n",
|
||||
"He was born a soldier. Army National Guard. Combat medic in Kosovo and Iraq. \n",
|
||||
"\n",
|
||||
"Stationed near Baghdad, just yards from burn pits the size of football fields. \n",
|
||||
"\n",
|
||||
"Heath’s widow Danielle is here with us tonight. They loved going to Ohio State football games. He loved building Legos with their daughter.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"results = vector_store.similarity_search_with_score(\n",
|
||||
" \"Will it be hot tomorrow?\", k=1, filter={\"source\": \"news\"}\n",
|
||||
")\n",
|
||||
"for res, score in results:\n",
|
||||
" print(f\"* [SIM={score:3f}] {res.page_content} [{res.metadata}]\")"
|
||||
"retriever = docsearch.as_retriever(search_type=\"mmr\")\n",
|
||||
"matched_docs = retriever.invoke(query)\n",
|
||||
"for i, d in enumerate(matched_docs):\n",
|
||||
" print(f\"\\n## Document {i}\\n\")\n",
|
||||
" print(d.page_content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "1855941b",
|
||||
"id": "7c477287",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Other search methods\n",
|
||||
"\n",
|
||||
"There are more search methods (such as MMR) not listed in this notebook, to find all of them be sure to read the [API reference](https://api.python.langchain.com/en/latest/vectorstores/langchain_pinecone.vectorstores.PineconeVectorStore.html).\n",
|
||||
"\n",
|
||||
"### Query by turning into retriever\n",
|
||||
"\n",
|
||||
"You can also transform the vector store into a retriever for easier usage in your chains."
|
||||
"Or use `max_marginal_relevance_search` directly:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"id": "78140e87",
|
||||
"execution_count": 10,
|
||||
"id": "9ca82740",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(metadata={'source': 'news'}, page_content='Robbers broke into the city bank and stole $1 million in cash.')]"
|
||||
]
|
||||
},
|
||||
"execution_count": 19,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"1. Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
|
||||
"\n",
|
||||
"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
|
||||
"\n",
|
||||
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
|
||||
"\n",
|
||||
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence. \n",
|
||||
"\n",
|
||||
"2. We can’t change how divided we’ve been. But we can change how we move forward—on COVID-19 and other issues we must face together. \n",
|
||||
"\n",
|
||||
"I recently visited the New York City Police Department days after the funerals of Officer Wilbert Mora and his partner, Officer Jason Rivera. \n",
|
||||
"\n",
|
||||
"They were responding to a 9-1-1 call when a man shot and killed them with a stolen gun. \n",
|
||||
"\n",
|
||||
"Officer Mora was 27 years old. \n",
|
||||
"\n",
|
||||
"Officer Rivera was 22. \n",
|
||||
"\n",
|
||||
"Both Dominican Americans who’d grown up on the same streets they later chose to patrol as police officers. \n",
|
||||
"\n",
|
||||
"I spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves. \n",
|
||||
"\n",
|
||||
"I’ve worked on these issues a long time. \n",
|
||||
"\n",
|
||||
"I know what works: Investing in crime prevention and community police officers who’ll walk the beat, who’ll know the neighborhood, and who can restore trust and safety. \n",
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"retriever = vector_store.as_retriever(\n",
|
||||
" search_type=\"similarity_score_threshold\",\n",
|
||||
" search_kwargs={\"k\": 1, \"score_threshold\": 0.5},\n",
|
||||
")\n",
|
||||
"retriever.invoke(\"Stealing from the bank is a crime\", filter={\"source\": \"news\"})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "72990cb5",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Usage for retrieval-augmented generation\n",
|
||||
"\n",
|
||||
"For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:\n",
|
||||
"\n",
|
||||
"- [Tutorials: working with external knowledge](https://python.langchain.com/v0.2/docs/tutorials/#working-with-external-knowledge)\n",
|
||||
"- [How-to: Question and answer with RAG](https://python.langchain.com/v0.2/docs/how_to/#qa-with-rag)\n",
|
||||
"- [Retrieval conceptual docs](https://python.langchain.com/v0.2/docs/concepts/#retrieval)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0d5722bc",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## API reference\n",
|
||||
"\n",
|
||||
"For detailed documentation of all __ModuleName__VectorStore features and configurations head to the API reference: https://api.python.langchain.com/en/latest/vectorstores/langchain_pinecone.vectorstores.PineconeVectorStore.html"
|
||||
"found_docs = docsearch.max_marginal_relevance_search(query, k=2, fetch_k=10)\n",
|
||||
"for i, doc in enumerate(found_docs):\n",
|
||||
" print(f\"{i + 1}.\", doc.page_content, \"\\n\")"
|
||||
]
|
||||
}
|
||||
],
|
||||
@@ -444,7 +406,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.9"
|
||||
"version": "3.11.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -14,9 +14,6 @@
|
||||
"\n",
|
||||
"> This page documents the `QdrantVectorStore` class that supports multiple retrieval modes via Qdrant's new [Query API](https://qdrant.tech/blog/qdrant-1.10.x/). It requires you to run Qdrant v1.10.0 or above.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"## Setup\n",
|
||||
"\n",
|
||||
"There are various modes of how to run `Qdrant`, and depending on the chosen one, there will be some subtle differences. The options include:\n",
|
||||
"- Local mode, no server required\n",
|
||||
"- Docker deployments\n",
|
||||
@@ -34,30 +31,56 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%pip install -qU langchain-qdrant"
|
||||
"%pip install langchain-qdrant langchain-openai langchain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "7d387fea",
|
||||
"id": "7b2f111b-357a-4f42-9730-ef0603bdc1b5",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Credentials\n",
|
||||
"\n",
|
||||
"There are no credentials needed to run the code in this notebook.\n",
|
||||
"\n",
|
||||
"If you want to get best in-class automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
|
||||
"We will use `OpenAIEmbeddings` for demonstration."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "4912937d",
|
||||
"metadata": {},
|
||||
"execution_count": 3,
|
||||
"id": "aac9563e",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-04-04T10:51:22.282884Z",
|
||||
"start_time": "2023-04-04T10:51:21.408077Z"
|
||||
},
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
|
||||
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
|
||||
"from langchain_community.document_loaders import TextLoader\n",
|
||||
"from langchain_openai import OpenAIEmbeddings\n",
|
||||
"from langchain_qdrant import QdrantVectorStore\n",
|
||||
"from langchain_text_splitters import CharacterTextSplitter"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "a3c3999a",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-04-04T10:51:22.520144Z",
|
||||
"start_time": "2023-04-04T10:51:22.285826Z"
|
||||
},
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = TextLoader(\"some-file.txt\")\n",
|
||||
"documents = loader.load()\n",
|
||||
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
|
||||
"docs = text_splitter.split_documents(documents)\n",
|
||||
"\n",
|
||||
"embeddings = OpenAIEmbeddings()"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -66,7 +89,7 @@
|
||||
"id": "eeead681",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Initialization\n",
|
||||
"## Connecting to Qdrant from LangChain\n",
|
||||
"\n",
|
||||
"### Local mode\n",
|
||||
"\n",
|
||||
@@ -74,33 +97,12 @@
|
||||
"\n",
|
||||
"#### In-memory\n",
|
||||
"\n",
|
||||
"For some testing scenarios and quick experiments, you may prefer to keep all the data in memory only, so it gets lost when the client is destroyed - usually at the end of your script/notebook.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"```{=mdx}\n",
|
||||
"import EmbeddingTabs from \"@theme/EmbeddingTabs\";\n",
|
||||
"\n",
|
||||
"<EmbeddingTabs/>\n",
|
||||
"```"
|
||||
"For some testing scenarios and quick experiments, you may prefer to keep all the data in memory only, so it gets lost when the client is destroyed - usually at the end of your script/notebook."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "1df86797",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# | output: false\n",
|
||||
"# | echo: false\n",
|
||||
"from langchain_openai import OpenAIEmbeddings\n",
|
||||
"\n",
|
||||
"embeddings = OpenAIEmbeddings(model=\"text-embedding-3-large\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"execution_count": 5,
|
||||
"id": "8429667e",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
@@ -111,21 +113,11 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_qdrant import QdrantVectorStore\n",
|
||||
"from qdrant_client import QdrantClient\n",
|
||||
"from qdrant_client.http.models import Distance, VectorParams\n",
|
||||
"\n",
|
||||
"client = QdrantClient(\":memory:\")\n",
|
||||
"\n",
|
||||
"client.create_collection(\n",
|
||||
" collection_name=\"demo_collection\",\n",
|
||||
" vectors_config=VectorParams(size=3072, distance=Distance.COSINE),\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"vector_store = QdrantVectorStore(\n",
|
||||
" client=client,\n",
|
||||
" collection_name=\"demo_collection\",\n",
|
||||
" embedding=embeddings,\n",
|
||||
"qdrant = QdrantVectorStore.from_documents(\n",
|
||||
" docs,\n",
|
||||
" embeddings,\n",
|
||||
" location=\":memory:\", # Local mode with in-memory storage only\n",
|
||||
" collection_name=\"my_documents\",\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
@@ -142,7 +134,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"execution_count": 6,
|
||||
"id": "24b370e2",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
@@ -153,17 +145,11 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"client = QdrantClient(path=\"/tmp/langchain_qdrant\")\n",
|
||||
"\n",
|
||||
"client.create_collection(\n",
|
||||
" collection_name=\"demo_collection\",\n",
|
||||
" vectors_config=VectorParams(size=3072, distance=Distance.COSINE),\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"vector_store = QdrantVectorStore(\n",
|
||||
" client=client,\n",
|
||||
" collection_name=\"demo_collection\",\n",
|
||||
" embedding=embeddings,\n",
|
||||
"qdrant = QdrantVectorStore.from_documents(\n",
|
||||
" docs,\n",
|
||||
" embeddings,\n",
|
||||
" path=\"/tmp/local_qdrant\",\n",
|
||||
" collection_name=\"my_documents\",\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
@@ -191,7 +177,6 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"url = \"<---qdrant url here --->\"\n",
|
||||
"docs = [] # put docs here\n",
|
||||
"qdrant = QdrantVectorStore.from_documents(\n",
|
||||
" docs,\n",
|
||||
" embeddings,\n",
|
||||
@@ -260,151 +245,44 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"qdrant = QdrantVectorStore.from_existing_collection(\n",
|
||||
" embedding=embeddings,\n",
|
||||
" embeddings=embeddings,\n",
|
||||
" collection_name=\"my_documents\",\n",
|
||||
" url=\"http://localhost:6333\",\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "3cddef6e",
|
||||
"id": "93540013",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Manage vector store\n",
|
||||
"## Recreating the collection\n",
|
||||
"\n",
|
||||
"Once you have created your vector store, we can interact with it by adding and deleting different items.\n",
|
||||
"\n",
|
||||
"### Add items to vector store\n",
|
||||
"\n",
|
||||
"We can add items to our vector store by using the `add_documents` function."
|
||||
"The collection is reused if it already exists. Setting `force_recreate` to `True` allows to remove the old collection and start from scratch."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "7697a362",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"['c04134c3-273d-4766-949a-eee46052ad32',\n",
|
||||
" '9e6ba50c-794f-4b88-94e5-411f15052a02',\n",
|
||||
" 'd3202666-6f2b-4186-ac43-e35389de8166',\n",
|
||||
" '50d8d6ee-69bf-4173-a6a2-b254e9928965',\n",
|
||||
" 'bd2eae02-74b5-43ec-9fcf-09e9d9db6fd3',\n",
|
||||
" '6dae6b37-826d-4f14-8376-da4603b35de3',\n",
|
||||
" 'b0964ab5-5a14-47b4-a983-37fa5c5bd154',\n",
|
||||
" '91ed6c56-fe53-49e2-8199-c3bb3c33c3eb',\n",
|
||||
" '42a580cb-7469-4324-9927-0febab57ce92',\n",
|
||||
" 'ff774e5c-f158-4d12-94e2-0a0162b22f27']"
|
||||
]
|
||||
},
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
"id": "30a87570",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-04-04T10:51:24.854117Z",
|
||||
"start_time": "2023-04-04T10:51:24.845385Z"
|
||||
}
|
||||
],
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from uuid import uuid4\n",
|
||||
"\n",
|
||||
"from langchain_core.documents import Document\n",
|
||||
"\n",
|
||||
"document_1 = Document(\n",
|
||||
" page_content=\"I had chocalate chip pancakes and scrambled eggs for breakfast this morning.\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_2 = Document(\n",
|
||||
" page_content=\"The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.\",\n",
|
||||
" metadata={\"source\": \"news\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_3 = Document(\n",
|
||||
" page_content=\"Building an exciting new project with LangChain - come check it out!\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_4 = Document(\n",
|
||||
" page_content=\"Robbers broke into the city bank and stole $1 million in cash.\",\n",
|
||||
" metadata={\"source\": \"news\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_5 = Document(\n",
|
||||
" page_content=\"Wow! That was an amazing movie. I can't wait to see it again.\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_6 = Document(\n",
|
||||
" page_content=\"Is the new iPhone worth the price? Read this review to find out.\",\n",
|
||||
" metadata={\"source\": \"website\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_7 = Document(\n",
|
||||
" page_content=\"The top 10 soccer players in the world right now.\",\n",
|
||||
" metadata={\"source\": \"website\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_8 = Document(\n",
|
||||
" page_content=\"LangGraph is the best framework for building stateful, agentic applications!\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_9 = Document(\n",
|
||||
" page_content=\"The stock market is down 500 points today due to fears of a recession.\",\n",
|
||||
" metadata={\"source\": \"news\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_10 = Document(\n",
|
||||
" page_content=\"I have a bad feeling I am going to get deleted :(\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"documents = [\n",
|
||||
" document_1,\n",
|
||||
" document_2,\n",
|
||||
" document_3,\n",
|
||||
" document_4,\n",
|
||||
" document_5,\n",
|
||||
" document_6,\n",
|
||||
" document_7,\n",
|
||||
" document_8,\n",
|
||||
" document_9,\n",
|
||||
" document_10,\n",
|
||||
"]\n",
|
||||
"uuids = [str(uuid4()) for _ in range(len(documents))]\n",
|
||||
"\n",
|
||||
"vector_store.add_documents(documents=documents, ids=uuids)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "5fd23102",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Delete items from vector store"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 37,
|
||||
"id": "999cafcc",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"True"
|
||||
]
|
||||
},
|
||||
"execution_count": 37,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"vector_store.delete(ids=[uuids[-1]])"
|
||||
"url = \"<---qdrant url here --->\"\n",
|
||||
"qdrant = QdrantVectorStore.from_documents(\n",
|
||||
" docs,\n",
|
||||
" embeddings,\n",
|
||||
" url=url,\n",
|
||||
" prefer_grpc=True,\n",
|
||||
" collection_name=\"my_documents\",\n",
|
||||
" force_recreate=True,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -418,55 +296,22 @@
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"## Query vector store\n",
|
||||
"## Similarity search\n",
|
||||
"\n",
|
||||
"Once your vector store has been created and the relevant documents have been added you will most likely wish to query it during the running of your chain or agent. \n",
|
||||
"The simplest scenario for using Qdrant vector store is to perform a similarity search. Under the hood, our query will be encoded into vector embeddings and used to find similar documents in Qdrant collection.\n",
|
||||
"\n",
|
||||
"### Query directly\n",
|
||||
"\n",
|
||||
"The simplest scenario for using Qdrant vector store is to perform a similarity search. Under the hood, our query will be encoded into vector embeddings and used to find similar documents in Qdrant collection."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "a8c513ab",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-04-04T10:51:25.204469Z",
|
||||
"start_time": "2023-04-04T10:51:24.855618Z"
|
||||
},
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"* Building an exciting new project with LangChain - come check it out! [{'source': 'tweet', '_id': 'd3202666-6f2b-4186-ac43-e35389de8166', '_collection_name': 'demo_collection'}]\n",
|
||||
"* LangGraph is the best framework for building stateful, agentic applications! [{'source': 'tweet', '_id': '91ed6c56-fe53-49e2-8199-c3bb3c33c3eb', '_collection_name': 'demo_collection'}]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"results = vector_store.similarity_search(\n",
|
||||
" \"LangChain provides abstractions to make working with LLMs easy\", k=2\n",
|
||||
")\n",
|
||||
"for res in results:\n",
|
||||
" print(f\"* {res.page_content} [{res.metadata}]\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "79bcb0ce",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"`QdrantVectorStore` supports 3 modes for similarity searches. They can be configured using the `retrieval_mode` parameter when setting up the class.\n",
|
||||
"\n",
|
||||
"- Dense Vector Search(Default)\n",
|
||||
"- Sparse Vector Search\n",
|
||||
"- Hybrid Search\n",
|
||||
"\n",
|
||||
"- Hybrid Search"
|
||||
]
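As a minimal hybrid-mode sketch (assumptions: `docs` and `embeddings` from the cells above, plus `FastEmbedSparse` from `langchain-qdrant`, which requires the `fastembed` package; the collection name is illustrative):

```python
from langchain_qdrant import FastEmbedSparse, QdrantVectorStore, RetrievalMode

# Sparse vectors come from a BM25-style model; dense vectors come from `embeddings`.
sparse_embeddings = FastEmbedSparse(model_name="Qdrant/bm25")

qdrant = QdrantVectorStore.from_documents(
    docs,
    embedding=embeddings,
    sparse_embedding=sparse_embeddings,
    location=":memory:",
    collection_name="my_documents_hybrid",
    retrieval_mode=RetrievalMode.HYBRID,
)

qdrant.similarity_search("What did the president say about Ketanji Brown Jackson", k=2)
```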
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b3a78d46",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Dense Vector Search\n",
|
||||
"\n",
|
||||
"To search with only dense vectors,\n",
|
||||
@@ -478,8 +323,14 @@
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "5e097299",
|
||||
"metadata": {},
|
||||
"id": "a8c513ab",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-04-04T10:51:25.204469Z",
|
||||
"start_time": "2023-04-04T10:51:24.855618Z"
|
||||
},
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_qdrant import RetrievalMode\n",
|
||||
@@ -516,7 +367,7 @@
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "8435c0f1",
|
||||
"id": "ceb493a3",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
@@ -526,7 +377,7 @@
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "7cf1e3ef",
|
||||
"id": "052e3412",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
@@ -548,7 +399,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "26e20c61",
|
||||
"id": "f4b6c456",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Hybrid Vector Search\n",
|
||||
@@ -565,7 +416,7 @@
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "f37c8519",
|
||||
"id": "ce56f6e9",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
@@ -592,12 +443,15 @@
|
||||
"id": "1bda9bf5",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"If you want to execute a similarity search and receive the corresponding scores you can run:"
|
||||
"## Similarity search with score\n",
|
||||
"\n",
|
||||
"Sometimes we might want to perform the search, but also obtain a relevancy score to know how good is a particular result. \n",
|
||||
"The returned distance score is cosine distance. Therefore, a lower score is better."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"execution_count": 11,
|
||||
"id": "8804a21d",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
@@ -605,21 +459,27 @@
|
||||
"start_time": "2023-04-04T10:51:25.227384Z"
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"* [SIM=0.531834] The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees. [{'source': 'news', '_id': '9e6ba50c-794f-4b88-94e5-411f15052a02', '_collection_name': 'demo_collection'}]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"results = vector_store.similarity_search_with_score(\n",
|
||||
" query=\"Will it be hot tomorrow\", k=1\n",
|
||||
")\n",
|
||||
"for doc, score in results:\n",
|
||||
" print(f\"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]\")"
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"found_docs = qdrant.similarity_search_with_score(query)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "756a6887",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-04-04T10:51:25.642282Z",
|
||||
"start_time": "2023-04-04T10:51:25.635947Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"document, score = found_docs[0]\n",
|
||||
"print(document.page_content)\n",
|
||||
"print(f\"\\nScore: {score}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -628,46 +488,73 @@
|
||||
"id": "525e3582",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"For a full list of all the search functions available for a `QdrantVectorStore`, read the [API reference](https://api.python.langchain.com/en/latest/qdrant/langchain_qdrant.qdrant.QdrantVectorStore.html)\n",
|
||||
"\n",
|
||||
"### Metadata filtering\n",
|
||||
"\n",
|
||||
"Qdrant has an [extensive filtering system](https://qdrant.tech/documentation/concepts/filtering/) with rich type support. It is also possible to use the filters in Langchain, by passing an additional param to both the `similarity_search_with_score` and `similarity_search` methods."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"id": "dc7cffc8",
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "1c2c58dc",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"* The top 10 soccer players in the world right now. [{'source': 'website', '_id': 'b0964ab5-5a14-47b4-a983-37fa5c5bd154', '_collection_name': 'demo_collection'}]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"```python\n",
|
||||
"from qdrant_client.http import models\n",
|
||||
"\n",
|
||||
"results = vector_store.similarity_search(\n",
|
||||
" query=\"Who are the best soccer players in the world?\",\n",
|
||||
" k=1,\n",
|
||||
" filter=models.Filter(\n",
|
||||
" should=[\n",
|
||||
" models.FieldCondition(\n",
|
||||
" key=\"page_content\",\n",
|
||||
" match=models.MatchValue(\n",
|
||||
" value=\"The top 10 soccer players in the world right now.\"\n",
|
||||
" ),\n",
|
||||
" ),\n",
|
||||
" ]\n",
|
||||
" ),\n",
|
||||
")\n",
|
||||
"for doc in results:\n",
|
||||
" print(f\"* {doc.page_content} [{doc.metadata}]\")"
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"found_docs = qdrant.similarity_search_with_score(query, filter=models.Filter(...))\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "c58c30bf",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-04-04T10:39:53.032744Z",
|
||||
"start_time": "2023-04-04T10:39:53.028673Z"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"## Maximum marginal relevance search (MMR)\n",
|
||||
"\n",
|
||||
"If you'd like to look up some similar documents, but you'd also like to receive diverse results, MMR is the method you should consider. Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.\n",
|
||||
"\n",
|
||||
"Note that MMR search is only available if you've added documents with `DENSE` or `HYBRID` modes. Since it requires dense vectors."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"id": "76810fb6",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-04-04T10:51:26.010947Z",
|
||||
"start_time": "2023-04-04T10:51:25.647687Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"found_docs = qdrant.max_marginal_relevance_search(query, k=2, fetch_k=10)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "80c6db11",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-04-04T10:51:26.016979Z",
|
||||
"start_time": "2023-04-04T10:51:26.013329Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"for i, doc in enumerate(found_docs):\n",
|
||||
" print(f\"{i + 1}.\", doc.page_content, \"\\n\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -676,14 +563,14 @@
|
||||
"id": "691a82d6",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Query by turning into retriever\n",
|
||||
"## Qdrant as a Retriever\n",
|
||||
"\n",
|
||||
"You can also transform the vector store into a retriever for easier usage in your chains. "
|
||||
"Qdrant, as all the other vector stores, is a LangChain Retriever. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"execution_count": null,
|
||||
"id": "9427195f",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
@@ -691,35 +578,49 @@
|
||||
"start_time": "2023-04-04T10:51:26.018763Z"
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(metadata={'source': 'news', '_id': '50d8d6ee-69bf-4173-a6a2-b254e9928965', '_collection_name': 'demo_collection'}, page_content='Robbers broke into the city bank and stole $1 million in cash.')]"
|
||||
]
|
||||
},
|
||||
"execution_count": 15,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"retriever = vector_store.as_retriever(search_type=\"mmr\", search_kwargs={\"k\": 1})\n",
|
||||
"retriever.invoke(\"Stealing from the bank is a crime\")"
|
||||
"retriever = qdrant.as_retriever()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "6ac07288",
|
||||
"id": "0c851b4f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Usage for retrieval-augmented generation\n",
|
||||
"\n",
|
||||
"For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:\n",
|
||||
"\n",
|
||||
"- [Tutorials: working with external knowledge](https://python.langchain.com/v0.2/docs/tutorials/#working-with-external-knowledge)\n",
|
||||
"- [How-to: Question and answer with RAG](https://python.langchain.com/v0.2/docs/how_to/#qa-with-rag)\n",
|
||||
"- [Retrieval conceptual docs](https://python.langchain.com/v0.2/docs/concepts/#retrieval)"
|
||||
"It might be also specified to use MMR as a search strategy, instead of similarity."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "64348f1b",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-04-04T10:51:26.043909Z",
|
||||
"start_time": "2023-04-04T10:51:26.034284Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"retriever = qdrant.as_retriever(search_type=\"mmr\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "f3c70c31",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-04-04T10:51:26.495652Z",
|
||||
"start_time": "2023-04-04T10:51:26.046407Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"retriever.invoke(query)[0]"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -746,8 +647,6 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_qdrant import RetrievalMode\n",
|
||||
"\n",
|
||||
"QdrantVectorStore.from_documents(\n",
|
||||
" docs,\n",
|
||||
" embedding=embeddings,\n",
|
||||
@@ -808,14 +707,12 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "2300e785",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## API reference\n",
|
||||
"\n",
|
||||
"For detailed documentation of all `QdrantVectorStore` features and configurations head to the API reference: https://api.python.langchain.com/en/latest/qdrant/langchain_qdrant.qdrant.QdrantVectorStore.html"
|
||||
]
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
@@ -834,7 +731,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.9"
|
||||
"version": "3.11.8"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
File diff suppressed because it is too large
@@ -43,7 +43,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"execution_count": 37,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
@@ -87,18 +87,18 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"execution_count": 17,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(metadata={'source': '../../how_to/state_of_the_union.txt'}, page_content='Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans. \\n\\nLast year COVID-19 kept us apart. This year we are finally together again. \\n\\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \\n\\nWith a duty to one another to the American people to the Constitution. \\n\\nAnd with an unwavering resolve that freedom will always triumph over tyranny. \\n\\nSix days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. \\n\\nHe thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. \\n\\nHe met the Ukrainian people. \\n\\nFrom President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world.'),\n",
|
||||
" Document(metadata={'source': '../../how_to/state_of_the_union.txt'}, page_content='Groups of citizens blocking tanks with their bodies. Everyone from students to retirees teachers turned soldiers defending their homeland. \\n\\nIn this struggle as President Zelenskyy said in his speech to the European Parliament “Light will win over darkness.” The Ukrainian Ambassador to the United States is here tonight. \\n\\nLet each of us here tonight in this Chamber send an unmistakable signal to Ukraine and to the world. \\n\\nPlease rise if you are able and show that, Yes, we the United States of America stand with the Ukrainian people. \\n\\nThroughout our history we’ve learned this lesson when dictators do not pay a price for their aggression they cause more chaos. \\n\\nThey keep moving. \\n\\nAnd the costs and the threats to America and the world keep rising. \\n\\nThat’s why the NATO Alliance was created to secure peace and stability in Europe after World War 2. \\n\\nThe United States is a member along with 29 other nations. \\n\\nIt matters. American diplomacy matters. American resolve matters.'),\n",
|
||||
" Document(metadata={'source': '../../how_to/state_of_the_union.txt'}, page_content='Putin’s latest attack on Ukraine was premeditated and unprovoked. \\n\\nHe rejected repeated efforts at diplomacy. \\n\\nHe thought the West and NATO wouldn’t respond. And he thought he could divide us at home. Putin was wrong. We were ready. Here is what we did. \\n\\nWe prepared extensively and carefully. \\n\\nWe spent months building a coalition of other freedom-loving nations from Europe and the Americas to Asia and Africa to confront Putin. \\n\\nI spent countless hours unifying our European allies. We shared with the world in advance what we knew Putin was planning and precisely how he would try to falsely justify his aggression. \\n\\nWe countered Russia’s lies with truth. \\n\\nAnd now that he has acted the free world is holding him accountable. \\n\\nAlong with twenty-seven members of the European Union including France, Germany, Italy, as well as countries like the United Kingdom, Canada, Japan, Korea, Australia, New Zealand, and many others, even Switzerland.')]"
|
||||
"[Document(page_content='Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans. \\n\\nLast year COVID-19 kept us apart. This year we are finally together again. \\n\\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \\n\\nWith a duty to one another to the American people to the Constitution. \\n\\nAnd with an unwavering resolve that freedom will always triumph over tyranny. \\n\\nSix days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. \\n\\nHe thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. \\n\\nHe met the Ukrainian people. \\n\\nFrom President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world.', metadata={'source': '../../how_to/state_of_the_union.txt'}),\n",
|
||||
" Document(page_content='Groups of citizens blocking tanks with their bodies. Everyone from students to retirees teachers turned soldiers defending their homeland. \\n\\nIn this struggle as President Zelenskyy said in his speech to the European Parliament “Light will win over darkness.” The Ukrainian Ambassador to the United States is here tonight. \\n\\nLet each of us here tonight in this Chamber send an unmistakable signal to Ukraine and to the world. \\n\\nPlease rise if you are able and show that, Yes, we the United States of America stand with the Ukrainian people. \\n\\nThroughout our history we’ve learned this lesson when dictators do not pay a price for their aggression they cause more chaos. \\n\\nThey keep moving. \\n\\nAnd the costs and the threats to America and the world keep rising. \\n\\nThat’s why the NATO Alliance was created to secure peace and stability in Europe after World War 2. \\n\\nThe United States is a member along with 29 other nations. \\n\\nIt matters. American diplomacy matters. American resolve matters.', metadata={'source': '../../how_to/state_of_the_union.txt'}),\n",
|
||||
" Document(page_content='Putin’s latest attack on Ukraine was premeditated and unprovoked. \\n\\nHe rejected repeated efforts at diplomacy. \\n\\nHe thought the West and NATO wouldn’t respond. And he thought he could divide us at home. Putin was wrong. We were ready. Here is what we did. \\n\\nWe prepared extensively and carefully. \\n\\nWe spent months building a coalition of other freedom-loving nations from Europe and the Americas to Asia and Africa to confront Putin. \\n\\nI spent countless hours unifying our European allies. We shared with the world in advance what we knew Putin was planning and precisely how he would try to falsely justify his aggression. \\n\\nWe countered Russia’s lies with truth. \\n\\nAnd now that he has acted the free world is holding him accountable. \\n\\nAlong with twenty-seven members of the European Union including France, Germany, Italy, as well as countries like the United Kingdom, Canada, Japan, Korea, Australia, New Zealand, and many others, even Switzerland.', metadata={'source': '../../how_to/state_of_the_union.txt'})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"execution_count": 17,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -126,20 +126,20 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"execution_count": 35,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"['247aa3ae-9be9-43e2-98e4-48f94f920749',\n",
|
||||
" 'c4dfc886-0a2d-497c-b2b7-d923a5cb3832',\n",
|
||||
" '0350761d-ca68-414e-b8db-7eca78cb0d18',\n",
|
||||
" '902fe5eb-8543-486a-bd5f-79858a7a8af1',\n",
|
||||
" '28875612-c672-4de4-b40a-3b658c72036a']"
|
||||
"['82b3781b-817c-4a4d-8f8b-cbd07c1d005a',\n",
|
||||
" 'a20e0a49-29d8-465e-8eae-0bc5ac3d24dc',\n",
|
||||
" 'c19f4108-b652-4890-873e-d4cad00f1b1a',\n",
|
||||
" '23d1fcf9-6ee1-4638-8c70-0f5030762301',\n",
|
||||
" '2d775784-825d-4627-97a3-fee4539d8f58']"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"execution_count": 35,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -154,116 +154,33 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Querying\n",
|
||||
"\n",
|
||||
"The database can be queried using a vector or a text prompt.\n",
|
||||
"If a text prompt is used, it's first converted into embedding and then queried.\n",
|
||||
"\n",
|
||||
"The `k` parameter specifies how many results to return from the query."
|
||||
"store"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"execution_count": 27,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(metadata={'source': '../../how_to/state_of_the_union.txt'}, page_content='If you travel 20 miles east of Columbus, Ohio, you’ll find 1,000 empty acres of land. \\n\\nIt won’t look like much, but if you stop and look closely, you’ll see a “Field of dreams,” the ground on which America’s future will be built. \\n\\nThis is where Intel, the American company that helped build Silicon Valley, is going to build its $20 billion semiconductor “mega site”. \\n\\nUp to eight state-of-the-art factories in one place. 10,000 new good-paying jobs. \\n\\nSome of the most sophisticated manufacturing in the world to make computer chips the size of a fingertip that power the world and our everyday lives. \\n\\nSmartphones. The Internet. Technology we have yet to invent. \\n\\nBut that’s just the beginning. \\n\\nIntel’s CEO, Pat Gelsinger, who is here tonight, told me they are ready to increase their investment from \\n$20 billion to $100 billion. \\n\\nThat would be one of the biggest investments in manufacturing in American history. \\n\\nAnd all they’re waiting for is for you to pass this bill.'),\n",
|
||||
" Document(metadata={'source': '../../how_to/state_of_the_union.txt'}, page_content='So let’s not wait any longer. Send it to my desk. I’ll sign it. \\n\\nAnd we will really take off. \\n\\nAnd Intel is not alone. \\n\\nThere’s something happening in America. \\n\\nJust look around and you’ll see an amazing story. \\n\\nThe rebirth of the pride that comes from stamping products “Made In America.” The revitalization of American manufacturing. \\n\\nCompanies are choosing to build new factories here, when just a few years ago, they would have built them overseas. \\n\\nThat’s what is happening. Ford is investing $11 billion to build electric vehicles, creating 11,000 jobs across the country. \\n\\nGM is making the largest investment in its history—$7 billion to build electric vehicles, creating 4,000 jobs in Michigan. \\n\\nAll told, we created 369,000 new manufacturing jobs in America just last year. \\n\\nPowered by people I’ve met like JoJo Burgess, from generations of union steelworkers from Pittsburgh, who’s here with us tonight.'),\n",
|
||||
" Document(metadata={'source': '../../how_to/state_of_the_union.txt'}, page_content='When we use taxpayer dollars to rebuild America – we are going to Buy American: buy American products to support American jobs. \\n\\nThe federal government spends about $600 Billion a year to keep the country safe and secure. \\n\\nThere’s been a law on the books for almost a century \\nto make sure taxpayers’ dollars support American jobs and businesses. \\n\\nEvery Administration says they’ll do it, but we are actually doing it. \\n\\nWe will buy American to make sure everything from the deck of an aircraft carrier to the steel on highway guardrails are made in America. \\n\\nBut to compete for the best jobs of the future, we also need to level the playing field with China and other competitors. \\n\\nThat’s why it is so important to pass the Bipartisan Innovation Act sitting in Congress that will make record investments in emerging technologies and American manufacturing. \\n\\nLet me give you one example of why it’s so important to pass it.'),\n",
|
||||
" Document(metadata={'source': '../../how_to/state_of_the_union.txt'}, page_content='Last month, I announced our plan to supercharge \\nthe Cancer Moonshot that President Obama asked me to lead six years ago. \\n\\nOur goal is to cut the cancer death rate by at least 50% over the next 25 years, turn more cancers from death sentences into treatable diseases. \\n\\nMore support for patients and families. \\n\\nTo get there, I call on Congress to fund ARPA-H, the Advanced Research Projects Agency for Health. \\n\\nIt’s based on DARPA—the Defense Department project that led to the Internet, GPS, and so much more. \\n\\nARPA-H will have a singular purpose—to drive breakthroughs in cancer, Alzheimer’s, diabetes, and more. \\n\\nA unity agenda for the nation. \\n\\nWe can do this. \\n\\nMy fellow Americans—tonight , we have gathered in a sacred space—the citadel of our democracy. \\n\\nIn this Capitol, generation after generation, Americans have debated great questions amid great strife, and have done great things. \\n\\nWe have fought for freedom, expanded liberty, defeated totalitarianism and terror.'),\n",
|
||||
" Document(metadata={'source': '../../how_to/state_of_the_union.txt'}, page_content='And based on the projections, more of the country will reach that point across the next couple of weeks. \\n\\nThanks to the progress we have made this past year, COVID-19 need no longer control our lives. \\n\\nI know some are talking about “living with COVID-19”. Tonight – I say that we will never just accept living with COVID-19. \\n\\nWe will continue to combat the virus as we do other diseases. And because this is a virus that mutates and spreads, we will stay on guard. \\n\\nHere are four common sense steps as we move forward safely. \\n\\nFirst, stay protected with vaccines and treatments. We know how incredibly effective vaccines are. If you’re vaccinated and boosted you have the highest degree of protection. \\n\\nWe will never give up on vaccinating more Americans. Now, I know parents with kids under 5 are eager to see a vaccine authorized for their children. \\n\\nThe scientists are working hard to get that done and we’ll be ready with plenty of vaccines when they do.')]"
|
||||
"['fe1f7a7b-42e2-4828-88b0-5b449c49fe86',\n",
|
||||
" '154a0021-a99c-427e-befb-f0b2b18ed83c',\n",
|
||||
" 'a8218226-18a9-4ab5-ade5-5a71b19a7831',\n",
|
||||
" '62b7ef97-83bf-4b6d-8c93-f471796244dc',\n",
|
||||
" 'ab43fd2e-13df-46d4-8cf7-e6e16506e4bb',\n",
|
||||
" '6841e7f9-adaa-41d9-af3d-0813ee52443f',\n",
|
||||
" '45dda5a1-f0c1-4ac7-9acb-50253e4ee493']"
|
||||
]
|
||||
},
|
||||
"execution_count": 8,
|
||||
"execution_count": 27,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"result = store.similarity_search(\"technology\", k=5)\n",
|
||||
"result"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Querying with score\n",
|
||||
"\n",
|
||||
"The score of the query can be included for every result. \n",
|
||||
"\n",
|
||||
"> The score returned in the query requests is a normalized value between 0 and 1, where 1 indicates the highest similarity and 0 the lowest regardless of the similarity function used. For more information look at the [docs](https://upstash.com/docs/vector/overall/features#vector-similarity-functions)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"{'source': '../../how_to/state_of_the_union.txt'} - 0.8968438\n",
|
||||
"{'source': '../../how_to/state_of_the_union.txt'} - 0.8895128\n",
|
||||
"{'source': '../../how_to/state_of_the_union.txt'} - 0.88626665\n",
|
||||
"{'source': '../../how_to/state_of_the_union.txt'} - 0.88538057\n",
|
||||
"{'source': '../../how_to/state_of_the_union.txt'} - 0.88432854\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"result = store.similarity_search_with_score(\"technology\", k=5)\n",
|
||||
"\n",
|
||||
"for doc, score in result:\n",
|
||||
" print(f\"{doc.metadata} - {score}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Namespaces\n",
|
||||
"\n",
|
||||
"Namespaces can be used to separate different types of documents. This can increase the efficiency of the queries since the search space is reduced. When no namespace is provided, the default namespace is used."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"store_books = UpstashVectorStore(embedding=embeddings, namespace=\"books\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"['928a5f12-900f-40b7-9406-3861741cc9d6',\n",
|
||||
" '4908670e-0b9c-455b-96b8-e0f83bc59204',\n",
|
||||
" '7083ff98-d900-4435-a67c-d9690fc555ba',\n",
|
||||
" 'b910f9b1-2be0-4e0a-8b6c-93ba9b367df5',\n",
|
||||
" '7c40e950-4d2b-4293-9fb8-623a49e72607',\n",
|
||||
" '25a70e79-4905-42af-8b08-09f13bd48512',\n",
|
||||
" '695e2bcf-23d9-44d4-af26-a7b554c0c375']"
|
||||
]
|
||||
},
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"store_books.add_texts(\n",
|
||||
"store.add_texts(\n",
|
||||
" [\n",
|
||||
" \"A timeless tale set in the Jazz Age, this novel delves into the lives of affluent socialites, their pursuits of wealth, love, and the elusive American Dream. Amidst extravagant parties and glittering opulence, the story unravels the complexities of desire, ambition, and the consequences of obsession.\",\n",
|
||||
" \"Set in a small Southern town during the 1930s, this novel explores themes of racial injustice, moral growth, and empathy through the eyes of a young girl. It follows her father, a principled lawyer, as he defends a black man accused of assaulting a white woman, confronting deep-seated prejudices and challenging societal norms along the way.\",\n",
|
||||
@@ -285,26 +202,63 @@
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Querying\n",
|
||||
"\n",
|
||||
"The database can be queried using a vector or a text prompt.\n",
|
||||
"If a text prompt is used, it's first converted into embedding and then queried.\n",
|
||||
"\n",
|
||||
"The `k` parameter specifies how many results to return from the query."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"execution_count": 29,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(metadata={'title': '1984', 'author': 'George Orwell', 'year': 1949}, page_content='A chilling portrayal of a totalitarian regime, this dystopian novel offers a bleak vision of a future world dominated by surveillance, propaganda, and thought control. Through the eyes of a disillusioned protagonist, it explores the dangers of totalitarianism and the erosion of individual freedom in a society ruled by fear and oppression.'),\n",
|
||||
" Document(metadata={'title': 'The Road', 'author': 'Cormac McCarthy', 'year': 2006}, page_content='Set in a future world devastated by environmental collapse, this novel follows a group of survivors as they struggle to survive in a harsh, unforgiving landscape. Amidst scarcity and desperation, they must confront moral dilemmas and question the nature of humanity itself.'),\n",
|
||||
" Document(metadata={'title': 'Brave New World', 'author': 'Aldous Huxley', 'year': 1932}, page_content='In a society where emotion is suppressed and individuality is forbidden, one man dares to defy the oppressive regime. Through acts of rebellion and forbidden love, he discovers the power of human connection and the importance of free will.')]"
|
||||
"[Document(page_content='And my report is this: the State of the Union is strong—because you, the American people, are strong. \\n\\nWe are stronger today than we were a year ago. \\n\\nAnd we will be stronger a year from now than we are today. \\n\\nNow is our moment to meet and overcome the challenges of our time. \\n\\nAnd we will, as one people. \\n\\nOne America. \\n\\nThe United States of America. \\n\\nMay God bless you all. May God protect our troops.', metadata={'source': '../../how_to/state_of_the_union.txt'}),\n",
|
||||
" Document(page_content='And built the strongest, freest, and most prosperous nation the world has ever known. \\n\\nNow is the hour. \\n\\nOur moment of responsibility. \\n\\nOur test of resolve and conscience, of history itself. \\n\\nIt is in this moment that our character is formed. Our purpose is found. Our future is forged. \\n\\nWell I know this nation. \\n\\nWe will meet the test. \\n\\nTo protect freedom and liberty, to expand fairness and opportunity. \\n\\nWe will save democracy. \\n\\nAs hard as these times have been, I am more optimistic about America today than I have been my whole life. \\n\\nBecause I see the future that is within our grasp. \\n\\nBecause I know there is simply nothing beyond our capacity. \\n\\nWe are the only nation on Earth that has always turned every crisis we have faced into an opportunity. \\n\\nThe only nation that can be defined by a single word: possibilities. \\n\\nSo on this night, in our 245th year as a nation, I have come to report on the State of the Union.', metadata={'source': '../../how_to/state_of_the_union.txt'}),\n",
|
||||
" Document(page_content='Groups of citizens blocking tanks with their bodies. Everyone from students to retirees teachers turned soldiers defending their homeland. \\n\\nIn this struggle as President Zelenskyy said in his speech to the European Parliament “Light will win over darkness.” The Ukrainian Ambassador to the United States is here tonight. \\n\\nLet each of us here tonight in this Chamber send an unmistakable signal to Ukraine and to the world. \\n\\nPlease rise if you are able and show that, Yes, we the United States of America stand with the Ukrainian people. \\n\\nThroughout our history we’ve learned this lesson when dictators do not pay a price for their aggression they cause more chaos. \\n\\nThey keep moving. \\n\\nAnd the costs and the threats to America and the world keep rising. \\n\\nThat’s why the NATO Alliance was created to secure peace and stability in Europe after World War 2. \\n\\nThe United States is a member along with 29 other nations. \\n\\nIt matters. American diplomacy matters. American resolve matters.', metadata={'source': '../../how_to/state_of_the_union.txt'}),\n",
" Document(page_content='When we use taxpayer dollars to rebuild America – we are going to Buy American: buy American products to support American jobs. \\n\\nThe federal government spends about $600 Billion a year to keep the country safe and secure. \\n\\nThere’s been a law on the books for almost a century \\nto make sure taxpayers’ dollars support American jobs and businesses. \\n\\nEvery Administration says they’ll do it, but we are actually doing it. \\n\\nWe will buy American to make sure everything from the deck of an aircraft carrier to the steel on highway guardrails are made in America. \\n\\nBut to compete for the best jobs of the future, we also need to level the playing field with China and other competitors. \\n\\nThat’s why it is so important to pass the Bipartisan Innovation Act sitting in Congress that will make record investments in emerging technologies and American manufacturing. \\n\\nLet me give you one example of why it’s so important to pass it.', metadata={'source': '../../how_to/state_of_the_union.txt'}),\n",
" Document(page_content='Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans. \\n\\nLast year COVID-19 kept us apart. This year we are finally together again. \\n\\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \\n\\nWith a duty to one another to the American people to the Constitution. \\n\\nAnd with an unwavering resolve that freedom will always triumph over tyranny. \\n\\nSix days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. \\n\\nHe thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. \\n\\nHe met the Ukrainian people. \\n\\nFrom President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world.', metadata={'source': '../../how_to/state_of_the_union.txt'})]"
]
},
"execution_count": 13,
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result = store_books.similarity_search(\"dystopia\", k=3)\n",
"result = store.similarity_search(\"The United States of America\", k=5)\n",
"result"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='A chilling portrayal of a totalitarian regime, this dystopian novel offers a bleak vision of a future world dominated by surveillance, propaganda, and thought control. Through the eyes of a disillusioned protagonist, it explores the dangers of totalitarianism and the erosion of individual freedom in a society ruled by fear and oppression.', metadata={'title': '1984', 'author': 'George Orwell', 'year': 1949}),\n",
" Document(page_content='Narrated by a disillusioned teenager, this novel follows his journey of self-discovery and rebellion against the phoniness of the adult world. Through a series of encounters and reflections, it explores themes of alienation, identity, and the search for authenticity in a society marked by conformity and hypocrisy.', metadata={'title': 'The Catcher in the Rye', 'author': 'J.D. Salinger', 'year': 1951}),\n",
" Document(page_content='Set in the English countryside during the early 19th century, this novel follows the lives of the Bennet sisters as they navigate the intricate social hierarchy of their time. Focusing on themes of marriage, class, and societal expectations, the story offers a witty and insightful commentary on the complexities of romantic relationships and the pursuit of happiness.', metadata={'title': 'Pride and Prejudice', 'author': 'Jane Austen', 'year': 1813})]"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result = store.similarity_search(\"dystopia\", k=3, filter=\"year < 2000\")\n",
"result"
]
},
@@ -312,63 +266,35 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Metadata Filtering\n",
"## Querying with score\n",
"\n",
"Metadata can be used to filter the results of a query. You can refer to the [docs](https://upstash.com/docs/vector/features/filtering) to see more complex ways of filtering."
"The similarity score can be included for every result. \n",
"\n",
"> The score returned in the query requests is a normalized value between 0 and 1, where 1 indicates the highest similarity and 0 the lowest regardless of the similarity function used. For more information look at the [docs](https://upstash.com/docs/vector/overall/features#vector-similarity-functions)."
]
},
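For reference, a minimal end-to-end sketch of the two operations discussed above, metadata filtering and score retrieval (illustrative only; the store construction and the `year` metadata field are assumptions that mirror the books example, and Upstash/OpenAI credentials are expected in the environment):

```python
from langchain_community.vectorstores import UpstashVectorStore
from langchain_openai import OpenAIEmbeddings

# Assumes UPSTASH_VECTOR_REST_URL / UPSTASH_VECTOR_REST_TOKEN and
# OPENAI_API_KEY are set; names here are illustrative, not from the diff.
store = UpstashVectorStore(embedding=OpenAIEmbeddings())

# Metadata filter: Upstash accepts a SQL-like filter string.
filtered = store.similarity_search("dystopia", k=3, filter="year < 2000")

# Scores are normalized to [0, 1]; 1 is the most similar match.
for doc, score in store.similarity_search_with_score("dystopia", k=3):
    print(doc.metadata, score)
```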
{
"cell_type": "code",
"execution_count": 14,
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(metadata={'title': '1984', 'author': 'George Orwell', 'year': 1949}, page_content='A chilling portrayal of a totalitarian regime, this dystopian novel offers a bleak vision of a future world dominated by surveillance, propaganda, and thought control. Through the eyes of a disillusioned protagonist, it explores the dangers of totalitarianism and the erosion of individual freedom in a society ruled by fear and oppression.'),\n",
" Document(metadata={'title': 'Brave New World', 'author': 'Aldous Huxley', 'year': 1932}, page_content='In a society where emotion is suppressed and individuality is forbidden, one man dares to defy the oppressive regime. Through acts of rebellion and forbidden love, he discovers the power of human connection and the importance of free will.'),\n",
" Document(metadata={'title': 'The Catcher in the Rye', 'author': 'J.D. Salinger', 'year': 1951}, page_content='Narrated by a disillusioned teenager, this novel follows his journey of self-discovery and rebellion against the phoniness of the adult world. Through a series of encounters and reflections, it explores themes of alienation, identity, and the search for authenticity in a society marked by conformity and hypocrisy.')]"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
"name": "stdout",
"output_type": "stream",
"text": [
"{'source': '../../how_to/state_of_the_union.txt'} - 0.87391514\n",
"{'source': '../../how_to/state_of_the_union.txt'} - 0.8549463\n",
"{'source': '../../how_to/state_of_the_union.txt'} - 0.847913\n",
"{'source': '../../how_to/state_of_the_union.txt'} - 0.84328896\n",
"{'source': '../../how_to/state_of_the_union.txt'} - 0.832347\n"
]
}
],
"source": [
"result = store_books.similarity_search(\"dystopia\", k=3, filter=\"year < 2000\")\n",
"result"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Getting info about vector database\n",
"result = store.similarity_search_with_score(\"The United States of America\", k=5)\n",
"\n",
"You can get information about your database, such as the distance metric and dimension, using the `info` function.\n",
"\n",
"> When an insert happens, the database indexes the new vectors in the background. While this is happening, the new vectors cannot be queried. `pendingVectorCount` represents the number of vectors that are currently being indexed. "
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"InfoResult(vector_count=49, pending_vector_count=0, index_size=2978163, dimension=1536, similarity_function='COSINE', namespaces={'': NamespaceInfo(vector_count=42, pending_vector_count=0), 'books': NamespaceInfo(vector_count=7, pending_vector_count=0)})"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"store.info()"
"for doc, score in result:\n",
" print(f\"{doc.metadata} - {score}\")"
]
},
{
@@ -382,7 +308,7 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": 32,
"metadata": {},
"outputs": [],
"source": [
@@ -400,12 +326,42 @@
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": 33,
"metadata": {},
"outputs": [],
"source": [
"store.delete(delete_all=True)\n",
"store_books.delete(delete_all=True)"
"store.delete(delete_all=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Getting info about vector database\n",
"\n",
"You can get information about your database, such as the distance metric and dimension, using the `info` function.\n",
"\n",
"> When an insert happens, the database indexes the new vectors in the background. While this is happening, the new vectors cannot be queried. `pendingVectorCount` represents the number of vectors that are currently being indexed. "
]
},
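One way to act on the `pendingVectorCount` note above (an illustrative sketch, not from the notebook; it reuses the `store` object defined earlier) is to poll `info()` and query only once indexing has drained:

```python
import time

# Freshly inserted vectors are not queryable until indexing finishes;
# pending_vector_count reports how many are still being indexed.
while store.info().pending_vector_count > 0:
    time.sleep(1)

result = store.similarity_search("dystopia", k=3)
```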
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"InfoResult(vector_count=42, pending_vector_count=0, index_size=6470, dimension=384, similarity_function='COSINE')"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"store.info()"
]
}
],
@@ -425,7 +381,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.4"
"version": "3.9.6"
}
},
"nbformat": 4,

@@ -71,11 +71,11 @@
|
||||
"from langchain_anthropic import ChatAnthropic\n",
|
||||
"from langchain_community.tools.tavily_search import TavilySearchResults\n",
|
||||
"from langchain_core.messages import HumanMessage\n",
|
||||
"from langgraph.checkpoint.memory import MemorySaver\n",
|
||||
"from langgraph.checkpoint.sqlite import SqliteSaver\n",
|
||||
"from langgraph.prebuilt import create_react_agent\n",
|
||||
"\n",
|
||||
"# Create the agent\n",
|
||||
"memory = MemorySaver()\n",
|
||||
"memory = SqliteSaver.from_conn_string(\":memory:\")\n",
|
||||
"model = ChatAnthropic(model_name=\"claude-3-sonnet-20240229\")\n",
|
||||
"search = TavilySearchResults(max_results=2)\n",
|
||||
"tools = [search]\n",
|
||||
@@ -121,7 +121,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%pip install -U langchain-community langgraph langchain-anthropic tavily-python langgraph-checkpoint-sqlite"
|
||||
"%pip install -U langchain-community langgraph langchain-anthropic tavily-python"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -606,9 +606,9 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langgraph.checkpoint.memory import MemorySaver\n",
|
||||
"from langgraph.checkpoint.sqlite import SqliteSaver\n",
|
||||
"\n",
|
||||
"memory = MemorySaver()"
|
||||
"memory = SqliteSaver.from_conn_string(\":memory:\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -857,9 +857,9 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langgraph.checkpoint.memory import MemorySaver\n",
|
||||
"from langgraph.checkpoint.sqlite import SqliteSaver\n",
|
||||
"\n",
|
||||
"memory = MemorySaver()\n",
|
||||
"memory = SqliteSaver.from_conn_string(\":memory:\")\n",
|
||||
"\n",
|
||||
"agent_executor = create_react_agent(llm, tools, checkpointer=memory)"
|
||||
]
|
||||
@@ -1012,15 +1012,20 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import bs4\n",
|
||||
"from langchain.agents import AgentExecutor, create_tool_calling_agent\n",
|
||||
"from langchain.tools.retriever import create_retriever_tool\n",
|
||||
"from langchain_chroma import Chroma\n",
|
||||
"from langchain_community.chat_message_histories import ChatMessageHistory\n",
|
||||
"from langchain_community.document_loaders import WebBaseLoader\n",
|
||||
"from langchain_core.chat_history import BaseChatMessageHistory\n",
|
||||
"from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n",
|
||||
"from langchain_core.runnables.history import RunnableWithMessageHistory\n",
|
||||
"from langchain_openai import ChatOpenAI, OpenAIEmbeddings\n",
|
||||
"from langchain_text_splitters import RecursiveCharacterTextSplitter\n",
|
||||
"from langgraph.checkpoint.memory import MemorySaver\n",
|
||||
"from langgraph.checkpoint.sqlite import SqliteSaver\n",
|
||||
"from langgraph.prebuilt import create_react_agent\n",
|
||||
"\n",
|
||||
"memory = MemorySaver()\n",
|
||||
"memory = SqliteSaver.from_conn_string(\":memory:\")\n",
|
||||
"llm = ChatOpenAI(model=\"gpt-3.5-turbo\", temperature=0)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
|
||||
@@ -601,7 +601,7 @@
|
||||
"relevant documents, constructs a prompt, passes that to a model, and\n",
|
||||
"parses the output.\n",
|
||||
"\n",
|
||||
"We’ll use the gpt-4o-mini OpenAI chat model, but any LangChain `LLM`\n",
|
||||
"We’ll use the gpt-3.5-turbo OpenAI chat model, but any LangChain `LLM`\n",
|
||||
"or `ChatModel` could be substituted in.\n",
|
||||
"\n",
|
||||
"```{=mdx}\n",
|
||||
|
||||
@@ -54,9 +54,12 @@
|
||||
"id": "00df631d-5121-4918-94aa-b88acce9b769",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Legacy\n",
|
||||
"import { ColumnContainer, Column } from \"@theme/Columns\";\n",
|
||||
"\n",
|
||||
"<details open>"
|
||||
"<ColumnContainer>\n",
|
||||
"<Column>\n",
|
||||
"\n",
|
||||
"#### Legacy\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -108,11 +111,12 @@
|
||||
"id": "f8e36b0e-c7dc-4130-a51b-189d4b756c7f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"</details>\n",
|
||||
"</Column>\n",
|
||||
"\n",
|
||||
"## LCEL\n",
|
||||
"<Column>\n",
|
||||
"\n",
|
||||
"<details open>"
|
||||
"#### LCEL\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -170,6 +174,10 @@
|
||||
"id": "6b386ce6-895e-442c-88f3-7bec0ab9f401",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"</Column>\n",
|
||||
"</ColumnContainer>\n",
|
||||
"\n",
|
||||
"The above example uses the same `history` for all sessions. The example below shows how to use a different chat history for each session."
|
||||
]
|
||||
},
|
||||
@@ -222,8 +230,6 @@
|
||||
"id": "b2717810",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"</details>\n",
|
||||
"\n",
|
||||
"## Next steps\n",
|
||||
"\n",
|
||||
"See [this tutorial](/docs/tutorials/chatbot) for a more end-to-end guide on building with [`RunnableWithMessageHistory`](https://api.python.langchain.com/en/latest/runnables/langchain_core.runnables.history.RunnableWithMessageHistory.html).\n",
|
||||
|
||||
@@ -83,9 +83,13 @@
|
||||
"id": "8bc06416",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Legacy\n",
|
||||
"import { ColumnContainer, Column } from \"@theme/Columns\";\n",
|
||||
"\n",
|
||||
"<details open>"
|
||||
"<ColumnContainer>\n",
|
||||
"\n",
|
||||
"<Column>\n",
|
||||
"\n",
|
||||
"#### Legacy"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -161,11 +165,12 @@
|
||||
"id": "43a8a23c",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"</details>\n",
|
||||
"</Column>\n",
|
||||
"\n",
|
||||
"## LCEL\n",
|
||||
"<Column>\n",
|
||||
"\n",
|
||||
"<details open>"
|
||||
"#### LCEL\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -248,7 +253,9 @@
|
||||
"id": "b2717810",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"</details>\n",
|
||||
"</Column>\n",
|
||||
"\n",
|
||||
"</ColumnContainer>\n",
|
||||
"\n",
|
||||
"## Next steps\n",
|
||||
"\n",
|
||||
@@ -256,14 +263,6 @@
|
||||
"\n",
|
||||
"Next, check out the [LCEL conceptual docs](/docs/concepts/#langchain-expression-language-lcel) for more background information."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "7bfc38bd-0ff8-40ee-83a3-9d7553364fd7",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
|
||||
@@ -2,48 +2,33 @@
sidebar_position: 1
---

# How to migrate from v0.0 chains
# How to migrate chains to LCEL

:::info Prerequisites

This guide assumes familiarity with the following concepts:
- [LangChain Expression Language](/docs/concepts#langchain-expression-language-lcel)
- [LangGraph](https://langchain-ai.github.io/langgraph/)

:::

LangChain maintains a number of legacy abstractions. Many of these can be reimplemented via short combinations of LCEL and LangGraph primitives.

### LCEL
[LCEL](/docs/concepts/#langchain-expression-language-lcel) is designed to streamline the process of building useful apps with LLMs and combining related components. It does this by providing:
LCEL is designed to streamline the process of building useful apps with LLMs and combining related components. It does this by providing:

1. **A unified interface**: Every LCEL object implements the `Runnable` interface, which defines a common set of invocation methods (`invoke`, `batch`, `stream`, `ainvoke`, ...). This makes it possible to also automatically and consistently support useful operations like streaming of intermediate steps and batching, since every chain composed of LCEL objects is itself an LCEL object.
2. **Composition primitives**: LCEL provides a number of primitives that make it easy to compose chains, parallelize components, add fallbacks, dynamically configure chain internals, and more.
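For example, a prompt, a chat model, and an output parser compose into a single `Runnable` (a minimal sketch, not part of the original guide; it assumes `langchain-openai` is installed and an OpenAI API key is configured):

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_messages(
    [("system", "You are a helpful assistant."), ("human", "{question}")]
)

# The | operator composes Runnables; the composed chain is itself a
# Runnable, so invoke / batch / stream / ainvoke all work uniformly.
chain = prompt | ChatOpenAI(model="gpt-4o-mini") | StrOutputParser()

chain.invoke({"question": "What is LCEL?"})
```
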
### LangGraph
[LangGraph](https://langchain-ai.github.io/langgraph/), built on top of LCEL, allows for performant orchestrations of application components while maintaining concise and readable code. It includes built-in persistence, support for cycles, and prioritizes controllability.
If LCEL grows unwieldy for larger or more complex chains, those chains may benefit from a LangGraph implementation.

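A minimal sketch of LangGraph's built-in persistence (illustrative only; it assumes `langgraph` and `langchain-openai` are installed, and the tool is a made-up example): a checkpointer plus a `thread_id` gives an agent a durable conversation history across invocations.

```python
from langchain_core.messages import HumanMessage
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.checkpoint.memory import MemorySaver
from langgraph.prebuilt import create_react_agent


@tool
def word_length(word: str) -> int:
    """Return the length of a word."""
    return len(word)


# Conversation state is checkpointed per thread_id.
agent = create_react_agent(
    ChatOpenAI(model="gpt-4o-mini"), [word_length], checkpointer=MemorySaver()
)
config = {"configurable": {"thread_id": "example-thread"}}

agent.invoke({"messages": [HumanMessage("Hi, I'm Bob.")]}, config)
agent.invoke({"messages": [HumanMessage("What's my name?")]}, config)
```
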
### Advantages
Using these frameworks for existing v0.0 chains confers some advantages:
LangChain maintains a number of legacy abstractions. Many of these can be reimplemented via short combinations of LCEL primitives. Doing so confers some general advantages:

- The resulting chains typically implement the full `Runnable` interface, including streaming and asynchronous support where appropriate;
- The chains may be more easily extended or modified;
- The parameters of the chain (e.g., prompts) are typically surfaced for easier customization, whereas previous versions tended to be subclasses with opaque parameters and internals.
- If using LangGraph, the chain supports built-in persistence, allowing for conversational experiences via a "memory" of the chat history.
- If using LangGraph, the steps of the chain can be streamed, allowing for greater control and customizability.

The LCEL implementations can be slightly more verbose, but there are significant benefits in transparency and customizability.

The below pages assist with migration from various specific chains to LCEL and LangGraph:
The below pages assist with migration from various specific chains to LCEL:

- [LLMChain](/docs/versions/migrating_chains/llm_chain)
- [ConversationChain](/docs/versions/migrating_chains/conversation_chain)
- [RetrievalQA](/docs/versions/migrating_chains/retrieval_qa)
- [ConversationalRetrievalChain](/docs/versions/migrating_chains/conversation_retrieval_chain)
- [StuffDocumentsChain](/docs/versions/migrating_chains/stuff_docs_chain)
- [MapReduceDocumentsChain](/docs/versions/migrating_chains/map_reduce_chain)
- [MapRerankDocumentsChain](/docs/versions/migrating_chains/map_rerank_docs_chain)
- [RefineDocumentsChain](/docs/versions/migrating_chains/refine_docs_chain)
- [LLMRouterChain](/docs/versions/migrating_chains/llm_router_chain)
- [MultiPromptChain](/docs/versions/migrating_chains/multi_prompt_chain)

Check out the [LCEL conceptual docs](/docs/concepts/#langchain-expression-language-lcel) and [LangGraph docs](https://langchain-ai.github.io/langgraph/) for more background information.
Check out the [LCEL conceptual docs](/docs/concepts/#langchain-expression-language-lcel) for more background information.
@@ -52,9 +52,13 @@
|
||||
"id": "e3621b62-a037-42b8-8faa-59575608bb8b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Legacy\n",
|
||||
"import { ColumnContainer, Column } from \"@theme/Columns\";\n",
|
||||
"\n",
|
||||
"<details open>"
|
||||
"<ColumnContainer>\n",
|
||||
"\n",
|
||||
"<Column>\n",
|
||||
"\n",
|
||||
"#### Legacy\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -94,11 +98,13 @@
|
||||
"id": "cdc3b527-c09e-4c77-9711-c3cc4506cd95",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"</details>\n",
|
||||
"\n",
|
||||
"## LCEL\n",
|
||||
"</Column>\n",
|
||||
"\n",
|
||||
"<details open>"
|
||||
"<Column>\n",
|
||||
"\n",
|
||||
"#### LCEL\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -137,6 +143,10 @@
|
||||
"id": "3c0b0513-77b8-4371-a20e-3e487cec7e7f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"</Column>\n",
|
||||
"</ColumnContainer>\n",
|
||||
"\n",
|
||||
"Note that `LLMChain` by default returns a `dict` containing both the input and the output. If this behavior is desired, we can replicate it using another LCEL primitive, [`RunnablePassthrough`](https://api.python.langchain.com/en/latest/runnables/langchain_core.runnables.passthrough.RunnablePassthrough.html):"
|
||||
]
|
||||
},
|
||||
@@ -171,8 +181,6 @@
|
||||
"id": "b2717810",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"</details>\n",
|
||||
"\n",
|
||||
"## Next steps\n",
|
||||
"\n",
|
||||
"See [this tutorial](/docs/tutorials/llm_chain) for more detail on building with prompt templates, LLMs, and output parsers.\n",
|
||||
|
||||
@@ -1,283 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "575befea-4d98-4941-8e55-1581b169a674",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"---\n",
|
||||
"title: Migrating from LLMRouterChain\n",
|
||||
"---"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "14625d35-efca-41cf-b203-be9f4c375700",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The [`LLMRouterChain`](https://api.python.langchain.com/en/latest/chains/langchain.chains.router.llm_router.LLMRouterChain.html) routed an input query to one of multiple destinations-- that is, given an input query, it used a LLM to select from a list of destination chains, and passed its inputs to the selected chain.\n",
|
||||
"\n",
|
||||
"`LLMRouterChain` does not support common [chat model](/docs/concepts/#chat-models) features, such as message roles and [tool calling](/docs/concepts/#functiontool-calling). Under the hood, `LLMRouterChain` routes a query by instructing the LLM to generate JSON-formatted text, and parsing out the intended destination.\n",
|
||||
"\n",
|
||||
"Consider an example from a [MultiPromptChain](/docs/versions/migrating_chains/multi_prompt_chain), which uses `LLMRouterChain`. Below is an (example) default prompt:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "364814a5-d15c-41bb-bf3f-581df51a4721",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Given a raw text input to a language model select the model prompt best suited for the input. You will be given the names of the available prompts and a description of what the prompt is best suited for. You may also revise the original input if you think that revising it will ultimately lead to a better response from the language model.\n",
|
||||
"\n",
|
||||
"<< FORMATTING >>\n",
|
||||
"Return a markdown code snippet with a JSON object formatted to look like:\n",
|
||||
"'''json\n",
|
||||
"{{\n",
|
||||
" \"destination\": string \\ name of the prompt to use or \"DEFAULT\"\n",
|
||||
" \"next_inputs\": string \\ a potentially modified version of the original input\n",
|
||||
"}}\n",
|
||||
"'''\n",
|
||||
"\n",
|
||||
"REMEMBER: \"destination\" MUST be one of the candidate prompt names specified below OR it can be \"DEFAULT\" if the input is not well suited for any of the candidate prompts.\n",
|
||||
"REMEMBER: \"next_inputs\" can just be the original input if you don't think any modifications are needed.\n",
|
||||
"\n",
|
||||
"<< CANDIDATE PROMPTS >>\n",
|
||||
"\n",
|
||||
"animals: prompt for animal expert\n",
|
||||
"vegetables: prompt for a vegetable expert\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"<< INPUT >>\n",
|
||||
"{input}\n",
|
||||
"\n",
|
||||
"<< OUTPUT (must include '''json at the start of the response) >>\n",
|
||||
"<< OUTPUT (must end with ''') >>\n",
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.chains.router.multi_prompt import MULTI_PROMPT_ROUTER_TEMPLATE\n",
|
||||
"\n",
|
||||
"destinations = \"\"\"\n",
|
||||
"animals: prompt for animal expert\n",
|
||||
"vegetables: prompt for a vegetable expert\n",
|
||||
"\"\"\"\n",
|
||||
"\n",
|
||||
"router_template = MULTI_PROMPT_ROUTER_TEMPLATE.format(destinations=destinations)\n",
|
||||
"\n",
|
||||
"print(router_template.replace(\"`\", \"'\")) # for rendering purposes"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "934937d1-fc0a-4d3f-b297-29f96e6a8f5e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Most of the behavior is determined via a single natural language prompt. Chat models that support [tool calling](/docs/how_to/tool_calling/) features confer a number of advantages for this task:\n",
|
||||
"\n",
|
||||
"- Supports chat prompt templates, including messages with `system` and other roles;\n",
|
||||
"- Tool-calling models are fine-tuned to generate structured output;\n",
|
||||
"- Support for runnable methods like streaming and async operations.\n",
|
||||
"\n",
|
||||
"Now let's look at `LLMRouterChain` side-by-side with an LCEL implementation that uses tool-calling. Note that for this guide we will `langchain-openai >= 0.1.20`:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "ed12b22b-5452-4776-aee3-b67d9f965082",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%pip install -qU langchain-core langchain-openai"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "b0edbba1-a497-49ef-ade7-4fe7967360eb",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"from getpass import getpass\n",
|
||||
"\n",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = getpass()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "5d4dc41c-3fdc-4093-ba5e-31a9ebb54e13",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Legacy\n",
|
||||
"\n",
|
||||
"<details open>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "c58c9269-5a1d-4234-88b5-7168944618bf",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains.router.llm_router import LLMRouterChain, RouterOutputParser\n",
|
||||
"from langchain_core.prompts import PromptTemplate\n",
|
||||
"from langchain_openai import ChatOpenAI\n",
|
||||
"\n",
|
||||
"llm = ChatOpenAI(model=\"gpt-4o-mini\")\n",
|
||||
"\n",
|
||||
"router_prompt = PromptTemplate(\n",
|
||||
" # Note: here we use the prompt template from above. Generally this would need\n",
|
||||
" # to be customized.\n",
|
||||
" template=router_template,\n",
|
||||
" input_variables=[\"input\"],\n",
|
||||
" output_parser=RouterOutputParser(),\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"chain = LLMRouterChain.from_llm(llm, router_prompt)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "a22ebdca-5f53-459e-9cff-a97b2354ffe0",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"vegetables\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"result = chain.invoke({\"input\": \"What color are carrots?\"})\n",
|
||||
"\n",
|
||||
"print(result[\"destination\"])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "6fd48120-056f-4c58-a04f-da5198c23068",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"</details>\n",
|
||||
"\n",
|
||||
"## LCEL\n",
|
||||
"\n",
|
||||
"<details open>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "5bbebac2-df19-4f59-8a69-f61cd7286e59",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from operator import itemgetter\n",
|
||||
"from typing import Literal\n",
|
||||
"\n",
|
||||
"from langchain_core.prompts import ChatPromptTemplate\n",
|
||||
"from langchain_core.runnables import RunnablePassthrough\n",
|
||||
"from langchain_openai import ChatOpenAI\n",
|
||||
"from typing_extensions import TypedDict\n",
|
||||
"\n",
|
||||
"llm = ChatOpenAI(model=\"gpt-4o-mini\")\n",
|
||||
"\n",
|
||||
"route_system = \"Route the user's query to either the animal or vegetable expert.\"\n",
|
||||
"route_prompt = ChatPromptTemplate.from_messages(\n",
|
||||
" [\n",
|
||||
" (\"system\", route_system),\n",
|
||||
" (\"human\", \"{input}\"),\n",
|
||||
" ]\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# Define schema for output:\n",
|
||||
"class RouteQuery(TypedDict):\n",
|
||||
" \"\"\"Route query to destination expert.\"\"\"\n",
|
||||
"\n",
|
||||
" destination: Literal[\"animal\", \"vegetable\"]\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# Instead of writing formatting instructions into the prompt, we\n",
|
||||
"# leverage .with_structured_output to coerce the output into a simple\n",
|
||||
"# schema.\n",
|
||||
"chain = route_prompt | llm.with_structured_output(RouteQuery)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "88012e10-8def-44fa-833f-989935824182",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"vegetable\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"result = chain.invoke({\"input\": \"What color are carrots?\"})\n",
|
||||
"\n",
|
||||
"print(result[\"destination\"])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "baf7ba9e-65b4-48af-8a39-453c01a7b7cb",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"</details>\n",
|
||||
"\n",
|
||||
"## Next steps\n",
|
||||
"\n",
|
||||
"See [this tutorial](/docs/tutorials/llm_chain) for more detail on building with prompt templates, LLMs, and output parsers.\n",
|
||||
"\n",
|
||||
"Check out the [LCEL conceptual docs](/docs/concepts/#langchain-expression-language-lcel) for more background information."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "353e4bab-3b8a-4e89-89e2-200a8d8eb8dd",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.4"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
Some files were not shown because too many files have changed in this diff