langchain/docs/docs/how_to/qa_per_user.ipynb
Erick Friis 21d14549a9 docs: v0.2 docs in master (#21438)
2024-05-08 12:29:59 -07:00


{
"cells": [
{
"cell_type": "markdown",
"id": "14d3fd06",
"metadata": {},
"source": [
"# How to do per-user retrieval\n",
"\n",
"This guide demonstrates how to configure runtime properties of a retrieval chain. An example application is to limit the documents available to a retriever based on the user.\n",
"\n",
"When building a retrieval app, you often have to build it with multiple users in mind. This means that you may be storing data not just for one user, but for many different users, and they should not be able to see each other's data. You therefore need to be able to configure your retrieval chain to retrieve only certain information. This generally involves two steps.\n",
"\n",
"**Step 1: Make sure the retriever you are using supports multiple users**\n",
"\n",
"At the moment, there is no unified flag or filter for this in LangChain. Rather, each vectorstore and retriever may have its own, and it may go by different names (namespaces, multi-tenancy, etc.). For vectorstores, this is generally exposed as a keyword argument that is passed in during `similarity_search`. By reading the documentation or source code, figure out whether the retriever you are using supports multiple users and, if so, how to use it.\n",
"\n",
"Note: adding documentation and/or support for multiple users for retrievers that do not support it (or document it) is a GREAT way to contribute to LangChain.\n",
"\n",
"**Step 2: Add that parameter as a configurable field for the chain**\n",
"\n",
"This will let you easily call the chain and configure any relevant flags at runtime. See [this documentation](/docs/how_to/configure) for more information on configuration.\n",
"\n",
"Now, at runtime, you can call this chain with the configurable field set.\n",
"\n",
"## Code Example\n",
"\n",
"Let's see a concrete example of what this looks like in code. We will use Pinecone for this example.\n",
"\n",
"To configure Pinecone, set the following environment variable:\n",
"\n",
"- `PINECONE_API_KEY`: Your Pinecone API key"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "7345de3c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['ce15571e-4e2f-44c9-98df-7e83f6f63095']"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_openai import OpenAIEmbeddings\n",
"from langchain_pinecone import PineconeVectorStore\n",
"\n",
"embeddings = OpenAIEmbeddings()\n",
"vectorstore = PineconeVectorStore(index_name=\"test-example\", embedding=embeddings)\n",
"\n",
"vectorstore.add_texts([\"i worked at kensho\"], namespace=\"harrison\")\n",
"vectorstore.add_texts([\"i worked at facebook\"], namespace=\"ankush\")"
]
},
{
"cell_type": "markdown",
"id": "39c11920",
"metadata": {},
"source": [
"The Pinecone `namespace` kwarg can be used to separate documents."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "3c2a39fa",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='i worked at facebook')]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# This will only get documents for Ankush\n",
"vectorstore.as_retriever(search_kwargs={\"namespace\": \"ankush\"}).invoke(\n",
"    \"where did i work?\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "56393baa",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='i worked at kensho')]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# This will only get documents for Harrison\n",
"vectorstore.as_retriever(\n",
"    search_kwargs={\"namespace\": \"harrison\"}\n",
").invoke(\"where did i work?\")"
]
},
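{
"cell_type": "markdown",
"id": "a3f1c9d2",
"metadata": {},
"source": [
"To make the isolation shown above concrete without a Pinecone index, here is a minimal pure-Python sketch. `ToyNamespacedStore` and its word-overlap `search` are hypothetical stand-ins invented for illustration; they are not LangChain or Pinecone APIs, and a real vectorstore ranks by embedding similarity rather than shared words.\n",
"\n",
"```python\n",
"from collections import defaultdict\n",
"\n",
"\n",
"class ToyNamespacedStore:\n",
"    # Hypothetical in-memory store: one bucket of texts per namespace.\n",
"\n",
"    def __init__(self):\n",
"        self._buckets = defaultdict(list)\n",
"\n",
"    def add_texts(self, texts, namespace):\n",
"        # Texts are filed under the caller-supplied namespace.\n",
"        self._buckets[namespace].extend(texts)\n",
"\n",
"    def search(self, query, namespace):\n",
"        # Only the requested namespace is scanned; other buckets are never read.\n",
"        query_words = set(query.lower().split())\n",
"        candidates = self._buckets.get(namespace, [])\n",
"        return [t for t in candidates if query_words & set(t.lower().split())]\n",
"\n",
"\n",
"store = ToyNamespacedStore()\n",
"store.add_texts(['i worked at kensho'], namespace='harrison')\n",
"store.add_texts(['i worked at facebook'], namespace='ankush')\n",
"print(store.search('where did i work?', namespace='ankush'))\n",
"```\n",
"\n",
"Because `search` only ever reads the bucket for the requested namespace, a query made on behalf of one user cannot return another user's texts."
]
},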
{
"cell_type": "markdown",
"id": "88ae97ed",
"metadata": {},
"source": [
"We can now create the chain that we will use for question-answering.\n",
"\n",
"Let's first select an LLM.\n",
"```{=mdx}\n",
"import ChatModelTabs from \"@theme/ChatModelTabs\";\n",
"\n",
"<ChatModelTabs customVarName=\"llm\" />\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "68162d05",
"metadata": {},
"outputs": [],
"source": [
"# | output: false\n",
"# | echo: false\n",
"\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"llm = ChatOpenAI()"
]
},
{
"cell_type": "markdown",
"id": "b6778ffa",
"metadata": {},
"source": [
"This is a basic question-answering chain setup."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "44a865f6",
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_core.runnables import (\n",
" ConfigurableField,\n",
" RunnablePassthrough,\n",
")\n",
"\n",
"template = \"\"\"Answer the question based only on the following context:\n",
"{context}\n",
"Question: {question}\n",
"\"\"\"\n",
"prompt = ChatPromptTemplate.from_template(template)\n",
"\n",
"retriever = vectorstore.as_retriever()"
]
},
{
"cell_type": "markdown",
"id": "72125166",
"metadata": {},
"source": [
"Here we mark the retriever as having a configurable field. All vectorstore retrievers have `search_kwargs` as a field. This is just a dictionary with vectorstore-specific fields.\n",
"\n",
"This will let us pass in a value for `search_kwargs` when invoking the chain."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "babbadff",
"metadata": {},
"outputs": [],
"source": [
"configurable_retriever = retriever.configurable_fields(\n",
" search_kwargs=ConfigurableField(\n",
" id=\"search_kwargs\",\n",
" name=\"Search Kwargs\",\n",
" description=\"The search kwargs to use\",\n",
" )\n",
")"
]
},
{
"cell_type": "markdown",
"id": "2d481b70",
"metadata": {},
"source": [
"We can now create the chain using our configurable retriever."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "210b0446",
"metadata": {},
"outputs": [],
"source": [
"chain = (\n",
" {\"context\": configurable_retriever, \"question\": RunnablePassthrough()}\n",
" | prompt\n",
" | llm\n",
" | StrOutputParser()\n",
")"
]
},
{
"cell_type": "markdown",
"id": "7f6458c3",
"metadata": {},
"source": [
"We can now invoke the chain with configurable options. `search_kwargs` is the id of the configurable field, and its value is the dictionary of search kwargs to pass to Pinecone."
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "a38037b2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'The user worked at Kensho.'"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.invoke(\n",
" \"where did the user work?\",\n",
" config={\"configurable\": {\"search_kwargs\": {\"namespace\": \"harrison\"}}},\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "0ff4f5f2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'The user worked at Facebook.'"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.invoke(\n",
" \"where did the user work?\",\n",
" config={\"configurable\": {\"search_kwargs\": {\"namespace\": \"ankush\"}}},\n",
")"
]
},
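{
"cell_type": "markdown",
"id": "b7e2d4f8",
"metadata": {},
"source": [
"In an application, the namespace would usually come from the authenticated user rather than from a hard-coded string. The sketch below shows one way to build the `config` dictionary used in the calls above. The helper name `config_for_user` and its validation rule are assumptions made for illustration; they are not part of LangChain.\n",
"\n",
"```python\n",
"def config_for_user(user_id):\n",
"    # Hypothetical helper: build the config that scopes retrieval to one user.\n",
"    # Rejecting empty ids is an assumed safeguard so that a missing user never\n",
"    # silently falls back to an unscoped search.\n",
"    if not isinstance(user_id, str) or not user_id:\n",
"        raise ValueError('user_id must be a non-empty string')\n",
"    # 'search_kwargs' matches the ConfigurableField id declared earlier.\n",
"    return {'configurable': {'search_kwargs': {'namespace': user_id}}}\n",
"\n",
"\n",
"print(config_for_user('harrison'))\n",
"```\n",
"\n",
"The result can be passed straight to the chain, for example `chain.invoke('where did the user work?', config=config_for_user('harrison'))`."
]
},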
{
"cell_type": "markdown",
"id": "7fb27b941602401d91542211134fc71a",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"For multi-user support in other vectorstore implementations, please refer to their specific pages, such as [Milvus](/docs/integrations/vectorstores/milvus)."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.4"
}
},
"nbformat": 4,
"nbformat_minor": 5
}