{
"cells": [
{
"cell_type": "markdown",
"id": "bf37a837-7a6a-447b-8779-38f26c585887",
"metadata": {},
"source": [
"# Vector stores and retrievers\n",
"\n",
"This tutorial will familiarize you with LangChain's vector store and retriever abstractions. These abstractions are designed to support retrieval of data, from (vector) databases and other sources, for integration with LLM workflows. They are important for applications that fetch data to be reasoned over as part of model inference, as in the case of retrieval-augmented generation, or RAG (see our [RAG tutorial](/docs/tutorials/rag)).\n",
"\n",
"## Concepts\n",
"\n",
"This guide focuses on retrieval of text data. We will cover the following concepts:\n",
"\n",
"- Documents;\n",
"- Vector stores;\n",
"- Retrievers.\n",
"\n",
"## Setup\n",
"\n",
"### Jupyter Notebook\n",
"\n",
"This and other tutorials are perhaps most conveniently run in a Jupyter notebook. See [here](https://jupyter.org/install) for installation instructions.\n",
"\n",
"### Installation\n",
"\n",
"This tutorial requires the `langchain`, `langchain-chroma`, and `langchain-openai` packages:\n",
"\n",
"```{=mdx}\n",
"import Tabs from '@theme/Tabs';\n",
"import TabItem from '@theme/TabItem';\n",
"import CodeBlock from \"@theme/CodeBlock\";\n",
"\n",
"<Tabs>\n",
"  <TabItem value=\"pip\" label=\"Pip\" default>\n",
"    <CodeBlock language=\"bash\">pip install langchain langchain-chroma langchain-openai</CodeBlock>\n",
"  </TabItem>\n",
"  <TabItem value=\"conda\" label=\"Conda\">\n",
"    <CodeBlock language=\"bash\">conda install langchain langchain-chroma langchain-openai -c conda-forge</CodeBlock>\n",
"  </TabItem>\n",
"</Tabs>\n",
"\n",
"```\n",
"\n",
"For more details, see our [Installation guide](/docs/how_to/installation).\n",
"\n",
"### LangSmith\n",
"\n",
"Many of the applications you build with LangChain will contain multiple steps with multiple LLM calls.\n",
"As these applications become more complex, it becomes crucial to be able to inspect what exactly is going on inside your chain or agent.\n",
"The best way to do this is with [LangSmith](https://smith.langchain.com).\n",
"\n",
"After you sign up at the link above, make sure to set your environment variables to start logging traces:\n",
"\n",
"```shell\n",
"export LANGCHAIN_TRACING_V2=\"true\"\n",
"export LANGCHAIN_API_KEY=\"...\"\n",
"```\n",
"\n",
"Or, if in a notebook, you can set them with:\n",
"\n",
"```python\n",
"import getpass\n",
"import os\n",
"\n",
"os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n",
"os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass()\n",
"```\n",
"\n",
"\n",
"## Documents\n",
"\n",
"LangChain implements a [Document](https://api.python.langchain.com/en/latest/documents/langchain_core.documents.base.Document.html) abstraction, which is intended to represent a unit of text and associated metadata. It has two attributes:\n",
"\n",
"- `page_content`: a string representing the content;\n",
"- `metadata`: a dict containing arbitrary metadata.\n",
"\n",
"The `metadata` attribute can capture information about the source of the document, its relationship to other documents, and other information. Note that an individual `Document` object often represents a chunk of a larger document.\n",
"\n",
"Let's generate some sample documents:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "9f3dc151-7b2f-4d94-9558-7a84f7eab100",
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.documents import Document\n",
"\n",
"documents = [\n",
"    Document(\n",
"        page_content=\"Dogs are great companions, known for their loyalty and friendliness.\",\n",
"        metadata={\"source\": \"mammal-pets-doc\"},\n",
"    ),\n",
"    Document(\n",
"        page_content=\"Cats are independent pets that often enjoy their own space.\",\n",
"        metadata={\"source\": \"mammal-pets-doc\"},\n",
"    ),\n",
"    Document(\n",
"        page_content=\"Goldfish are popular pets for beginners, requiring relatively simple care.\",\n",
"        metadata={\"source\": \"fish-pets-doc\"},\n",
"    ),\n",
"    Document(\n",
"        page_content=\"Parrots are intelligent birds capable of mimicking human speech.\",\n",
"        metadata={\"source\": \"bird-pets-doc\"},\n",
"    ),\n",
"    Document(\n",
"        page_content=\"Rabbits are social animals that need plenty of space to hop around.\",\n",
"        metadata={\"source\": \"mammal-pets-doc\"},\n",
"    ),\n",
"]"
]
},
{
"cell_type": "markdown",
"id": "1cac19bd-27d1-40f1-9c27-7a586b685b4e",
"metadata": {},
"source": [
"Here we've generated five documents, containing metadata indicating three distinct \"sources\".\n",
"\n",
"## Vector stores\n",
"\n",
"Vector search is a common way to store and search over unstructured data (such as unstructured text). The idea is to store numeric vectors that are associated with the text. Given a query, we can [embed](/docs/concepts#embedding-models) it as a vector of the same dimension and use vector similarity metrics to identify related data in the store.\n",
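"\n",
"To make the idea concrete, here is a minimal sketch of similarity search over hand-written toy vectors. The three-dimensional vectors and the `cosine_similarity` helper are assumptions made for illustration only; real embedding models produce vectors with hundreds or thousands of dimensions, and vector stores typically use optimized indexes rather than a Python loop.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"\n",
"def cosine_similarity(a, b):\n",
"    \"\"\"Cosine of the angle between two equal-length vectors.\"\"\"\n",
"    dot = sum(x * y for x, y in zip(a, b))\n",
"    norm_a = math.sqrt(sum(x * x for x in a))\n",
"    norm_b = math.sqrt(sum(y * y for y in b))\n",
"    return dot / (norm_a * norm_b)\n",
"\n",
"\n",
"# Hypothetical embeddings; the values are invented for this sketch.\n",
"store = {\n",
"    \"cats\": [0.9, 0.1, 0.0],\n",
"    \"dogs\": [0.7, 0.3, 0.1],\n",
"    \"goldfish\": [0.1, 0.2, 0.9],\n",
"}\n",
"query = [1.0, 0.0, 0.0]  # stands in for an embedded query\n",
"\n",
"# Rank stored texts from most to least similar to the query.\n",
"ranked = sorted(store, key=lambda name: cosine_similarity(query, store[name]), reverse=True)\n",
"print(ranked)  # ['cats', 'dogs', 'goldfish']\n",
"```\n",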
"\n",
"LangChain [VectorStore](https://api.python.langchain.com/en/latest/vectorstores/langchain_core.vectorstores.VectorStore.html) objects contain methods for adding text and `Document` objects to the store, and querying them using various similarity metrics. They are often initialized with [embedding](/docs/how_to/embed_text) models, which determine how text data is translated to numeric vectors.\n",
"\n",
"LangChain includes a suite of [integrations](/docs/integrations/vectorstores) with different vector store technologies. Some vector stores are hosted by a provider (e.g., various cloud providers) and require specific credentials to use; some (such as [Postgres](/docs/integrations/vectorstores/pgvector)) run in separate infrastructure that can be run locally or via a third party; others can run in-memory for lightweight workloads. Here we will demonstrate usage of LangChain VectorStores using [Chroma](/docs/integrations/vectorstores/chroma), which includes an in-memory implementation.\n",
"\n",
"To instantiate a vector store, we often need to provide an [embedding](/docs/how_to/embed_text) model to specify how text should be converted into a numeric vector. Here we will use [OpenAI embeddings](/docs/integrations/text_embedding/openai/)."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "d48acc28-1a34-414b-8e08-fbdef3a2a60b",
"metadata": {},
"outputs": [],
"source": [
"from langchain_chroma import Chroma\n",
"from langchain_openai import OpenAIEmbeddings\n",
"\n",
"vectorstore = Chroma.from_documents(\n",
"    documents,\n",
"    embedding=OpenAIEmbeddings(),\n",
")"
]
},
{
"cell_type": "markdown",
"id": "ff0f0b43-e5b8-4c79-b782-a02f17345487",
"metadata": {},
"source": [
"Calling `.from_documents` here will add the documents to the vector store. [VectorStore](https://api.python.langchain.com/en/latest/vectorstores/langchain_core.vectorstores.VectorStore.html) implements methods for adding documents that can also be called after the object is instantiated. Most implementations will allow you to connect to an existing vector store, e.g., by providing a client, index name, or other information. See the documentation for a specific [integration](/docs/integrations/vectorstores) for more detail.\n",
"\n",
"Once we've instantiated a `VectorStore` that contains documents, we can query it. [VectorStore](https://api.python.langchain.com/en/latest/vectorstores/langchain_core.vectorstores.VectorStore.html) includes methods for querying:\n",
"- Synchronously and asynchronously;\n",
"- By string query and by vector;\n",
"- With and without returning similarity scores;\n",
"- By similarity and [maximum marginal relevance](https://api.python.langchain.com/en/latest/vectorstores/langchain_core.vectorstores.VectorStore.html#langchain_core.vectorstores.VectorStore.max_marginal_relevance_search) (to balance similarity to the query against diversity in the retrieved results).\n",
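"\n",
"As a sketch of the last option, assuming the `vectorstore` built above: `max_marginal_relevance_search` first fetches `fetch_k` candidates by similarity and then selects `k` of them, trading relevance against diversity. The parameter values below are illustrative, not recommendations.\n",
"\n",
"```python\n",
"# Fetch the 4 nearest candidates, then keep the 2 that best balance\n",
"# relevance to the query against diversity among the results.\n",
"vectorstore.max_marginal_relevance_search(\"cat\", k=2, fetch_k=4)\n",
"```\n",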
"\n",
"The methods will generally include a list of [Document](https://api.python.langchain.com/en/latest/documents/langchain_core.documents.base.Document.html#langchain_core.documents.base.Document) objects in their outputs.\n",
"\n",
"### Examples\n",
"\n",
"Return documents based on similarity to a string query:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "7e01ed91-1a98-4221-960a-bd7a2541a548",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='Cats are independent pets that often enjoy their own space.', metadata={'source': 'mammal-pets-doc'}),\n",
" Document(page_content='Dogs are great companions, known for their loyalty and friendliness.', metadata={'source': 'mammal-pets-doc'}),\n",
" Document(page_content='Rabbits are social animals that need plenty of space to hop around.', metadata={'source': 'mammal-pets-doc'}),\n",
" Document(page_content='Parrots are intelligent birds capable of mimicking human speech.', metadata={'source': 'bird-pets-doc'})]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"vectorstore.similarity_search(\"cat\")"
]
},
{
"cell_type": "markdown",
"id": "4d4f9857-5a7d-4b5f-82b8-ff76539143c2",
"metadata": {},
"source": [
"Async query:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "618af196-6182-4a7d-8b09-07493fcdc868",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='Cats are independent pets that often enjoy their own space.', metadata={'source': 'mammal-pets-doc'}),\n",
" Document(page_content='Dogs are great companions, known for their loyalty and friendliness.', metadata={'source': 'mammal-pets-doc'}),\n",
" Document(page_content='Rabbits are social animals that need plenty of space to hop around.', metadata={'source': 'mammal-pets-doc'}),\n",
" Document(page_content='Parrots are intelligent birds capable of mimicking human speech.', metadata={'source': 'bird-pets-doc'})]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"await vectorstore.asimilarity_search(\"cat\")"
]
},
{
"cell_type": "markdown",
"id": "d4172698-9ad7-4422-99b2-bdc268e99c75",
"metadata": {},
"source": [
"Return scores:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "4ed24af2-0d82-478c-949b-b389348d4e9f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[(Document(page_content='Cats are independent pets that often enjoy their own space.', metadata={'source': 'mammal-pets-doc'}),\n",
"  0.3751849830150604),\n",
" (Document(page_content='Dogs are great companions, known for their loyalty and friendliness.', metadata={'source': 'mammal-pets-doc'}),\n",
"  0.48316916823387146),\n",
" (Document(page_content='Rabbits are social animals that need plenty of space to hop around.', metadata={'source': 'mammal-pets-doc'}),\n",
"  0.49601367115974426),\n",
" (Document(page_content='Parrots are intelligent birds capable of mimicking human speech.', metadata={'source': 'bird-pets-doc'}),\n",
"  0.4972994923591614)]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Note that providers implement different scores; Chroma here\n",
"# returns a distance metric that should vary inversely with\n",
"# similarity.\n",
"\n",
"vectorstore.similarity_search_with_score(\"cat\")"
]
},
{
"cell_type": "markdown",
"id": "b4991642-7275-40a9-b11a-e3beccbf2614",
"metadata": {},
"source": [
"Return documents based on similarity to an embedded query:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "b1a5eabb-a821-48cc-917e-cc27f03e4bcc",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='Cats are independent pets that often enjoy their own space.', metadata={'source': 'mammal-pets-doc'}),\n",
" Document(page_content='Dogs are great companions, known for their loyalty and friendliness.', metadata={'source': 'mammal-pets-doc'}),\n",
" Document(page_content='Rabbits are social animals that need plenty of space to hop around.', metadata={'source': 'mammal-pets-doc'}),\n",
" Document(page_content='Parrots are intelligent birds capable of mimicking human speech.', metadata={'source': 'bird-pets-doc'})]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"embedding = OpenAIEmbeddings().embed_query(\"cat\")\n",
"\n",
"vectorstore.similarity_search_by_vector(embedding)"
]
},
{
"cell_type": "markdown",
"id": "168dbbec-ea97-4cc9-bb1a-75519c2d08af",
"metadata": {},
"source": [
"Learn more:\n",
"\n",
"- [API reference](https://api.python.langchain.com/en/latest/vectorstores/langchain_core.vectorstores.VectorStore.html)\n",
"- [How-to guide](/docs/how_to/vectorstores)\n",
"- [Integration-specific docs](/docs/integrations/vectorstores)\n",
"\n",
"## Retrievers\n",
"\n",
"LangChain `VectorStore` objects do not subclass [Runnable](https://api.python.langchain.com/en/latest/core_api_reference.html#module-langchain_core.runnables), and so cannot immediately be integrated into LangChain Expression Language [chains](/docs/concepts/#langchain-expression-language-lcel).\n",
"\n",
"LangChain [Retrievers](https://api.python.langchain.com/en/latest/core_api_reference.html#module-langchain_core.retrievers) are Runnables, so they implement a standard set of methods (e.g., synchronous and asynchronous `invoke` and `batch` operations) and are designed to be incorporated in LCEL chains.\n",
"\n",
"We can create a simple version of this ourselves, without subclassing `BaseRetriever`. Once we choose which method to use to retrieve documents, we can wrap it in a runnable. Below we will build one around the `similarity_search` method:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "f1461582-e569-4326-bd95-510f72edf019",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[[Document(page_content='Cats are independent pets that often enjoy their own space.', metadata={'source': 'mammal-pets-doc'})],\n",
" [Document(page_content='Goldfish are popular pets for beginners, requiring relatively simple care.', metadata={'source': 'fish-pets-doc'})]]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_core.runnables import RunnableLambda\n",
"\n",
"retriever = RunnableLambda(vectorstore.similarity_search).bind(k=1)  # select top result\n",
"\n",
"retriever.batch([\"cat\", \"shark\"])"
]
},
{
"cell_type": "markdown",
"id": "a36d3f64-a8bc-4baa-b2ea-07e324a0143e",
"metadata": {},
"source": [
"Vector stores implement an `as_retriever` method that will generate a Retriever, specifically a [VectorStoreRetriever](https://api.python.langchain.com/en/latest/vectorstores/langchain_core.vectorstores.VectorStoreRetriever.html). These retrievers include specific `search_type` and `search_kwargs` attributes that identify what methods of the underlying vector store to call, and how to parameterize them. For instance, we can replicate the above with the following:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "4989fe5e-ac58-4751-bc35-f53ff885860c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[[Document(page_content='Cats are independent pets that often enjoy their own space.', metadata={'source': 'mammal-pets-doc'})],\n",
" [Document(page_content='Goldfish are popular pets for beginners, requiring relatively simple care.', metadata={'source': 'fish-pets-doc'})]]"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"retriever = vectorstore.as_retriever(\n",
"    search_type=\"similarity\",\n",
"    search_kwargs={\"k\": 1},\n",
")\n",
"\n",
"retriever.batch([\"cat\", \"shark\"])"
]
},
{
"cell_type": "markdown",
"id": "6b79ded3-39ed-4aeb-8b70-cd36795ae239",
"metadata": {},
"source": [
"`VectorStoreRetriever` supports search types of `\"similarity\"` (default), `\"mmr\"` (maximum marginal relevance, described above), and `\"similarity_score_threshold\"`. We can use the latter to threshold documents output by the retriever by similarity score.\n",
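"\n",
"For example, assuming the `vectorstore` built above, a thresholded retriever could be configured as sketched below. The `score_threshold` of 0.5 is an arbitrary illustrative value; relevance scores are normalized differently across vector stores and embedding models, so a suitable threshold has to be found empirically.\n",
"\n",
"```python\n",
"threshold_retriever = vectorstore.as_retriever(\n",
"    search_type=\"similarity_score_threshold\",\n",
"    search_kwargs={\"score_threshold\": 0.5},\n",
")\n",
"\n",
"# Only documents whose relevance score meets the threshold are returned,\n",
"# so the result may be an empty list.\n",
"threshold_retriever.invoke(\"cat\")\n",
"```\n",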
"\n",
"Retrievers can easily be incorporated into more complex applications, such as retrieval-augmented generation (RAG) applications that combine a given question with retrieved context into a prompt for an LLM. Below we show a minimal example.\n",
"\n",
"```{=mdx}\n",
"import ChatModelTabs from \"@theme/ChatModelTabs\";\n",
"\n",
"<ChatModelTabs customVarName=\"llm\" />\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "c77b68bf-59f3-4416-9877-960f934c374d",
"metadata": {},
"outputs": [],
"source": [
"# | output: false\n",
"# | echo: false\n",
"\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"llm = ChatOpenAI(model=\"gpt-3.5-turbo\", temperature=0)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "6f1ae0d0-0b4b-4da0-80ce-f82913052a83",
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_core.runnables import RunnablePassthrough\n",
"\n",
"message = \"\"\"\n",
"Answer this question using the provided context only.\n",
"\n",
"{question}\n",
"\n",
"Context:\n",
"{context}\n",
"\"\"\"\n",
"\n",
"prompt = ChatPromptTemplate.from_messages([(\"human\", message)])\n",
"\n",
"rag_chain = {\"context\": retriever, \"question\": RunnablePassthrough()} | prompt | llm"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "b3c0d625-61e0-492e-b3a6-c40d383fca03",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Cats are independent pets that often enjoy their own space.\n"
]
}
],
"source": [
"response = rag_chain.invoke(\"tell me about cats\")\n",
"\n",
"print(response.content)"
]
},
{
"cell_type": "markdown",
"id": "3d9be7cb-2081-48a4-b6e4-d5e2d562ffd4",
"metadata": {},
"source": [
"## Learn more\n",
"\n",
"Retrieval strategies can be rich and complex. For example:\n",
"\n",
"- We can [infer hard rules and filters](/docs/how_to/self_query/) from a query (e.g., \"using documents published after 2020\");\n",
"- We can [return documents that are linked](/docs/how_to/parent_document_retriever/) to the retrieved context in some way (e.g., via some document taxonomy);\n",
"- We can generate [multiple embeddings](/docs/how_to/multi_vector) for each unit of context;\n",
"- We can [ensemble results](/docs/how_to/ensemble_retriever) from multiple retrievers;\n",
"- We can assign weights to documents, e.g., to weight [recent documents](/docs/how_to/time_weighted_vectorstore/) more heavily.\n",
"\n",
"The [retrievers](/docs/how_to#retrievers) section of the how-to guides covers these and other built-in retrieval strategies.\n",
"\n",
"It is also straightforward to extend the [BaseRetriever](https://api.python.langchain.com/en/latest/retrievers/langchain_core.retrievers.BaseRetriever.html) class in order to implement custom retrievers. See our how-to guide [here](/docs/how_to/custom_retriever)."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.4"
}
},
"nbformat": 4,
"nbformat_minor": 5
}