{
 "cells": [
  {
   "cell_type": "raw",
   "id": "ee14951b",
   "metadata": {},
   "source": [
    "---\n",
    "sidebar_position: 0\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "105cddce",
   "metadata": {},
   "source": [
    "# How to use a vectorstore as a retriever\n",
    "\n",
    "A vector store retriever is a retriever that uses a vector store to retrieve documents. It is a lightweight wrapper around the vector store class that makes it conform to the retriever interface.\n",
    "It uses the search methods implemented by the vector store, such as similarity search and MMR, to query the texts in the vector store.\n",
    "\n",
    "In this guide we will cover:\n",
    "\n",
    "1. How to instantiate a retriever from a vectorstore;\n",
    "2. How to specify the search type for the retriever;\n",
    "3. How to specify additional search parameters, such as threshold scores and top-k.\n",
    "\n",
    "## Creating a retriever from a vectorstore\n",
    "\n",
    "You can build a retriever from a vectorstore using its [.as_retriever](https://python.langchain.com/v0.2/api_reference/core/vectorstores/langchain_core.vectorstores.VectorStore.html#langchain_core.vectorstores.VectorStore.as_retriever) method. Let's walk through an example.\n",
    "\n",
    "First we instantiate a vectorstore. We will use an in-memory [FAISS](https://python.langchain.com/v0.2/api_reference/community/vectorstores/langchain_community.vectorstores.faiss.FAISS.html) vectorstore:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "174e3c69",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_community.document_loaders import TextLoader\n",
    "from langchain_community.vectorstores import FAISS\n",
    "from langchain_openai import OpenAIEmbeddings\n",
    "from langchain_text_splitters import CharacterTextSplitter\n",
    "\n",
    "# Load the raw text file as a list of Document objects\n",
    "loader = TextLoader(\"state_of_the_union.txt\")\n",
    "documents = loader.load()\n",
    "\n",
    "# Split the documents into chunks of roughly 1000 characters, with no overlap\n",
    "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
    "texts = text_splitter.split_documents(documents)\n",
    "\n",
    "# Embed each chunk and index the embeddings in an in-memory FAISS store\n",
    "embeddings = OpenAIEmbeddings()\n",
    "vectorstore = FAISS.from_documents(texts, embeddings)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6f6e65a1-5eb4-4165-b06b-9bb40624a8d8",
   "metadata": {},
   "source": [
    "We can then instantiate a retriever:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "52df5f55",
   "metadata": {},
   "outputs": [],
   "source": [
    "retriever = vectorstore.as_retriever()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "08f8b820-5912-49c1-9d76-40be0571dffb",
   "metadata": {},
   "source": [
    "This creates a retriever (specifically a [VectorStoreRetriever](https://python.langchain.com/v0.2/api_reference/core/vectorstores/langchain_core.vectorstores.VectorStoreRetriever.html)), which we can use in the usual way:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "32334fda",
   "metadata": {},
   "outputs": [],
   "source": [
    "docs = retriever.invoke(\"what did the president say about ketanji brown jackson?\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fd7b19f0",
   "metadata": {},
   "source": [
    "## Maximum marginal relevance retrieval\n",
    "\n",
    "By default, the vector store retriever uses similarity search. If the underlying vector store supports maximum marginal relevance search, you can specify that as the search type.\n",
    "\n",
    "The search type effectively determines which method on the underlying vectorstore is called (e.g., `similarity_search`, `max_marginal_relevance_search`, etc.)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "b286ac04",
   "metadata": {},
   "outputs": [],
   "source": [
    "retriever = vectorstore.as_retriever(search_type=\"mmr\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "07f937f7",
   "metadata": {},
   "outputs": [],
   "source": [
    "docs = retriever.invoke(\"what did the president say about ketanji brown jackson?\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6ce77789",
   "metadata": {},
   "source": [
    "## Passing search parameters\n",
    "\n",
    "We can pass parameters to the underlying vectorstore's search methods using `search_kwargs`.\n",
    "\n",
    "### Similarity score threshold retrieval\n",
    "\n",
    "For example, we can set a similarity score threshold and only return documents with a score above that threshold."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "dbb38a03",
   "metadata": {},
   "outputs": [],
   "source": [
    "retriever = vectorstore.as_retriever(\n",
    "    search_type=\"similarity_score_threshold\", search_kwargs={\"score_threshold\": 0.5}\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "56f6c9ae",
   "metadata": {},
   "outputs": [],
   "source": [
    "docs = retriever.invoke(\"what did the president say about ketanji brown jackson?\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "329f5b26",
   "metadata": {},
   "source": [
    "### Specifying top k\n",
    "\n",
    "We can also limit the number of documents `k` returned by the retriever."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "d712c91d",
   "metadata": {},
   "outputs": [],
   "source": [
    "retriever = vectorstore.as_retriever(search_kwargs={\"k\": 1})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "a79b573b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "docs = retriever.invoke(\"what did the president say about ketanji brown jackson?\")\n",
    "len(docs)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}