docs: update retriever integration pages (#24931)

This commit is contained in:
ccurme 2024-08-01 14:37:07 -04:00 committed by GitHub
parent ea505985c4
commit 41ed23a050
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
9 changed files with 626 additions and 208 deletions

View File

@ -2,15 +2,39 @@
"cells": [
{
"cell_type": "markdown",
"id": "1edb9e6b",
"id": "f9a62e19-b00b-4f6c-a700-1e500e4c290a",
"metadata": {},
"source": [
"# Azure AI Search\n",
"---\n",
"sidebar_label: Azure AI Search\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "76f74245-7220-4446-ae8d-4e5a9e998f1f",
"metadata": {},
"source": [
"# AzureAISearchRetriever\n",
"\n",
"## Overview\n",
"[Azure AI Search](https://learn.microsoft.com/azure/search/search-what-is-azure-search) (formerly known as `Azure Cognitive Search`) is a Microsoft cloud search service that gives developers infrastructure, APIs, and tools for information retrieval of vector, keyword, and hybrid queries at scale.\n",
"\n",
"`AzureAISearchRetriever` is an integration module that returns documents from an unstructured query. It is built on the `BaseRetriever` class and targets the 2023-11-01 stable REST API version of Azure AI Search, which means it supports vector indexing and queries.\n",
"\n",
"This guide will help you get started with the Azure AI Search [retriever](/docs/concepts/#retrievers). For detailed documentation of all `AzureAISearchRetriever` features and configurations head to the [API reference](https://api.python.langchain.com/en/latest/retrievers/langchain_community.retrievers.azure_ai_search.AzureAISearchRetriever.html).\n",
"\n",
"`AzureAISearchRetriever` replaces `AzureCognitiveSearchRetriever`, which will soon be deprecated. We recommend switching to the newer version that's based on the most recent stable version of the search APIs.\n",
"\n",
"### Integration details\n",
"\n",
"| Retriever | Bring your own docs | Self-host | Cloud offering | Package |\n",
"| :--- | :--- | :---: | :---: | :---: |\n",
"[AzureAISearchRetriever](https://api.python.langchain.com/en/latest/retrievers/langchain_community.retrievers.azure_ai_search.AzureAISearchRetriever.html) | ✅ | ❌ | ✅ | langchain_community.retrievers |\n",
"\n",
"\n",
"## Setup\n",
"\n",
"To use this module, you need:\n",
"\n",
"+ An Azure AI Search service. You can [create one](https://learn.microsoft.com/azure/search/search-create-service-portal) for free if you sign up for the Azure trial. A free service has lower quotas, but it's sufficient for running the code in this notebook.\n",
@ -19,7 +43,40 @@
"\n",
"+ An API key. API keys are generated when you create the search service. If you're just querying an index, you can use the query API key, otherwise use an admin API key. See [Find your API keys](https://learn.microsoft.com/azure/search/search-security-api-keys?tabs=rest-use%2Cportal-find%2Cportal-query#find-existing-keys) for details.\n",
"\n",
"`AzureAISearchRetriever` replaces `AzureCognitiveSearchRetriever`, which will soon be deprecated. We recommend switching to the newer version that's based on the most recent stable version of the search APIs."
"We can then set the search service name, index name, and API key as environment variables (alternatively, you can pass them as arguments to `AzureAISearchRetriever`). The search index provides the searchable content."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6a56e83b-8563-4479-ab61-090fc79f5b00",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\"AZURE_AI_SEARCH_SERVICE_NAME\"] = \"<YOUR_SEARCH_SERVICE_NAME>\"\n",
"os.environ[\"AZURE_AI_SEARCH_INDEX_NAME\"] = \"<YOUR_SEARCH_INDEX_NAME>\"\n",
"os.environ[\"AZURE_AI_SEARCH_API_KEY\"] = \"<YOUR_API_KEY>\""
]
},
{
"cell_type": "markdown",
"id": "3e635218-8634-4f39-abc5-39e319eeb136",
"metadata": {},
"source": [
"If you want to get automated tracing from individual queries, you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "88751b84-7cb7-4dd2-af35-c1e9b369d012",
"metadata": {},
"outputs": [],
"source": [
"# import getpass\n",
"\n",
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
]
},
{
@ -27,9 +84,9 @@
"id": "f99d4456",
"metadata": {},
"source": [
"## Install packages\n",
"### Installation\n",
"\n",
"Use azure-documents-search package 11.4 or later."
"This retriever lives in the `langchain-community` package. We will need some additional dependencies as well:"
]
},
{
@ -39,9 +96,9 @@
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet langchain\n",
"%pip install --upgrade --quiet langchain-community\n",
"%pip install --upgrade --quiet langchain-openai\n",
"%pip install --upgrade --quiet azure-search-documents\n",
"%pip install --upgrade --quiet \"azure-search-documents>=11.4\"\n",
"%pip install --upgrade --quiet azure-identity"
]
},
@ -50,7 +107,9 @@
"id": "0474661d",
"metadata": {},
"source": [
"## Import required libraries"
"## Instantiation\n",
"\n",
"For `AzureAISearchRetriever`, provide an `index_name`, `content_key`, and `top_k` set to the number of results you'd like to retrieve. Setting `top_k` to zero (the default) returns all results."
]
},
{
@ -60,52 +119,8 @@
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from langchain_community.retrievers import AzureAISearchRetriever\n",
"\n",
"from langchain_community.retrievers import (\n",
" AzureAISearchRetriever,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "b7243e6d",
"metadata": {},
"source": [
"## Configure search settings\n",
"\n",
"Set the search service name, index name, and API key as environment variables (alternatively, you can pass them as arguments to `AzureAISearchRetriever`). The search index provides the searchable content. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "33fd23d1",
"metadata": {},
"outputs": [],
"source": [
"os.environ[\"AZURE_AI_SEARCH_SERVICE_NAME\"] = \"<YOUR_SEARCH_SERVICE_NAME>\"\n",
"os.environ[\"AZURE_AI_SEARCH_INDEX_NAME\"] = \"<YOUR_SEARCH_INDEX_NAME>\"\n",
"os.environ[\"AZURE_AI_SEARCH_API_KEY\"] = \"<YOUR_API_KEY>\""
]
},
{
"cell_type": "markdown",
"id": "057deaad",
"metadata": {},
"source": [
"## Create the retriever\n",
"\n",
"For `AzureAISearchRetriever`, provide an `index_name`, `content_key`, and `top_k` set to the number of number of results you'd like to retrieve. Setting `top_k` to zero (the default) returns all results."
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "c18d0c4c",
"metadata": {},
"outputs": [],
"source": [
"retriever = AzureAISearchRetriever(\n",
" content_key=\"content\", top_k=1, index_name=\"langchain-vector-demo\"\n",
")"
@ -116,6 +131,8 @@
"id": "e94ea104",
"metadata": {},
"source": [
"## Usage\n",
"\n",
"Now you can use it to retrieve documents from Azure AI Search.\n",
"Calling `invoke` with a query string returns all documents relevant to that query. "
]
@ -259,6 +276,69 @@
"source": [
"retriever.invoke(\"does the president have a plan for covid-19?\")"
]
},
{
"cell_type": "markdown",
"id": "dd6c9ba9-978f-4e2c-9cc7-ccd1be58eafb",
"metadata": {},
"source": [
"## Use within a chain"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cbcd8ac6-12ea-4c22-8a98-c24825d598d7",
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_core.runnables import RunnablePassthrough\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"prompt = ChatPromptTemplate.from_template(\n",
" \"\"\"Answer the question based only on the context provided.\n",
"\n",
"Context: {context}\n",
"\n",
"Question: {question}\"\"\"\n",
")\n",
"\n",
"llm = ChatOpenAI(model=\"gpt-3.5-turbo-0125\")\n",
"\n",
"\n",
"def format_docs(docs):\n",
" return \"\\n\\n\".join(doc.page_content for doc in docs)\n",
"\n",
"\n",
"chain = (\n",
" {\"context\": retriever | format_docs, \"question\": RunnablePassthrough()}\n",
" | prompt\n",
" | llm\n",
" | StrOutputParser()\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "db80f3c7-83e1-4965-8ff2-a3dd66a07f0e",
"metadata": {},
"outputs": [],
"source": [
"chain.invoke(\"does the president have a plan for covid-19?\")"
]
},
{
"cell_type": "markdown",
"id": "a3d6140e-c2a0-40b2-a141-cab61ab39185",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"For detailed documentation of all `AzureAISearchRetriever` features and configurations head to the [API reference](https://api.python.langchain.com/en/latest/retrievers/langchain_community.retrievers.azure_ai_search.AzureAISearchRetriever.html)."
]
}
],
"metadata": {
@ -277,7 +357,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
"version": "3.10.4"
}
},
"nbformat": 4,

View File

@ -1,19 +1,86 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "b0872249-1af5-4d54-b816-1babad7a8c9e",
"metadata": {},
"source": [
"---\n",
"sidebar_label: Bedrock (Knowledge Bases)\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "b6636c27-35da-4ba7-8313-eca21660cab3",
"metadata": {},
"source": [
"# Bedrock (Knowledge Bases)\n",
"# Bedrock (Knowledge Bases) Retriever\n",
"\n",
"> [Knowledge bases for Amazon Bedrock](https://aws.amazon.com/bedrock/knowledge-bases/) is an Amazon Web Services (AWS) offering which lets you quickly build RAG applications by using your private data to customize FM response.\n",
"## Overview\n",
"\n",
"> Implementing `RAG` requires organizations to perform several cumbersome steps to convert data into embeddings (vectors), store the embeddings in a specialized vector database, and build custom integrations into the database to search and retrieve text relevant to the users query. This can be time-consuming and inefficient.\n",
"This guide will help you get started with the AWS Knowledge Bases [retriever](/docs/concepts/#retrievers).\n",
"\n",
"> With `Knowledge Bases for Amazon Bedrock`, simply point to the location of your data in `Amazon S3`, and `Knowledge Bases for Amazon Bedrock` takes care of the entire ingestion workflow into your vector database. If you do not have an existing vector database, Amazon Bedrock creates an Amazon OpenSearch Serverless vector store for you. For retrievals, use the Langchain - Amazon Bedrock integration via the Retrieve API to retrieve relevant results for a user query from knowledge bases.\n",
"[Knowledge Bases for Amazon Bedrock](https://aws.amazon.com/bedrock/knowledge-bases/) is an Amazon Web Services (AWS) offering that lets you quickly build RAG applications by using your private data to customize FM responses.\n",
"\n",
"> Knowledge base can be configured through [AWS Console](https://aws.amazon.com/console/) or by using [AWS SDKs](https://aws.amazon.com/developer/tools/)."
"Implementing `RAG` requires organizations to perform several cumbersome steps to convert data into embeddings (vectors), store the embeddings in a specialized vector database, and build custom integrations into the database to search and retrieve text relevant to the user's query. This can be time-consuming and inefficient.\n",
"\n",
"With `Knowledge Bases for Amazon Bedrock`, simply point to the location of your data in `Amazon S3`, and `Knowledge Bases for Amazon Bedrock` takes care of the entire ingestion workflow into your vector database. If you do not have an existing vector database, Amazon Bedrock creates an Amazon OpenSearch Serverless vector store for you. For retrievals, use the LangChain - Amazon Bedrock integration via the Retrieve API to retrieve relevant results for a user query from knowledge bases.\n",
"\n",
"### Integration details\n",
"\n",
"| Retriever | Bring your own docs | Self-host | Cloud offering | Package |\n",
"| :--- | :--- | :---: | :---: | :---: |\n",
"[AmazonKnowledgeBasesRetriever](https://api.python.langchain.com/en/latest/retrievers/langchain_aws.retrievers.bedrock.AmazonKnowledgeBasesRetriever.html) | ✅ | ❌ | ✅ | langchain_aws.retrievers |\n"
]
},
{
"cell_type": "markdown",
"id": "cd092536-61bd-4b3f-9050-076daccc9e72",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"Knowledge Bases can be configured through [AWS Console](https://aws.amazon.com/console/) or by using [AWS SDKs](https://aws.amazon.com/developer/tools/). We will need the `knowledge_base_id` to instantiate the retriever."
]
},
{
"cell_type": "markdown",
"id": "238c0ceb-d4b6-409e-bed9-d30143d2f2c9",
"metadata": {},
"source": [
"If you want to get automated tracing from individual queries, you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e4426098-820c-48dc-9826-056a91bebe9e",
"metadata": {},
"outputs": [],
"source": [
"# import getpass\n",
"# import os\n",
"\n",
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
]
},
{
"cell_type": "markdown",
"id": "4ede6277-ea56-45f6-8ef4-fe14734ee279",
"metadata": {},
"source": [
"### Installation\n",
"\n",
"This retriever lives in the `langchain-aws` package:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4db1af24-0969-43bd-8438-af5e3024b0d0",
"metadata": {},
"outputs": [],
"source": [
"%pip install -qU langchain-aws"
]
},
{
@ -21,17 +88,9 @@
"id": "b34c8cbe-c6e5-4398-adf1-4925204bcaed",
"metadata": {},
"source": [
"## Using the Knowledge Bases Retriever"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "26c97d36-911c-4fe0-a478-546192728f30",
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet boto3"
"## Instantiation\n",
"\n",
"Now we can instantiate our retriever:"
]
},
{
@ -41,7 +100,7 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.retrievers import AmazonKnowledgeBasesRetriever\n",
"from langchain_aws.retrievers import AmazonKnowledgeBasesRetriever\n",
"\n",
"retriever = AmazonKnowledgeBasesRetriever(\n",
" knowledge_base_id=\"PUIJP4EQUA\",\n",
@ -49,6 +108,14 @@
")"
]
},
{
"cell_type": "markdown",
"id": "9dff39f8-b6ba-41bf-b95b-d345928ed07d",
"metadata": {},
"source": [
"## Usage"
]
},
{
"cell_type": "code",
"execution_count": null,
@ -66,7 +133,7 @@
"id": "7de9b61b-597b-4aba-95fb-49d11e84510e",
"metadata": {},
"source": [
"### Using in a QA Chain"
"## Use within a chain"
]
},
{
@ -78,7 +145,7 @@
"source": [
"from botocore.client import Config\n",
"from langchain.chains import RetrievalQA\n",
"from langchain_community.llms import Bedrock\n",
"from langchain_aws import Bedrock\n",
"\n",
"model_kwargs_claude = {\"temperature\": 0, \"top_k\": 10, \"max_tokens_to_sample\": 3000}\n",
"\n",
@ -90,6 +157,16 @@
"\n",
"qa(query)"
]
},
{
"cell_type": "markdown",
"id": "22e2538a-e042-4997-bb81-b68ecb27d665",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"For detailed documentation of all `AmazonKnowledgeBasesRetriever` features and configurations head to the [API reference](https://api.python.langchain.com/en/latest/retrievers/langchain_aws.retrievers.bedrock.AmazonKnowledgeBasesRetriever.html)."
]
}
],
"metadata": {
@ -108,7 +185,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
"version": "3.10.4"
}
},
"nbformat": 4,

View File

@ -2,14 +2,72 @@
"cells": [
{
"cell_type": "markdown",
"id": "ab66dd43",
"id": "41ccce84-f6d9-4ba0-8281-22cbf29f20d3",
"metadata": {},
"source": [
"# Elasticsearch\n",
"---\n",
"sidebar_label: Elasticsearch\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "54c4d916-05db-4e01-9893-c711904205b3",
"metadata": {},
"source": [
"# ElasticsearchRetriever\n",
"\n",
"## Overview\n",
">[Elasticsearch](https://www.elastic.co/elasticsearch/) is a distributed, RESTful search and analytics engine. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. It supports keyword search, vector search, hybrid search and complex filtering.\n",
"\n",
"The `ElasticsearchRetriever` is a generic wrapper to enable flexible access to all `Elasticsearch` features through the [Query DSL](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html). For most use cases the other classes (`ElasticsearchStore`, `ElasticsearchEmbeddings`, etc.) should suffice, but if they don't you can use `ElasticsearchRetriever`."
"The `ElasticsearchRetriever` is a generic wrapper to enable flexible access to all `Elasticsearch` features through the [Query DSL](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html). For most use cases the other classes (`ElasticsearchStore`, `ElasticsearchEmbeddings`, etc.) should suffice, but if they don't you can use `ElasticsearchRetriever`.\n",
"\n",
"This guide will help you get started with the Elasticsearch [retriever](/docs/concepts/#retrievers). For detailed documentation of all `ElasticsearchRetriever` features and configurations head to the [API reference](https://api.python.langchain.com/en/latest/retrievers/langchain_elasticsearch.retrievers.ElasticsearchRetriever.html).\n",
"\n",
"### Integration details\n",
"\n",
"| Retriever | Bring your own docs | Self-host | Cloud offering | Package |\n",
"| :--- | :--- | :---: | :---: | :---: |\n",
"[ElasticsearchRetriever](https://api.python.langchain.com/en/latest/retrievers/langchain_elasticsearch.retrievers.ElasticsearchRetriever.html) | ✅ | ✅ | ✅ | langchain_elasticsearch |\n",
"\n",
"\n",
"## Setup\n",
"\n",
"There are two main ways to set up an Elasticsearch instance:\n",
"\n",
"- Elastic Cloud: [Elastic Cloud](https://cloud.elastic.co/) is a managed Elasticsearch service. Sign up for a [free trial](https://www.elastic.co/cloud/cloud-trial-overview).\n",
"\n",
"- Local install: Get started with Elasticsearch by running it locally. The easiest way is to use the official Elasticsearch Docker image. See the [Elasticsearch Docker documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html) for more information.\n",
"\n",
"To connect to an Elasticsearch instance that does not require login credentials (for example, a Docker instance started with security disabled), pass the Elasticsearch URL and index name to the retriever."
]
},
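For the local option, a single-node instance can be started with Docker. This is a sketch under stated assumptions: the image tag `8.12.1` is illustrative, not prescribed by this guide; substitute a current release.

```shell
# Start a single-node Elasticsearch with security disabled (no credentials needed).
# Image tag is an assumption; pick a current 8.x release.
docker run -p 9200:9200 \
  -e "discovery.type=single-node" \
  -e "xpack.security.enabled=false" \
  docker.elastic.co/elasticsearch/elasticsearch:8.12.1
```

The instance is then reachable at `http://localhost:9200`, which matches the locally running instance used in this notebook.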
{
"cell_type": "markdown",
"id": "e13a7b58-3a56-4ce6-a4d5-81a8dd2080df",
"metadata": {},
"source": [
"If you want to get automated tracing from individual queries, you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "492b81d0-c85b-4693-ae4f-3f33da571ddd",
"metadata": {},
"outputs": [],
"source": [
"# import getpass\n",
"# import os\n",
"\n",
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
]
},
{
"cell_type": "markdown",
"id": "78335745-f14d-411d-9c06-64ff83eb9358",
"metadata": {},
"source": [
"### Installation\n",
"\n",
"This retriever lives in the `langchain-elasticsearch` package. For demonstration purposes, we will also install `langchain-community` to generate text [embeddings](/docs/concepts/#embedding-models)."
]
},
{
@ -21,7 +79,7 @@
},
"outputs": [],
"source": [
"%pip install --upgrade --quiet elasticsearch langchain-elasticsearch"
"%pip install -qU langchain-community langchain-elasticsearch"
]
},
{
@ -48,7 +106,7 @@
"id": "24c0d140",
"metadata": {},
"source": [
"## Configure\n",
"### Configure\n",
"\n",
"Here we define the connection to Elasticsearch. In this example we use a locally running instance. Alternatively, you can make an account in [Elastic Cloud](https://cloud.elastic.co/) and start a [free trial](https://www.elastic.co/cloud/cloud-trial-overview)."
]
@ -70,7 +128,7 @@
"id": "60aa7c20",
"metadata": {},
"source": [
"For vector search, we are going to use random embeddings just for illustration. For real use cases, pick one of the available LangChain `Embeddings` classes."
"For vector search, we are going to use random embeddings just for illustration. For real use cases, pick one of the available LangChain [Embeddings](/docs/integrations/text_embedding) classes."
]
},
{
@ -88,7 +146,7 @@
"id": "b4eea654",
"metadata": {},
"source": [
"## Define example data"
"#### Define example data"
]
},
{
@ -118,7 +176,7 @@
"id": "1c518c42",
"metadata": {},
"source": [
"## Index data\n",
"#### Index data\n",
"\n",
"Typically, users make use of `ElasticsearchRetriever` when they already have data in an Elasticsearch index. Here we index some example text documents. If you created an index for example using `ElasticsearchStore.from_documents` that's also fine."
]
@ -209,14 +267,8 @@
"id": "08437fa2",
"metadata": {},
"source": [
"## Usage examples"
]
},
{
"cell_type": "markdown",
"id": "469aa295",
"metadata": {},
"source": [
"## Instantiation\n",
"\n",
"### Vector search\n",
"\n",
"Dense vector retrieval using fake embeddings in this example."
@ -543,6 +595,91 @@
"\n",
"custom_mapped_retriever.invoke(\"foo\")"
]
},
{
"cell_type": "markdown",
"id": "1663feff-4527-4fb0-9395-b28af5c9ec99",
"metadata": {},
"source": [
"## Usage\n",
"\n",
"Following the above examples, we use `.invoke` to issue a single query. Because retrievers are Runnables, we can use any method in the [Runnable interface](/docs/concepts/#runnable-interface), such as `.batch`, as well."
]
},
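The `.batch` behavior mentioned above can be sketched in plain Python. This is an illustration of the semantics only: `ToyRetriever` is a hypothetical stand-in, not a LangChain class, and no live Elasticsearch connection is involved.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyRetriever:
    """Hypothetical stand-in retriever, used only to illustrate batch semantics."""

    def __init__(self, docs):
        self.docs = docs

    def invoke(self, query):
        # Return documents whose text contains the query term.
        return [d for d in self.docs if query in d]

    def batch(self, queries, max_workers=4):
        # Runnable.batch maps invoke over the inputs, typically in parallel,
        # preserving input order in the results.
        with ThreadPoolExecutor(max_workers=max_workers) as ex:
            return list(ex.map(self.invoke, queries))


toy = ToyRetriever(["foo bar", "bar baz", "foo qux"])
print(toy.batch(["foo", "baz"]))  # [['foo bar', 'foo qux'], ['bar baz']]
```

With a real retriever, `vector_retriever.batch(["foo", "bar"])` likewise returns one list of documents per input query.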
{
"cell_type": "markdown",
"id": "f4f946ed-ff3a-43d7-9e0d-7983ff13c868",
"metadata": {},
"source": [
"## Use within a chain\n",
"\n",
"We can also incorporate retrievers into [chains](/docs/how_to/sequence/) to build larger applications, such as a simple [RAG](/docs/tutorials/rag/) application. For demonstration purposes, we instantiate an OpenAI chat model as well."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "19302ef1-dd49-4f9c-8d87-4ea23b8296e2",
"metadata": {},
"outputs": [],
"source": [
"%pip install -qU langchain-openai"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "832857a7-3b16-4a85-acc7-28efe6ebdae8",
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_core.runnables import RunnablePassthrough\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"prompt = ChatPromptTemplate.from_template(\n",
" \"\"\"Answer the question based only on the context provided.\n",
"\n",
"Context: {context}\n",
"\n",
"Question: {question}\"\"\"\n",
")\n",
"\n",
"llm = ChatOpenAI(model=\"gpt-3.5-turbo-0125\")\n",
"\n",
"\n",
"def format_docs(docs):\n",
" return \"\\n\\n\".join(doc.page_content for doc in docs)\n",
"\n",
"\n",
"chain = (\n",
" {\"context\": vector_retriever | format_docs, \"question\": RunnablePassthrough()}\n",
" | prompt\n",
" | llm\n",
" | StrOutputParser()\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7317942b-7c9a-477d-ba11-3421da804a22",
"metadata": {},
"outputs": [],
"source": [
"chain.invoke(\"what is foo?\")"
]
},
{
"cell_type": "markdown",
"id": "eeb49714-ba5a-4b10-8e58-67d061a486d1",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"For detailed documentation of all `ElasticsearchRetriever` features and configurations head to the [API reference](https://api.python.langchain.com/en/latest/retrievers/langchain_elasticsearch.retrievers.ElasticsearchRetriever.html)."
]
}
],
"metadata": {
@ -561,7 +698,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
"version": "3.10.4"
}
},
"nbformat": 4,

View File

@ -1,27 +1,44 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"sidebar_label: Google Vertex AI Search\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Google Vertex AI Search\n",
"\n",
"## Overview\n",
"\n",
">[Google Vertex AI Search](https://cloud.google.com/enterprise-search) (formerly known as `Enterprise Search` on `Generative AI App Builder`) is a part of the [Vertex AI](https://cloud.google.com/vertex-ai) machine learning platform offered by `Google Cloud`.\n",
">\n",
">`Vertex AI Search` lets organizations quickly build generative AI-powered search engines for customers and employees. It's underpinned by a variety of `Google Search` technologies, including semantic search, which helps deliver more relevant results than traditional keyword-based search techniques by using natural language processing and machine learning techniques to infer relationships within the content and intent from the user's query input. Vertex AI Search also benefits from Google's expertise in understanding how users search and factors in content relevance to order displayed results.\n",
"\n",
">`Vertex AI Search` is available in the `Google Cloud Console` and via an API for enterprise workflow integration.\n",
"\n",
"This notebook demonstrates how to configure `Vertex AI Search` and use the Vertex AI Search retriever. The Vertex AI Search retriever encapsulates the [Python client library](https://cloud.google.com/generative-ai-app-builder/docs/libraries#client-libraries-install-python) and uses it to access the [Search Service API](https://cloud.google.com/python/docs/reference/discoveryengine/latest/google.cloud.discoveryengine_v1beta.services.search_service).\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Install pre-requisites\n",
"This notebook demonstrates how to configure `Vertex AI Search` and use the Vertex AI Search [retriever](/docs/concepts/#retrievers). The Vertex AI Search retriever encapsulates the [Python client library](https://cloud.google.com/generative-ai-app-builder/docs/libraries#client-libraries-install-python) and uses it to access the [Search Service API](https://cloud.google.com/python/docs/reference/discoveryengine/latest/google.cloud.discoveryengine_v1beta.services.search_service).\n",
"\n",
"You need to install the `google-cloud-discoveryengine` package to use the Vertex AI Search retriever.\n"
"For detailed documentation of all `VertexAISearchRetriever` features and configurations head to the [API reference](https://api.python.langchain.com/en/latest/vertex_ai_search/langchain_google_community.vertex_ai_search.VertexAISearchRetriever.html).\n",
"\n",
"### Integration details\n",
"\n",
"| Retriever | Bring your own docs | Self-host | Cloud offering | Package |\n",
"| :--- | :--- | :---: | :---: | :---: |\n",
"[VertexAISearchRetriever](https://api.python.langchain.com/en/latest/vertex_ai_search/langchain_google_community.vertex_ai_search.VertexAISearchRetriever.html) | ✅ | ❌ | ✅ | langchain_google_community.vertex_ai_search |\n",
"\n",
"\n",
"## Setup\n",
"\n",
"### Installation\n",
"\n",
"You need to install the `langchain-google-community` and `google-cloud-discoveryengine` packages to use the Vertex AI Search retriever."
]
},
{
@ -30,14 +47,14 @@
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet google-cloud-discoveryengine"
"%pip install -qU langchain-google-community google-cloud-discoveryengine"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Configure access to Google Cloud and Vertex AI Search\n",
"### Configure access to Google Cloud and Vertex AI Search\n",
"\n",
"Vertex AI Search is generally available without allowlist as of August 2023.\n",
"\n",
@ -48,7 +65,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a search engine and populate an unstructured data store\n",
"#### Create a search engine and populate an unstructured data store\n",
"\n",
"- Follow the instructions in the [Vertex AI Search Getting Started guide](https://cloud.google.com/generative-ai-app-builder/docs/try-enterprise-search) to set up a Google Cloud project and Vertex AI Search.\n",
"- [Use the Google Cloud Console to create an unstructured data store](https://cloud.google.com/generative-ai-app-builder/docs/create-engine-es#unstructured-data)\n",
@ -60,7 +77,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set credentials to access Vertex AI Search API\n",
"#### Set credentials to access Vertex AI Search API\n",
"\n",
"The [Vertex AI Search client libraries](https://cloud.google.com/generative-ai-app-builder/docs/libraries) used by the Vertex AI Search retriever provide high-level language support for authenticating to Google Cloud programmatically.\n",
"Client libraries support [Application Default Credentials (ADC)](https://cloud.google.com/docs/authentication/application-default-credentials); the libraries look for credentials in a set of defined locations and use those credentials to authenticate requests to the API.\n",
@ -87,16 +104,16 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Configure and use the Vertex AI Search retriever\n",
"### Configure and use the Vertex AI Search retriever\n",
"\n",
"The Vertex AI Search retriever is implemented in the `langchain.retriever.GoogleVertexAISearchRetriever` class. The `get_relevant_documents` method returns a list of `langchain.schema.Document` documents where the `page_content` field of each document is populated the document content.\n",
"The Vertex AI Search retriever is implemented in the `langchain_google_community.VertexAISearchRetriever` class. The `get_relevant_documents` method returns a list of `langchain.schema.Document` documents where the `page_content` field of each document is populated with the document content.\n",
"Depending on the data type used in Vertex AI Search (website, structured or unstructured) the `page_content` field is populated as follows:\n",
"\n",
"- Website with advanced indexing: an `extractive answer` that matches a query. The `metadata` field is populated with metadata (if any) of the document from which the segments or answers were extracted.\n",
"- Unstructured data source: either an `extractive segment` or an `extractive answer` that matches a query. The `metadata` field is populated with metadata (if any) of the document from which the segments or answers were extracted.\n",
"- Structured data source: a JSON string containing all the fields returned from the structured data source. The `metadata` field is populated with the metadata (if any) of the document.\n",
"\n",
"### Extractive answers & extractive segments\n",
"#### Extractive answers & extractive segments\n",
"\n",
"An extractive answer is verbatim text that is returned with each search result. It is extracted directly from the original document. Extractive answers are typically displayed near the top of web pages to provide an end user with a brief answer that is contextually relevant to their query. Extractive answers are available for website and unstructured search.\n",
"\n",
@ -108,7 +125,7 @@
"\n",
"When creating an instance of the retriever you can specify a number of parameters that control which data store to access and how a natural language query is processed, including configurations for extractive answers and segments.\n",
"\n",
"### The mandatory parameters are:\n",
"#### The mandatory parameters are:\n",
"\n",
"- `project_id` - Your Google Cloud Project ID.\n",
"- `location_id` - The location of the data store.\n",
@ -148,15 +165,15 @@
"\n",
"To update to the new retriever, make the following changes:\n",
"\n",
"- Change the import from: `from langchain.retrievers import GoogleCloudEnterpriseSearchRetriever` -> `from langchain.retrievers import GoogleVertexAISearchRetriever`.\n",
"- Change all class references from `GoogleCloudEnterpriseSearchRetriever` -> `GoogleVertexAISearchRetriever`.\n"
"- Change the import from: `from langchain.retrievers import GoogleCloudEnterpriseSearchRetriever` -> `from langchain_google_community import VertexAISearchRetriever`.\n",
"- Change all class references from `GoogleCloudEnterpriseSearchRetriever` -> `VertexAISearchRetriever`.\n"
]
},
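For structured data sources, the JSON-string `page_content` described above can be parsed back into its fields. A minimal sketch with a hypothetical stand-in string (real content would come from the retriever's results):

```python
import json

# Hypothetical page_content from a structured data store result;
# actual content comes from VertexAISearchRetriever.invoke(...).
page_content = '{"title": "Q2 report", "revenue": 1250000}'

fields = json.loads(page_content)
print(fields["title"], fields["revenue"])  # Q2 report 1250000
```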
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Configure and use the retriever for **unstructured** data with extractive segments\n"
"If you want to get automated tracing from individual queries, you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
},
{
@ -165,9 +182,28 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.retrievers import (\n",
" GoogleVertexAIMultiTurnSearchRetriever,\n",
" GoogleVertexAISearchRetriever,\n",
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Instantiation\n",
"\n",
"### Configure and use the retriever for **unstructured** data with extractive segments"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_google_community import (\n",
" VertexAIMultiTurnSearchRetriever,\n",
" VertexAISearchRetriever,\n",
")\n",
"\n",
"PROJECT_ID = \"<YOUR PROJECT ID>\" # Set to your Project ID\n",
@ -182,7 +218,7 @@
"metadata": {},
"outputs": [],
"source": [
"retriever = GoogleVertexAISearchRetriever(\n",
"retriever = VertexAISearchRetriever(\n",
" project_id=PROJECT_ID,\n",
" location_id=LOCATION_ID,\n",
" data_store_id=DATA_STORE_ID,\n",
@ -216,7 +252,7 @@
"metadata": {},
"outputs": [],
"source": [
"retriever = GoogleVertexAISearchRetriever(\n",
"retriever = VertexAISearchRetriever(\n",
" project_id=PROJECT_ID,\n",
" location_id=LOCATION_ID,\n",
" data_store_id=DATA_STORE_ID,\n",
@ -243,7 +279,7 @@
"metadata": {},
"outputs": [],
"source": [
"retriever = GoogleVertexAISearchRetriever(\n",
"retriever = VertexAISearchRetriever(\n",
" project_id=PROJECT_ID,\n",
" location_id=LOCATION_ID,\n",
" data_store_id=DATA_STORE_ID,\n",
@ -269,7 +305,7 @@
"metadata": {},
"outputs": [],
"source": [
"retriever = GoogleVertexAISearchRetriever(\n",
"retriever = VertexAISearchRetriever(\n",
" project_id=PROJECT_ID,\n",
" location_id=LOCATION_ID,\n",
" data_store_id=DATA_STORE_ID,\n",
@ -297,7 +333,7 @@
"metadata": {},
"outputs": [],
"source": [
"retriever = GoogleVertexAISearchRetriever(\n",
"retriever = VertexAISearchRetriever(\n",
" project_id=PROJECT_ID,\n",
" location_id=LOCATION_ID,\n",
" search_engine_id=SEARCH_ENGINE_ID,\n",
@ -325,7 +361,7 @@
"metadata": {},
"outputs": [],
"source": [
"retriever = GoogleVertexAIMultiTurnSearchRetriever(\n",
"retriever = VertexAIMultiTurnSearchRetriever(\n",
" project_id=PROJECT_ID, location_id=LOCATION_ID, data_store_id=DATA_STORE_ID\n",
")\n",
"\n",
@ -333,6 +369,85 @@
"for doc in result:\n",
" print(doc)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Usage\n",
"\n",
"Following the above examples, we use `.invoke` to issue a single query. Because retrievers are Runnables, we can use any method in the [Runnable interface](/docs/concepts/#runnable-interface), such as `.batch`, as well."
]
},
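As a rough illustration of the `invoke`/`batch` contract — not the real class; `VertexAISearchRetriever` inherits this behavior from LangChain's Runnable interface, and the corpus here is hypothetical — a stand-in retriever might look like:

```python
# Illustrative sketch only: a stand-in showing the Runnable-style
# invoke/batch contract. Real retrievers query a search backend.
class ToyRetriever:
    def __init__(self, docs):
        self.docs = docs

    def invoke(self, query: str) -> list[str]:
        # Naive substring match standing in for a real search service.
        return [d for d in self.docs if query.lower() in d.lower()]

    def batch(self, queries: list[str]) -> list[list[str]]:
        # Runnables map invoke over the inputs (optionally in parallel).
        return [self.invoke(q) for q in queries]


retriever = ToyRetriever(["Vertex AI Search overview", "Milvus hybrid search"])
results = retriever.batch(["vertex", "milvus"])
```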
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Use within a chain\n",
"\n",
"We can also incorporate retrievers into [chains](/docs/how_to/sequence/) to build larger applications, such as a simple [RAG](/docs/tutorials/rag/) application. For demonstration purposes, we instantiate a VertexAI chat model as well. See the corresponding Vertex [integration docs](/docs/integrations/chat/google_vertex_ai_palm/) for setup instructions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install -qU langchain-google-vertexai"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_core.runnables import RunnablePassthrough\n",
"from langchain_google_vertexai import ChatVertexAI\n",
"\n",
"prompt = ChatPromptTemplate.from_template(\n",
" \"\"\"Answer the question based only on the context provided.\n",
"\n",
"Context: {context}\n",
"\n",
"Question: {question}\"\"\"\n",
")\n",
"\n",
"llm = ChatVertexAI(model_name=\"chat-bison\", temperature=0)\n",
"\n",
"\n",
"def format_docs(docs):\n",
" return \"\\n\\n\".join(doc.page_content for doc in docs)\n",
"\n",
"\n",
"chain = (\n",
" {\"context\": retriever | format_docs, \"question\": RunnablePassthrough()}\n",
" | prompt\n",
" | llm\n",
" | StrOutputParser()\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"chain.invoke(query)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"For detailed documentation of all `VertexAISearchRetriever` features and configurations head to the [API reference](https://api.python.langchain.com/en/latest/vertex_ai_search/langchain_google_community.vertex_ai_search.VertexAISearchRetriever.html)."
]
}
],
"metadata": {
@ -351,7 +466,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
"version": "3.10.4"
}
},
"nbformat": 4,


@ -5,23 +5,25 @@ sidebar_class_name: hidden
# Retrievers
A **retriever** is an interface that returns documents given an unstructured query.
A [retriever](/docs/concepts/#retrievers) is an interface that returns documents given an unstructured query.
It is more general than a vector store.
A retriever does not need to be able to store documents, only to return (or retrieve) them.
Retrievers can be created from vector stores, but are also broad enough to include [Wikipedia search](/docs/integrations/retrievers/wikipedia/) and [Amazon Kendra](/docs/integrations/retrievers/amazon_kendra_retriever/).
Retrievers accept a string query as input and return a list of Document's as output.
Retrievers accept a string query as input and return a list of [Documents](https://api.python.langchain.com/en/latest/documents/langchain_core.documents.base.Document.html) as output.
For specifics on how to use retrievers, see the [relevant how-to guides here](/docs/how_to/#retrievers).
This table lists common retrievers.
Note that all [vector stores](/docs/concepts/#vector-stores) can be [cast to retrievers](/docs/how_to/vectorstore_retriever/).
Refer to the vector store [integration docs](/docs/integrations/vectorstores/) for available vector stores.
This table lists custom retrievers, implemented via subclassing [BaseRetriever](/docs/how_to/custom_retriever/).
| Retriever | Namespace | Native async | Local |
|-----------|-----------|---------------|------|
| [AmazonKnowledgeBasesRetriever](https://api.python.langchain.com/en/latest/retrievers/langchain_aws.retrievers.bedrock.AmazonKnowledgeBasesRetriever.html) | langchain_aws.retrievers | ❌ | ❌ |
| [AzureAISearchRetriever](https://api.python.langchain.com/en/latest/retrievers/langchain_community.retrievers.azure_ai_search.AzureAISearchRetriever.html) | langchain_community.retrievers | ✅ | ❌ |
| [ElasticsearchRetriever](https://api.python.langchain.com/en/latest/retrievers/langchain_elasticsearch.retrievers.ElasticsearchRetriever.html) | langchain_elasticsearch | ❌ | ❌ |
| [MilvusCollectionHybridSearchRetriever](https://api.python.langchain.com/en/latest/retrievers/langchain_milvus.retrievers.milvus_hybrid_search.MilvusCollectionHybridSearchRetriever.html) | langchain_milvus | ❌ | ❌ |
| [TavilySearchAPIRetriever](https://api.python.langchain.com/en/latest/retrievers/langchain_community.retrievers.tavily_search_api.TavilySearchAPIRetriever.html) | langchain_community.retrievers | ❌ | ❌ |
| [VertexAISearchRetriever](https://api.python.langchain.com/en/latest/vertex_ai_search/langchain_google_community.vertex_ai_search.VertexAISearchRetriever.html) | langchain_google_community.vertex_ai_search | ❌ | ❌ |
| Retriever | Bring your own docs | Self-host | Cloud offering | Package |
|-----------|---------------------|-----------|----------------|---------|
| [AmazonKnowledgeBasesRetriever](/docs/integrations/retrievers/bedrock) | ✅ | ❌ | ✅ | [langchain_aws](https://api.python.langchain.com/en/latest/retrievers/langchain_aws.retrievers.bedrock.AmazonKnowledgeBasesRetriever.html) |
| [AzureAISearchRetriever](/docs/integrations/retrievers/azure_ai_search) | ✅ | ❌ | ✅ | [langchain_community](https://api.python.langchain.com/en/latest/retrievers/langchain_community.retrievers.azure_ai_search.AzureAISearchRetriever.html) |
| [ElasticsearchRetriever](/docs/integrations/retrievers/elasticsearch_retriever) | ✅ | ✅ | ✅ | [langchain_elasticsearch](https://api.python.langchain.com/en/latest/retrievers/langchain_elasticsearch.retrievers.ElasticsearchRetriever.html) |
| [MilvusCollectionHybridSearchRetriever](/docs/integrations/retrievers/milvus_hybrid_search) | ✅ | ❌ | ✅ | [langchain_milvus](https://api.python.langchain.com/en/latest/retrievers/langchain_milvus.retrievers.milvus_hybrid_search.MilvusCollectionHybridSearchRetriever.html) |
| [TavilySearchAPIRetriever](/docs/integrations/retrievers/tavily) | ❌ | ❌ | ❌ | [langchain_community](https://api.python.langchain.com/en/latest/retrievers/langchain_community.retrievers.tavily_search_api.TavilySearchAPIRetriever.html) |
| [VertexAISearchRetriever](/docs/integrations/retrievers/google_vertex_ai_search) | ✅ | ❌ | ✅ | [langchain_google_community](https://api.python.langchain.com/en/latest/vertex_ai_search/langchain_google_community.vertex_ai_search.VertexAISearchRetriever.html) |
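The pattern behind these custom retrievers reduces to "string query in, list of documents out". Here is a pure-Python stand-in — the `Document` shape is simplified and the corpus is hypothetical; a real custom retriever implements `_get_relevant_documents` on `BaseRetriever` as described in the how-to guide linked above:

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    # Simplified stand-in for langchain_core's Document.
    page_content: str
    metadata: dict = field(default_factory=dict)


class KeywordRetriever:
    """Toy retriever: returns corpus entries sharing a term with the query."""

    def __init__(self, corpus: list[str]):
        self.corpus = corpus

    def invoke(self, query: str) -> list[Document]:
        terms = query.lower().split()
        hits = [c for c in self.corpus if any(t in c.lower() for t in terms)]
        return [Document(page_content=h, metadata={"source": "in-memory"}) for h in hits]


docs = KeywordRetriever(["LangChain retrievers", "Vector stores"]).invoke("retrievers")
```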


@ -2,21 +2,48 @@
"cells": [
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"metadata": {},
"source": [
"# Milvus Hybrid Search\n",
"---\n",
"sidebar_label: Milvus Hybrid Search\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Milvus Hybrid Search Retriever\n",
"\n",
"## Overview\n",
"\n",
"> [Milvus](https://milvus.io/docs) is an open-source vector database built to power embedding similarity search and AI applications. Milvus makes unstructured data search more accessible, and provides a consistent user experience regardless of the deployment environment.\n",
"\n",
"This notebook goes over how to use the Milvus Hybrid Search retriever, which combines the strengths of both dense and sparse vector search.\n",
"This will help you get started with the Milvus Hybrid Search [retriever](/docs/concepts/#retrievers), which combines the strengths of both dense and sparse vector search. For detailed documentation of all `MilvusCollectionHybridSearchRetriever` features and configurations head to the [API reference](https://api.python.langchain.com/en/latest/retrievers/langchain_milvus.retrievers.milvus_hybrid_search.MilvusCollectionHybridSearchRetriever.html).\n",
"\n",
"For more reference please go to [Milvus Multi-Vector Search](https://milvus.io/docs/multi-vector-search.md)\n",
"\n"
"See also the Milvus Multi-Vector Search [docs](https://milvus.io/docs/multi-vector-search.md).\n",
"\n",
"### Integration details\n",
"\n",
"| Retriever | Bring your own docs | Self-host | Cloud offering | Package |\n",
"| :--- | :--- | :---: | :---: | :---: |\n",
"[MilvusCollectionHybridSearchRetriever](https://api.python.langchain.com/en/latest/retrievers/langchain_milvus.retrievers.milvus_hybrid_search.MilvusCollectionHybridSearchRetriever.html) | ✅ | ❌ | ✅ | langchain_milvus |\n",
"\n",
"## Setup\n",
"\n",
"If you want to get automated tracing from individual queries, you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
]
},
{
@ -28,9 +55,9 @@
}
},
"source": [
"## Prerequisites\n",
"### Install dependencies\n",
"You need to prepare to install the following dependencies\n"
"### Installation\n",
"\n",
"This retriever lives in the `langchain-milvus` package. This guide requires the following dependencies:"
]
},
{
@ -50,32 +77,18 @@
"%pip install --upgrade --quiet pymilvus[model] langchain-milvus langchain-openai"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"Import necessary modules and classes"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
},
"pycharm": {
"name": "#%%\n"
}
},
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.prompts import PromptTemplate\n",
"from langchain_core.runnables import RunnablePassthrough\n",
"from langchain_milvus.retrievers import MilvusCollectionHybridSearchRetriever\n",
"from langchain_milvus.utils.sparse import BM25SparseEmbedding\n",
"from langchain_openai import ChatOpenAI, OpenAIEmbeddings\n",
"from pymilvus import (\n",
" Collection,\n",
" CollectionSchema,\n",
@ -86,34 +99,15 @@
")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.prompts import PromptTemplate\n",
"from langchain_core.runnables import RunnablePassthrough\n",
"from langchain_milvus.retrievers import MilvusCollectionHybridSearchRetriever\n",
"from langchain_milvus.utils.sparse import BM25SparseEmbedding\n",
"from langchain_openai import ChatOpenAI, OpenAIEmbeddings"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"metadata": {},
"source": [
"### Start the Milvus service\n",
"\n",
"Please refer to the [Milvus documentation](https://milvus.io/docs/install_standalone-docker.md) to start the Milvus service.\n",
"\n",
"After starting milvus, you need to specify your milvus connection URI.\n"
"After starting Milvus, you need to specify your Milvus connection URI."
]
},
{
@ -155,8 +149,6 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## Prepare data and Load\n",
"### Prepare dense and sparse embedding functions\n",
"\n",
"Let's make up 10 short fictional novel descriptions. In real production settings, this could be a much larger body of text."
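To make concrete what a sparse embedding produces — a mapping of terms to weights rather than a fixed-length dense vector — here is a toy term-frequency version. The real `BM25SparseEmbedding` additionally weights terms by rarity (IDF) fitted over the corpus:

```python
from collections import Counter


def toy_sparse_embed(text: str) -> dict[str, float]:
    # Term-frequency weights normalized by document length: a stand-in
    # for BM25-style sparse vectors (which also account for term rarity).
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens)
    return {term: n / total for term, n in counts.items()}


vec = toy_sparse_embed("hybrid search combines dense and sparse search")
```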
@ -379,15 +371,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Build RAG chain with Retriever\n",
"### Create the Retriever\n",
"## Instantiation\n",
"\n",
"Define search parameters for sparse and dense fields, and create a retriever"
"Now we can instantiate our retriever, defining search parameters for sparse and dense fields:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -416,6 +407,13 @@
"In the input parameters of this retriever, we use a dense embedding and a sparse embedding to perform hybrid search over the two fields of this Collection, and use WeightedRanker for reranking. Finally, the top 3 documents are returned."
]
},
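Conceptually, WeightedRanker-style fusion combines each document's dense and sparse scores with fixed weights and keeps the top-k. Milvus performs this server-side; the scores and weights below are made up purely for illustration:

```python
def weighted_rank(dense: dict, sparse: dict, w_dense=0.6, w_sparse=0.4, k=3):
    # Fuse per-document scores from both searches with fixed weights,
    # treating a missing score as 0, then keep the k best documents.
    doc_ids = set(dense) | set(sparse)
    fused = {d: w_dense * dense.get(d, 0.0) + w_sparse * sparse.get(d, 0.0) for d in doc_ids}
    return sorted(fused, key=fused.get, reverse=True)[:k]


# Hypothetical scores from a dense and a sparse search over the same collection.
dense_scores = {"doc1": 0.9, "doc2": 0.4, "doc3": 0.7, "doc4": 0.2}
sparse_scores = {"doc1": 0.1, "doc2": 0.8, "doc3": 0.3, "doc5": 0.9}

top = weighted_rank(dense_scores, sparse_scores)
```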
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Usage"
]
},
{
"cell_type": "code",
"execution_count": 14,
@ -442,7 +440,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Build the RAG chain\n",
"## Use within a chain\n",
"\n",
"Initialize ChatOpenAI and define a prompt template"
]
@ -610,6 +608,15 @@
"source": [
"collection.drop()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"For detailed documentation of all `MilvusCollectionHybridSearchRetriever` features and configurations head to the [API reference](https://api.python.langchain.com/en/latest/retrievers/langchain_milvus.retrievers.milvus_hybrid_search.MilvusCollectionHybridSearchRetriever.html)."
]
}
],
"metadata": {
@ -628,7 +635,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.6"
"version": "3.10.4"
}
},
"nbformat": 4,


@ -22,9 +22,9 @@
"\n",
"### Integration details\n",
"\n",
"| Retriever | Namespace | Native async | Local |\n",
"| :--- | :--- | :---: | :---: |\n",
"[TavilySearchAPIRetriever](https://api.python.langchain.com/en/latest/retrievers/langchain_community.retrievers.tavily_search_api.TavilySearchAPIRetriever.html) | langchain_community.retrievers | ❌ | ❌ |\n",
"| Retriever | Bring your own docs | Self-host | Cloud offering | Package |\n",
"| :--- | :--- | :---: | :---: | :---: |\n",
"[TavilySearchAPIRetriever](https://api.python.langchain.com/en/latest/retrievers/langchain_community.retrievers.tavily_search_api.TavilySearchAPIRetriever.html) | ❌ | ❌ | ❌ | langchain_community.retrievers |\n",
"\n",
"## Setup"
]
@ -33,7 +33,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to get automated tracing from runs of individual tools, you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
"If you want to get automated tracing from individual queries, you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
},
{


@ -245,8 +245,8 @@ module.exports = {
},
],
link: {
type: "generated-index",
slug: "integrations/retrievers",
type: "doc",
id: "integrations/retrievers/index",
},
},
{


@ -24,9 +24,9 @@
"\n",
"### Integration details\n",
"\n",
"| Retriever | Namespace | Native async | Local |\n",
"| :--- | :--- | :---: | :---: |\n",
"[__ModuleName__Retriever](https://api.python.langchain.com/en/latest/retrievers/__package_name__.retrievers.__module_name__.__ModuleName__Retriever.html) | __package_name__.retrievers | ❌ | ❌ |\n",
"| Retriever | Bring your own docs | Self-host | Cloud offering | Package |\n",
"| :--- | :--- | :---: | :---: | :---: |\n",
"[__ModuleName__Retriever](https://api.python.langchain.com/en/latest/retrievers/__package_name__.retrievers.__module_name__.__ModuleName__Retriever.html) | ❌ | ❌ | ❌ | __package_name__ |\n",
"\n",
"\n",
"## Setup\n",
@ -39,7 +39,7 @@
"id": "72ee0c4b-9764-423a-9dbf-95129e185210",
"metadata": {},
"source": [
"If you want to get automated tracing from runs of individual tools, you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
"If you want to get automated tracing from individual queries, you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
},
{