docs: remove duplicated and inaccurate milvus doc (part of langchain-ai#31104) (#31154)

Michael Li 2025-05-11 05:38:11 +10:00 committed by GitHub
parent 23ec06b481
commit 0ef4ac75b7
GPG Key ID: B5690EEEBB952194
5 changed files with 24 additions and 670 deletions

@ -21,13 +21,3 @@ To import this vectorstore:
from langchain_milvus import Milvus
```
## Retrievers
See a [usage example](/docs/integrations/retrievers/milvus_hybrid_search).
To import this retriever:
```python
from langchain_milvus.retrievers import MilvusCollectionHybridSearchRetriever
from langchain_milvus.utils.sparse import BM25SparseEmbedding
```
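The removed section imported `BM25SparseEmbedding`, which turns text into a sparse vector of `{token_index: weight}` pairs. As a rough illustration of that idea, here is a minimal sketch of textbook Okapi BM25 document weighting in plain Python. It is not the `langchain_milvus` implementation: the whitespace tokenizer, the `k1`/`b` defaults, and the helper names (`tokenize`, `build_vocab`, `bm25_doc_vector`) are assumptions, and its weights will not match the library's output.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: naive lowercase whitespace split; real analyzers also
    # strip punctuation, drop stop words, stem, and so on.
    return text.lower().split()


def build_vocab(corpus: list[str]) -> dict[str, int]:
    # Assign each distinct token an integer index (its sparse dimension).
    vocab: dict[str, int] = {}
    for doc in corpus:
        for tok in tokenize(doc):
            vocab.setdefault(tok, len(vocab))
    return vocab


def bm25_doc_vector(
    doc: str, corpus: list[str], vocab: dict[str, int], k1: float = 1.5, b: float = 0.75
) -> dict[int, float]:
    # Hypothetical helper: Okapi BM25 weight of each term in `doc`, keyed by vocab index.
    tokenized = [tokenize(d) for d in corpus]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of corpus documents containing each token.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    tokens = tokenize(doc)
    vec: dict[int, float] = {}
    for tok, tf in Counter(tokens).items():
        if tok not in vocab:
            continue  # out-of-vocabulary terms get no dimension
        idf = math.log((n_docs - df[tok] + 0.5) / (df[tok] + 0.5) + 1)
        tf_norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(tokens) / avgdl))
        vec[vocab[tok]] = idf * tf_norm
    return vec


corpus = ["red apple", "green apple", "red car"]
vocab = build_vocab(corpus)
print(bm25_doc_vector("green apple", corpus, vocab))
```

Only the two tokens of the document get a dimension, and the rarer token `green` receives a larger weight than `apple`, which appears in two of the three documents.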

@ -1,639 +0,0 @@
{
"cells": [
{
"cell_type": "raw",
"metadata": {},
"source": [
"---\n",
"sidebar_label: Milvus Hybrid Search\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Milvus Hybrid Search Retriever\n",
"\n",
"> [Milvus](https://milvus.io/docs) is an open-source vector database built to power embedding similarity search and AI applications. Milvus makes unstructured data search more accessible, and provides a consistent user experience regardless of the deployment environment.\n",
"\n",
"This will help you get started with the Milvus Hybrid Search [retriever](/docs/concepts/retrievers), which combines the strengths of both dense and sparse vector search. For detailed documentation of all `MilvusCollectionHybridSearchRetriever` features and configurations, head to the [API reference](https://python.langchain.com/api_reference/milvus/retrievers/langchain_milvus.retrievers.milvus_hybrid_search.MilvusCollectionHybridSearchRetriever.html).\n",
"\n",
"See also the Milvus Multi-Vector Search [docs](https://milvus.io/docs/multi-vector-search.md).\n",
"\n",
"### Integration details\n",
"\n",
"import {ItemTable} from \"@theme/FeatureTables\";\n",
"\n",
"<ItemTable category=\"document_retrievers\" item=\"MilvusCollectionHybridSearchRetriever\" />\n",
"\n",
"## Setup\n",
"\n",
"If you want to get automated tracing from individual queries, you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting the lines below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"### Installation\n",
"\n",
"This retriever lives in the `langchain-milvus` package. This guide requires the following dependencies:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
},
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"%pip install --upgrade --quiet pymilvus[model] langchain-milvus langchain-openai"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.prompts import PromptTemplate\n",
"from langchain_core.runnables import RunnablePassthrough\n",
"from langchain_milvus.retrievers import MilvusCollectionHybridSearchRetriever\n",
"from langchain_milvus.utils.sparse import BM25SparseEmbedding\n",
"from langchain_openai import ChatOpenAI, OpenAIEmbeddings\n",
"from pymilvus import (\n",
" Collection,\n",
" CollectionSchema,\n",
" DataType,\n",
" FieldSchema,\n",
" WeightedRanker,\n",
" connections,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Start the Milvus service\n",
"\n",
"Please refer to the [Milvus documentation](https://milvus.io/docs/install_standalone-docker.md) to start the Milvus service.\n",
"\n",
"After starting Milvus, specify your Milvus connection URI."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
},
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"CONNECTION_URI = \"http://localhost:19530\""
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"### Prepare OpenAI API Key\n",
"\n",
"Please refer to the [OpenAI documentation](https://platform.openai.com/account/api-keys) to obtain your OpenAI API key, and set it as an environment variable.\n",
"\n",
"```shell\n",
"export OPENAI_API_KEY=<your_api_key>\n",
"```\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prepare dense and sparse embedding functions\n",
"\n",
"Let's create 10 fictional novel descriptions to use as sample data. In production, this would typically be a much larger body of text."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"texts = [\n",
" \"In 'The Whispering Walls' by Ava Moreno, a young journalist named Sophia uncovers a decades-old conspiracy hidden within the crumbling walls of an ancient mansion, where the whispers of the past threaten to destroy her own sanity.\",\n",
" \"In 'The Last Refuge' by Ethan Blackwood, a group of survivors must band together to escape a post-apocalyptic wasteland, where the last remnants of humanity cling to life in a desperate bid for survival.\",\n",
" \"In 'The Memory Thief' by Lila Rose, a charismatic thief with the ability to steal and manipulate memories is hired by a mysterious client to pull off a daring heist, but soon finds themselves trapped in a web of deceit and betrayal.\",\n",
" \"In 'The City of Echoes' by Julian Saint Clair, a brilliant detective must navigate a labyrinthine metropolis where time is currency, and the rich can live forever, but at a terrible cost to the poor.\",\n",
" \"In 'The Starlight Serenade' by Ruby Flynn, a shy astronomer discovers a mysterious melody emanating from a distant star, which leads her on a journey to uncover the secrets of the universe and her own heart.\",\n",
" \"In 'The Shadow Weaver' by Piper Redding, a young orphan discovers she has the ability to weave powerful illusions, but soon finds herself at the center of a deadly game of cat and mouse between rival factions vying for control of the mystical arts.\",\n",
" \"In 'The Lost Expedition' by Caspian Grey, a team of explorers ventures into the heart of the Amazon rainforest in search of a lost city, but soon finds themselves hunted by a ruthless treasure hunter and the treacherous jungle itself.\",\n",
" \"In 'The Clockwork Kingdom' by Augusta Wynter, a brilliant inventor discovers a hidden world of clockwork machines and ancient magic, where a rebellion is brewing against the tyrannical ruler of the land.\",\n",
" \"In 'The Phantom Pilgrim' by Rowan Welles, a charismatic smuggler is hired by a mysterious organization to transport a valuable artifact across a war-torn continent, but soon finds themselves pursued by deadly assassins and rival factions.\",\n",
" \"In 'The Dreamwalker's Journey' by Lyra Snow, a young dreamwalker discovers she has the ability to enter people's dreams, but soon finds herself trapped in a surreal world of nightmares and illusions, where the boundaries between reality and fantasy blur.\",\n",
"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will use the [OpenAI Embedding](https://platform.openai.com/docs/guides/embeddings) to generate dense vectors, and the [BM25 algorithm](https://en.wikipedia.org/wiki/Okapi_BM25) to generate sparse vectors.\n",
"\n",
"Initialize the dense embedding function and get its dimension:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1536"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dense_embedding_func = OpenAIEmbeddings()\n",
"dense_dim = len(dense_embedding_func.embed_query(texts[1]))\n",
"dense_dim"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Initialize the sparse embedding function.\n",
"\n",
"Note that a sparse embedding is a sparse vector: a mapping from the indices of the input text's keywords to their weights."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{0: 0.4270424944042204,\n",
" 21: 1.845826690498331,\n",
" 22: 1.845826690498331,\n",
" 23: 1.845826690498331,\n",
" 24: 1.845826690498331,\n",
" 25: 1.845826690498331,\n",
" 26: 1.845826690498331,\n",
" 27: 1.2237754316221157,\n",
" 28: 1.845826690498331,\n",
" 29: 1.845826690498331,\n",
" 30: 1.845826690498331,\n",
" 31: 1.845826690498331,\n",
" 32: 1.845826690498331,\n",
" 33: 1.845826690498331,\n",
" 34: 1.845826690498331,\n",
" 35: 1.845826690498331,\n",
" 36: 1.845826690498331,\n",
" 37: 1.845826690498331,\n",
" 38: 1.845826690498331,\n",
" 39: 1.845826690498331}"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sparse_embedding_func = BM25SparseEmbedding(corpus=texts)\n",
"sparse_embedding_func.embed_query(texts[1])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create Milvus Collection and load data\n",
"\n",
"Establish a connection to Milvus using the connection URI:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"connections.connect(uri=CONNECTION_URI)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Define field names and their data types"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"pk_field = \"doc_id\"\n",
"dense_field = \"dense_vector\"\n",
"sparse_field = \"sparse_vector\"\n",
"text_field = \"text\"\n",
"fields = [\n",
" FieldSchema(\n",
" name=pk_field,\n",
" dtype=DataType.VARCHAR,\n",
" is_primary=True,\n",
" auto_id=True,\n",
" max_length=100,\n",
" ),\n",
" FieldSchema(name=dense_field, dtype=DataType.FLOAT_VECTOR, dim=dense_dim),\n",
" FieldSchema(name=sparse_field, dtype=DataType.SPARSE_FLOAT_VECTOR),\n",
" FieldSchema(name=text_field, dtype=DataType.VARCHAR, max_length=65_535),\n",
"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create a collection with the defined schema"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"schema = CollectionSchema(fields=fields, enable_dynamic_field=False)\n",
"collection = Collection(\n",
" name=\"IntroductionToTheNovels\", schema=schema, consistency_level=\"Strong\"\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Define indexes for the dense and sparse vector fields"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"dense_index = {\"index_type\": \"FLAT\", \"metric_type\": \"IP\"}\n",
"collection.create_index(\"dense_vector\", dense_index)\n",
"sparse_index = {\"index_type\": \"SPARSE_INVERTED_INDEX\", \"metric_type\": \"IP\"}\n",
"collection.create_index(\"sparse_vector\", sparse_index)\n",
"collection.flush()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Insert entities into the collection and load the collection"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"entities = []\n",
"for text in texts:\n",
" entity = {\n",
" dense_field: dense_embedding_func.embed_documents([text])[0],\n",
" sparse_field: sparse_embedding_func.embed_documents([text])[0],\n",
" text_field: text,\n",
" }\n",
" entities.append(entity)\n",
"collection.insert(entities)\n",
"collection.load()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Instantiation\n",
"\n",
"Now we can instantiate our retriever, defining search parameters for sparse and dense fields:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sparse_search_params = {\"metric_type\": \"IP\"}\n",
"dense_search_params = {\"metric_type\": \"IP\", \"params\": {}}\n",
"retriever = MilvusCollectionHybridSearchRetriever(\n",
" collection=collection,\n",
" rerank=WeightedRanker(0.5, 0.5),\n",
" anns_fields=[dense_field, sparse_field],\n",
" field_embeddings=[dense_embedding_func, sparse_embedding_func],\n",
" field_search_params=[dense_search_params, sparse_search_params],\n",
" top_k=3,\n",
" text_field=text_field,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"In the retriever's parameters, we pass one dense and one sparse embedding function to run a hybrid search over the collection's two vector fields, and use `WeightedRanker` to rerank the combined results. The retriever returns the top 3 documents (`top_k=3`)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Usage"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content=\"In 'The Lost Expedition' by Caspian Grey, a team of explorers ventures into the heart of the Amazon rainforest in search of a lost city, but soon finds themselves hunted by a ruthless treasure hunter and the treacherous jungle itself.\", metadata={'doc_id': '449281835035545843'}),\n",
" Document(page_content=\"In 'The Phantom Pilgrim' by Rowan Welles, a charismatic smuggler is hired by a mysterious organization to transport a valuable artifact across a war-torn continent, but soon finds themselves pursued by deadly assassins and rival factions.\", metadata={'doc_id': '449281835035545845'}),\n",
" Document(page_content=\"In 'The Dreamwalker's Journey' by Lyra Snow, a young dreamwalker discovers she has the ability to enter people's dreams, but soon finds herself trapped in a surreal world of nightmares and illusions, where the boundaries between reality and fantasy blur.\", metadata={'doc_id': '449281835035545846'})]"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"retriever.invoke(\"What are the story about ventures?\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Use within a chain\n",
"\n",
"Initialize ChatOpenAI and define a prompt template"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"llm = ChatOpenAI()\n",
"\n",
"PROMPT_TEMPLATE = \"\"\"\n",
"Human: You are an AI assistant, and provides answers to questions by using fact based and statistical information when possible.\n",
"Use the following pieces of information to provide a concise answer to the question enclosed in <question> tags.\n",
"\n",
"<context>\n",
"{context}\n",
"</context>\n",
"\n",
"<question>\n",
"{question}\n",
"</question>\n",
"\n",
"Assistant:\"\"\"\n",
"\n",
"prompt = PromptTemplate(\n",
" template=PROMPT_TEMPLATE, input_variables=[\"context\", \"question\"]\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"Define a function for formatting documents"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
},
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"def format_docs(docs):\n",
" return \"\\n\\n\".join(doc.page_content for doc in docs)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"Define a chain using the retriever and other components"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
},
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"rag_chain = (\n",
" {\"context\": retriever | format_docs, \"question\": RunnablePassthrough()}\n",
" | prompt\n",
" | llm\n",
" | StrOutputParser()\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"Perform a query using the defined chain"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
},
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"data": {
"text/plain": [
"\"Lila Rose has written 'The Memory Thief,' which follows a charismatic thief with the ability to steal and manipulate memories as they navigate a daring heist and a web of deceit and betrayal.\""
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"rag_chain.invoke(\"What novels has Lila written and what are their contents?\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"Drop the collection"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
},
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"collection.drop()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"For detailed documentation of all `MilvusCollectionHybridSearchRetriever` features and configurations head to the [API reference](https://python.langchain.com/api_reference/milvus/retrievers/langchain_milvus.retrievers.milvus_hybrid_search.MilvusCollectionHybridSearchRetriever.html)."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.4"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
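The deleted notebook searched two vector fields with the inner-product (`IP`) metric and merged the two result lists with `WeightedRanker(0.5, 0.5)`. The sketch below illustrates the general idea of sparse inner product plus weighted score fusion in plain Python; it is not Milvus's implementation. The arctan squashing in `normalize` is an assumption standing in for whatever normalization Milvus applies internally, and all function names are hypothetical.

```python
import math


def sparse_ip(a: dict[int, float], b: dict[int, float]) -> float:
    # Inner product of two sparse vectors: only indices present in both contribute.
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(w * b[i] for i, w in a.items() if i in b)


def normalize(score: float) -> float:
    # Assumption: map an unbounded IP score into (0, 1) so that dense and
    # sparse scores are on a comparable scale before weighting.
    return 0.5 + math.atan(score) / math.pi


def weighted_fusion(
    dense_scores: dict[str, float],
    sparse_scores: dict[str, float],
    weights: tuple[float, float] = (0.5, 0.5),
    top_k: int = 3,
) -> list[tuple[str, float]]:
    # Each input maps doc_id -> raw score from one search; a document missing
    # from one list simply gets no contribution from that list.
    combined: dict[str, float] = {}
    for weight, scores in zip(weights, (dense_scores, sparse_scores)):
        for doc_id, score in scores.items():
            combined[doc_id] = combined.get(doc_id, 0.0) + weight * normalize(score)
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


query_sparse = {0: 1.0, 2: 2.0}
doc_sparse = {2: 3.0, 5: 1.0}
print(sparse_ip(query_sparse, doc_sparse))  # only index 2 overlaps: 2.0 * 3.0 = 6.0

dense = {"a": 2.0, "b": 0.0}
sparse = {"a": 0.0, "c": 5.0}
print(weighted_fusion(dense, sparse))
```

Weighted fusion needs scores on a common scale, which is why a normalization step precedes the weighted sum; rank-based fusion avoids that requirement, and pymilvus also ships an `RRFRanker` for reciprocal rank fusion.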

@ -553,7 +553,10 @@
"cell_type": "markdown",
"id": "8edb47106e1a46a883d545849b8ab81b",
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"\n",
@ -576,6 +579,9 @@
"id": "10185d26023b46108eb7d9f57d49d2b3",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
},
"pycharm": {
"name": "#%%\n"
}
@ -603,7 +609,10 @@
"cell_type": "markdown",
"id": "8763a12b2bbd4a93a75aff182afb95dc",
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"> - When you use `BM25BuiltInFunction`, please note that full-text search is available in Milvus Standalone and Milvus Distributed, but not in Milvus Lite, although it is on the roadmap for future inclusion. It will also be available in Zilliz Cloud (fully-managed Milvus) soon. Please reach out to support@zilliz.com for more information.\n",
@ -617,7 +626,10 @@
"cell_type": "markdown",
"id": "7623eae2785240b9bd12b16a66d81610",
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"### Rerank the candidates\n",
@ -632,6 +644,9 @@
"id": "7cdc8c89c7104fffa095e18ddfef8986",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
},
"pycharm": {
"name": "#%%\n"
}
@ -645,14 +660,6 @@
")"
]
},
{
"cell_type": "markdown",
"id": "b3965036",
"metadata": {},
"source": [
"For more information about Full-text search and Hybrid search, please refer to the [Using Full-Text Search with LangChain and Milvus](https://milvus.io/docs/full_text_search_with_langchain.md) and [Hybrid Retrieval with LangChain and Milvus](https://milvus.io/docs/milvus_hybrid_search_retriever.md)."
]
},
{
"cell_type": "markdown",
"id": "8ac953f1",
@ -813,7 +820,7 @@
"provenance": []
},
"kernelspec": {
"display_name": ".venv",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@ -827,7 +834,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.0"
"version": "3.13.2"
}
},
"nbformat": 4,

@ -461,14 +461,6 @@ const FEATURE_TABLES = {
apiLink: "https://python.langchain.com/api_reference/elasticsearch/retrievers/langchain_elasticsearch.retrievers.ElasticsearchRetriever.html",
package: "langchain_elasticsearch"
},
{
name: "MilvusCollectionHybridSearchRetriever",
link: "milvus_hybrid_search",
selfHost: true,
cloudOffering: false,
apiLink: "https://python.langchain.com/api_reference/milvus/retrievers/langchain_milvus.retrievers.milvus_hybrid_search.MilvusCollectionHybridSearchRetriever.html",
package: "langchain_milvus"
},
{
name: "VertexAISearchRetriever",
link: "google_vertex_ai_search",

@ -153,6 +153,10 @@
{
"source": "/api_reference/tests/:path(.*/?)*",
"destination": "/api_reference/standard_tests/:path"
},
{
"source": "/docs/integrations/retrievers/milvus_hybrid_search(/?)",
"destination": "https://python.langchain.com/v0.2/docs/integrations/retrievers/milvus_hybrid_search/"
}
]
}
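The new `vercel.json` rule sends the removed page to its v0.2 copy. Vercel compiles `source` with path-to-regexp, in which `(/?)` is a raw regex group matching an optional trailing slash. The sketch below approximates that match with Python's `re` module; treating the pattern as a plain, fully anchored regex is an assumption, and `redirect_for` is a hypothetical helper.

```python
import re
from typing import Optional

# Pattern and target copied from the vercel.json redirect above; matching with
# re.fullmatch approximates (but does not reproduce) Vercel's path-to-regexp.
SOURCE = re.compile(r"/docs/integrations/retrievers/milvus_hybrid_search(/?)")
DESTINATION = (
    "https://python.langchain.com/v0.2/docs/integrations/retrievers/milvus_hybrid_search/"
)


def redirect_for(path: str) -> Optional[str]:
    # Return the redirect target when the whole path matches, else None.
    return DESTINATION if SOURCE.fullmatch(path) else None


for p in [
    "/docs/integrations/retrievers/milvus_hybrid_search",
    "/docs/integrations/retrievers/milvus_hybrid_search/",
    "/docs/integrations/retrievers/milvus_hybrid_search/extra",
]:
    print(p, "->", redirect_for(p))
```

Under this approximation, both the bare and the trailing-slash paths resolve to the archived page, while deeper paths do not match.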