|
|
|
@@ -1,639 +0,0 @@
|
|
|
|
|
{
|
|
|
|
|
"cells": [
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "raw",
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"source": [
|
|
|
|
|
"---\n",
|
|
|
|
|
"sidebar_label: Milvus Hybrid Search\n",
|
|
|
|
|
"---"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"source": [
|
|
|
|
|
"# Milvus Hybrid Search Retriever\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"> [Milvus](https://milvus.io/docs) is an open-source vector database built to power embedding similarity search and AI applications. Milvus makes unstructured data search more accessible, and provides a consistent user experience regardless of the deployment environment.\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"This will help you getting started with the Milvus Hybrid Search [retriever](/docs/concepts/retrievers), which combines the strengths of both dense and sparse vector search. For detailed documentation of all `MilvusCollectionHybridSearchRetriever` features and configurations head to the [API reference](https://python.langchain.com/api_reference/milvus/retrievers/langchain_milvus.retrievers.milvus_hybrid_search.MilvusCollectionHybridSearchRetriever.html).\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"See also the Milvus Multi-Vector Search [docs](https://milvus.io/docs/multi-vector-search.md).\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"### Integration details\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"import {ItemTable} from \"@theme/FeatureTables\";\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"<ItemTable category=\"document_retrievers\" item=\"MilvusCollectionHybridSearchRetriever\" />\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"## Setup\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"If you want to get automated tracing from individual queries, you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "code",
|
|
|
|
|
"execution_count": null,
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"outputs": [],
|
|
|
|
|
"source": [
|
|
|
|
|
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
|
|
|
|
|
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {
|
|
|
|
|
"collapsed": false,
|
|
|
|
|
"jupyter": {
|
|
|
|
|
"outputs_hidden": false
|
|
|
|
|
}
|
|
|
|
|
},
|
|
|
|
|
"source": [
|
|
|
|
|
"### Installation\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"This retriever lives in the `langchain-milvus` package. This guide requires the following dependencies:"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "code",
|
|
|
|
|
"execution_count": 1,
|
|
|
|
|
"metadata": {
|
|
|
|
|
"collapsed": false,
|
|
|
|
|
"jupyter": {
|
|
|
|
|
"outputs_hidden": false
|
|
|
|
|
},
|
|
|
|
|
"pycharm": {
|
|
|
|
|
"name": "#%%\n"
|
|
|
|
|
}
|
|
|
|
|
},
|
|
|
|
|
"outputs": [],
|
|
|
|
|
"source": [
|
|
|
|
|
"%pip install --upgrade --quiet pymilvus[model] langchain-milvus langchain-openai"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "code",
|
|
|
|
|
"execution_count": null,
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"outputs": [],
|
|
|
|
|
"source": [
|
|
|
|
|
"from langchain_core.output_parsers import StrOutputParser\n",
|
|
|
|
|
"from langchain_core.prompts import PromptTemplate\n",
|
|
|
|
|
"from langchain_core.runnables import RunnablePassthrough\n",
|
|
|
|
|
"from langchain_milvus.retrievers import MilvusCollectionHybridSearchRetriever\n",
|
|
|
|
|
"from langchain_milvus.utils.sparse import BM25SparseEmbedding\n",
|
|
|
|
|
"from langchain_openai import ChatOpenAI, OpenAIEmbeddings\n",
|
|
|
|
|
"from pymilvus import (\n",
|
|
|
|
|
" Collection,\n",
|
|
|
|
|
" CollectionSchema,\n",
|
|
|
|
|
" DataType,\n",
|
|
|
|
|
" FieldSchema,\n",
|
|
|
|
|
" WeightedRanker,\n",
|
|
|
|
|
" connections,\n",
|
|
|
|
|
")"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"source": [
|
|
|
|
|
"### Start the Milvus service\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"Please refer to the [Milvus documentation](https://milvus.io/docs/install_standalone-docker.md) to start the Milvus service.\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"After starting milvus, you need to specify your milvus connection URI."
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "code",
|
|
|
|
|
"execution_count": 4,
|
|
|
|
|
"metadata": {
|
|
|
|
|
"collapsed": false,
|
|
|
|
|
"jupyter": {
|
|
|
|
|
"outputs_hidden": false
|
|
|
|
|
},
|
|
|
|
|
"pycharm": {
|
|
|
|
|
"name": "#%%\n"
|
|
|
|
|
}
|
|
|
|
|
},
|
|
|
|
|
"outputs": [],
|
|
|
|
|
"source": [
|
|
|
|
|
"CONNECTION_URI = \"http://localhost:19530\""
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {
|
|
|
|
|
"collapsed": false,
|
|
|
|
|
"jupyter": {
|
|
|
|
|
"outputs_hidden": false
|
|
|
|
|
}
|
|
|
|
|
},
|
|
|
|
|
"source": [
|
|
|
|
|
"### Prepare OpenAI API Key\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"Please refer to the [OpenAI documentation](https://platform.openai.com/account/api-keys) to obtain your OpenAI API key, and set it as an environment variable.\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"```shell\n",
|
|
|
|
|
"export OPENAI_API_KEY=<your_api_key>\n",
|
|
|
|
|
"```\n"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"source": [
|
|
|
|
|
"### Prepare dense and sparse embedding functions\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"Let us fictionalize 10 fake descriptions of novels. In actual production, it may be a large amount of text data."
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "code",
|
|
|
|
|
"execution_count": 5,
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"outputs": [],
|
|
|
|
|
"source": [
|
|
|
|
|
"texts = [\n",
|
|
|
|
|
" \"In 'The Whispering Walls' by Ava Moreno, a young journalist named Sophia uncovers a decades-old conspiracy hidden within the crumbling walls of an ancient mansion, where the whispers of the past threaten to destroy her own sanity.\",\n",
|
|
|
|
|
" \"In 'The Last Refuge' by Ethan Blackwood, a group of survivors must band together to escape a post-apocalyptic wasteland, where the last remnants of humanity cling to life in a desperate bid for survival.\",\n",
|
|
|
|
|
" \"In 'The Memory Thief' by Lila Rose, a charismatic thief with the ability to steal and manipulate memories is hired by a mysterious client to pull off a daring heist, but soon finds themselves trapped in a web of deceit and betrayal.\",\n",
|
|
|
|
|
" \"In 'The City of Echoes' by Julian Saint Clair, a brilliant detective must navigate a labyrinthine metropolis where time is currency, and the rich can live forever, but at a terrible cost to the poor.\",\n",
|
|
|
|
|
" \"In 'The Starlight Serenade' by Ruby Flynn, a shy astronomer discovers a mysterious melody emanating from a distant star, which leads her on a journey to uncover the secrets of the universe and her own heart.\",\n",
|
|
|
|
|
" \"In 'The Shadow Weaver' by Piper Redding, a young orphan discovers she has the ability to weave powerful illusions, but soon finds herself at the center of a deadly game of cat and mouse between rival factions vying for control of the mystical arts.\",\n",
|
|
|
|
|
" \"In 'The Lost Expedition' by Caspian Grey, a team of explorers ventures into the heart of the Amazon rainforest in search of a lost city, but soon finds themselves hunted by a ruthless treasure hunter and the treacherous jungle itself.\",\n",
|
|
|
|
|
" \"In 'The Clockwork Kingdom' by Augusta Wynter, a brilliant inventor discovers a hidden world of clockwork machines and ancient magic, where a rebellion is brewing against the tyrannical ruler of the land.\",\n",
|
|
|
|
|
" \"In 'The Phantom Pilgrim' by Rowan Welles, a charismatic smuggler is hired by a mysterious organization to transport a valuable artifact across a war-torn continent, but soon finds themselves pursued by deadly assassins and rival factions.\",\n",
|
|
|
|
|
" \"In 'The Dreamwalker's Journey' by Lyra Snow, a young dreamwalker discovers she has the ability to enter people's dreams, but soon finds herself trapped in a surreal world of nightmares and illusions, where the boundaries between reality and fantasy blur.\",\n",
|
|
|
|
|
"]"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"source": [
|
|
|
|
|
"We will use the [OpenAI Embedding](https://platform.openai.com/docs/guides/embeddings) to generate dense vectors, and the [BM25 algorithm](https://en.wikipedia.org/wiki/Okapi_BM25) to generate sparse vectors.\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"Initialize dense embedding function and get dimension"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "code",
|
|
|
|
|
"execution_count": 6,
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"outputs": [
|
|
|
|
|
{
|
|
|
|
|
"data": {
|
|
|
|
|
"text/plain": [
|
|
|
|
|
"1536"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
"execution_count": 6,
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"output_type": "execute_result"
|
|
|
|
|
}
|
|
|
|
|
],
|
|
|
|
|
"source": [
|
|
|
|
|
"dense_embedding_func = OpenAIEmbeddings()\n",
|
|
|
|
|
"dense_dim = len(dense_embedding_func.embed_query(texts[1]))\n",
|
|
|
|
|
"dense_dim"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"source": [
|
|
|
|
|
"Initialize sparse embedding function.\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"Note that the output of sparse embedding is a set of sparse vectors, which represents the index and weight of the keywords of the input text."
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "code",
|
|
|
|
|
"execution_count": 7,
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"outputs": [
|
|
|
|
|
{
|
|
|
|
|
"data": {
|
|
|
|
|
"text/plain": [
|
|
|
|
|
"{0: 0.4270424944042204,\n",
|
|
|
|
|
" 21: 1.845826690498331,\n",
|
|
|
|
|
" 22: 1.845826690498331,\n",
|
|
|
|
|
" 23: 1.845826690498331,\n",
|
|
|
|
|
" 24: 1.845826690498331,\n",
|
|
|
|
|
" 25: 1.845826690498331,\n",
|
|
|
|
|
" 26: 1.845826690498331,\n",
|
|
|
|
|
" 27: 1.2237754316221157,\n",
|
|
|
|
|
" 28: 1.845826690498331,\n",
|
|
|
|
|
" 29: 1.845826690498331,\n",
|
|
|
|
|
" 30: 1.845826690498331,\n",
|
|
|
|
|
" 31: 1.845826690498331,\n",
|
|
|
|
|
" 32: 1.845826690498331,\n",
|
|
|
|
|
" 33: 1.845826690498331,\n",
|
|
|
|
|
" 34: 1.845826690498331,\n",
|
|
|
|
|
" 35: 1.845826690498331,\n",
|
|
|
|
|
" 36: 1.845826690498331,\n",
|
|
|
|
|
" 37: 1.845826690498331,\n",
|
|
|
|
|
" 38: 1.845826690498331,\n",
|
|
|
|
|
" 39: 1.845826690498331}"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
"execution_count": 7,
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"output_type": "execute_result"
|
|
|
|
|
}
|
|
|
|
|
],
|
|
|
|
|
"source": [
|
|
|
|
|
"sparse_embedding_func = BM25SparseEmbedding(corpus=texts)\n",
|
|
|
|
|
"sparse_embedding_func.embed_query(texts[1])"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"source": [
|
|
|
|
|
"### Create Milvus Collection and load data\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"Initialize connection URI and establish connection"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "code",
|
|
|
|
|
"execution_count": 8,
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"outputs": [],
|
|
|
|
|
"source": [
|
|
|
|
|
"connections.connect(uri=CONNECTION_URI)"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"source": [
|
|
|
|
|
"Define field names and their data types"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "code",
|
|
|
|
|
"execution_count": 9,
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"outputs": [],
|
|
|
|
|
"source": [
|
|
|
|
|
"pk_field = \"doc_id\"\n",
|
|
|
|
|
"dense_field = \"dense_vector\"\n",
|
|
|
|
|
"sparse_field = \"sparse_vector\"\n",
|
|
|
|
|
"text_field = \"text\"\n",
|
|
|
|
|
"fields = [\n",
|
|
|
|
|
" FieldSchema(\n",
|
|
|
|
|
" name=pk_field,\n",
|
|
|
|
|
" dtype=DataType.VARCHAR,\n",
|
|
|
|
|
" is_primary=True,\n",
|
|
|
|
|
" auto_id=True,\n",
|
|
|
|
|
" max_length=100,\n",
|
|
|
|
|
" ),\n",
|
|
|
|
|
" FieldSchema(name=dense_field, dtype=DataType.FLOAT_VECTOR, dim=dense_dim),\n",
|
|
|
|
|
" FieldSchema(name=sparse_field, dtype=DataType.SPARSE_FLOAT_VECTOR),\n",
|
|
|
|
|
" FieldSchema(name=text_field, dtype=DataType.VARCHAR, max_length=65_535),\n",
|
|
|
|
|
"]"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"source": [
|
|
|
|
|
"Create a collection with the defined schema"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "code",
|
|
|
|
|
"execution_count": 10,
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"outputs": [],
|
|
|
|
|
"source": [
|
|
|
|
|
"schema = CollectionSchema(fields=fields, enable_dynamic_field=False)\n",
|
|
|
|
|
"collection = Collection(\n",
|
|
|
|
|
" name=\"IntroductionToTheNovels\", schema=schema, consistency_level=\"Strong\"\n",
|
|
|
|
|
")"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"source": [
|
|
|
|
|
"Define index for dense and sparse vectors"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "code",
|
|
|
|
|
"execution_count": 11,
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"outputs": [],
|
|
|
|
|
"source": [
|
|
|
|
|
"dense_index = {\"index_type\": \"FLAT\", \"metric_type\": \"IP\"}\n",
|
|
|
|
|
"collection.create_index(\"dense_vector\", dense_index)\n",
|
|
|
|
|
"sparse_index = {\"index_type\": \"SPARSE_INVERTED_INDEX\", \"metric_type\": \"IP\"}\n",
|
|
|
|
|
"collection.create_index(\"sparse_vector\", sparse_index)\n",
|
|
|
|
|
"collection.flush()"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"source": [
|
|
|
|
|
"Insert entities into the collection and load the collection"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "code",
|
|
|
|
|
"execution_count": 12,
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"outputs": [],
|
|
|
|
|
"source": [
|
|
|
|
|
"entities = []\n",
|
|
|
|
|
"for text in texts:\n",
|
|
|
|
|
" entity = {\n",
|
|
|
|
|
" dense_field: dense_embedding_func.embed_documents([text])[0],\n",
|
|
|
|
|
" sparse_field: sparse_embedding_func.embed_documents([text])[0],\n",
|
|
|
|
|
" text_field: text,\n",
|
|
|
|
|
" }\n",
|
|
|
|
|
" entities.append(entity)\n",
|
|
|
|
|
"collection.insert(entities)\n",
|
|
|
|
|
"collection.load()"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"source": [
|
|
|
|
|
"## Instantiation\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"Now we can instantiate our retriever, defining search parameters for sparse and dense fields:"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "code",
|
|
|
|
|
"execution_count": null,
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"outputs": [],
|
|
|
|
|
"source": [
|
|
|
|
|
"sparse_search_params = {\"metric_type\": \"IP\"}\n",
|
|
|
|
|
"dense_search_params = {\"metric_type\": \"IP\", \"params\": {}}\n",
|
|
|
|
|
"retriever = MilvusCollectionHybridSearchRetriever(\n",
|
|
|
|
|
" collection=collection,\n",
|
|
|
|
|
" rerank=WeightedRanker(0.5, 0.5),\n",
|
|
|
|
|
" anns_fields=[dense_field, sparse_field],\n",
|
|
|
|
|
" field_embeddings=[dense_embedding_func, sparse_embedding_func],\n",
|
|
|
|
|
" field_search_params=[dense_search_params, sparse_search_params],\n",
|
|
|
|
|
" top_k=3,\n",
|
|
|
|
|
" text_field=text_field,\n",
|
|
|
|
|
")"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {
|
|
|
|
|
"collapsed": false,
|
|
|
|
|
"jupyter": {
|
|
|
|
|
"outputs_hidden": false
|
|
|
|
|
}
|
|
|
|
|
},
|
|
|
|
|
"source": [
|
|
|
|
|
"In the input parameters of this Retriever, we use a dense embedding and a sparse embedding to perform hybrid search on the two fields of this Collection, and use WeightedRanker for reranking. Finally, 3 top-K Documents will be returned."
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"source": [
|
|
|
|
|
"## Usage"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "code",
|
|
|
|
|
"execution_count": 14,
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"outputs": [
|
|
|
|
|
{
|
|
|
|
|
"data": {
|
|
|
|
|
"text/plain": [
|
|
|
|
|
"[Document(page_content=\"In 'The Lost Expedition' by Caspian Grey, a team of explorers ventures into the heart of the Amazon rainforest in search of a lost city, but soon finds themselves hunted by a ruthless treasure hunter and the treacherous jungle itself.\", metadata={'doc_id': '449281835035545843'}),\n",
|
|
|
|
|
" Document(page_content=\"In 'The Phantom Pilgrim' by Rowan Welles, a charismatic smuggler is hired by a mysterious organization to transport a valuable artifact across a war-torn continent, but soon finds themselves pursued by deadly assassins and rival factions.\", metadata={'doc_id': '449281835035545845'}),\n",
|
|
|
|
|
" Document(page_content=\"In 'The Dreamwalker's Journey' by Lyra Snow, a young dreamwalker discovers she has the ability to enter people's dreams, but soon finds herself trapped in a surreal world of nightmares and illusions, where the boundaries between reality and fantasy blur.\", metadata={'doc_id': '449281835035545846'})]"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
"execution_count": 14,
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"output_type": "execute_result"
|
|
|
|
|
}
|
|
|
|
|
],
|
|
|
|
|
"source": [
|
|
|
|
|
"retriever.invoke(\"What are the story about ventures?\")"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"source": [
|
|
|
|
|
"## Use within a chain\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"Initialize ChatOpenAI and define a prompt template"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "code",
|
|
|
|
|
"execution_count": 15,
|
|
|
|
|
"metadata": {
|
|
|
|
|
"pycharm": {
|
|
|
|
|
"name": "#%%\n"
|
|
|
|
|
}
|
|
|
|
|
},
|
|
|
|
|
"outputs": [],
|
|
|
|
|
"source": [
|
|
|
|
|
"llm = ChatOpenAI()\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"PROMPT_TEMPLATE = \"\"\"\n",
|
|
|
|
|
"Human: You are an AI assistant, and provides answers to questions by using fact based and statistical information when possible.\n",
|
|
|
|
|
"Use the following pieces of information to provide a concise answer to the question enclosed in <question> tags.\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"<context>\n",
|
|
|
|
|
"{context}\n",
|
|
|
|
|
"</context>\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"<question>\n",
|
|
|
|
|
"{question}\n",
|
|
|
|
|
"</question>\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"Assistant:\"\"\"\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"prompt = PromptTemplate(\n",
|
|
|
|
|
" template=PROMPT_TEMPLATE, input_variables=[\"context\", \"question\"]\n",
|
|
|
|
|
")"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {
|
|
|
|
|
"collapsed": false,
|
|
|
|
|
"jupyter": {
|
|
|
|
|
"outputs_hidden": false
|
|
|
|
|
}
|
|
|
|
|
},
|
|
|
|
|
"source": [
|
|
|
|
|
"Define a function for formatting documents"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "code",
|
|
|
|
|
"execution_count": 16,
|
|
|
|
|
"metadata": {
|
|
|
|
|
"collapsed": false,
|
|
|
|
|
"jupyter": {
|
|
|
|
|
"outputs_hidden": false
|
|
|
|
|
},
|
|
|
|
|
"pycharm": {
|
|
|
|
|
"name": "#%%\n"
|
|
|
|
|
}
|
|
|
|
|
},
|
|
|
|
|
"outputs": [],
|
|
|
|
|
"source": [
|
|
|
|
|
"def format_docs(docs):\n",
|
|
|
|
|
" return \"\\n\\n\".join(doc.page_content for doc in docs)"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {
|
|
|
|
|
"collapsed": false,
|
|
|
|
|
"jupyter": {
|
|
|
|
|
"outputs_hidden": false
|
|
|
|
|
}
|
|
|
|
|
},
|
|
|
|
|
"source": [
|
|
|
|
|
"Define a chain using the retriever and other components"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "code",
|
|
|
|
|
"execution_count": 17,
|
|
|
|
|
"metadata": {
|
|
|
|
|
"collapsed": false,
|
|
|
|
|
"jupyter": {
|
|
|
|
|
"outputs_hidden": false
|
|
|
|
|
},
|
|
|
|
|
"pycharm": {
|
|
|
|
|
"name": "#%%\n"
|
|
|
|
|
}
|
|
|
|
|
},
|
|
|
|
|
"outputs": [],
|
|
|
|
|
"source": [
|
|
|
|
|
"rag_chain = (\n",
|
|
|
|
|
" {\"context\": retriever | format_docs, \"question\": RunnablePassthrough()}\n",
|
|
|
|
|
" | prompt\n",
|
|
|
|
|
" | llm\n",
|
|
|
|
|
" | StrOutputParser()\n",
|
|
|
|
|
")"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {
|
|
|
|
|
"collapsed": false,
|
|
|
|
|
"jupyter": {
|
|
|
|
|
"outputs_hidden": false
|
|
|
|
|
}
|
|
|
|
|
},
|
|
|
|
|
"source": [
|
|
|
|
|
"Perform a query using the defined chain"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "code",
|
|
|
|
|
"execution_count": 18,
|
|
|
|
|
"metadata": {
|
|
|
|
|
"collapsed": false,
|
|
|
|
|
"jupyter": {
|
|
|
|
|
"outputs_hidden": false
|
|
|
|
|
},
|
|
|
|
|
"pycharm": {
|
|
|
|
|
"name": "#%%\n"
|
|
|
|
|
}
|
|
|
|
|
},
|
|
|
|
|
"outputs": [
|
|
|
|
|
{
|
|
|
|
|
"data": {
|
|
|
|
|
"text/plain": [
|
|
|
|
|
"\"Lila Rose has written 'The Memory Thief,' which follows a charismatic thief with the ability to steal and manipulate memories as they navigate a daring heist and a web of deceit and betrayal.\""
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
"execution_count": 18,
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"output_type": "execute_result"
|
|
|
|
|
}
|
|
|
|
|
],
|
|
|
|
|
"source": [
|
|
|
|
|
"rag_chain.invoke(\"What novels has Lila written and what are their contents?\")"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {
|
|
|
|
|
"collapsed": false,
|
|
|
|
|
"jupyter": {
|
|
|
|
|
"outputs_hidden": false
|
|
|
|
|
}
|
|
|
|
|
},
|
|
|
|
|
"source": [
|
|
|
|
|
"Drop the collection"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "code",
|
|
|
|
|
"execution_count": 19,
|
|
|
|
|
"metadata": {
|
|
|
|
|
"collapsed": false,
|
|
|
|
|
"jupyter": {
|
|
|
|
|
"outputs_hidden": false
|
|
|
|
|
},
|
|
|
|
|
"pycharm": {
|
|
|
|
|
"name": "#%%\n"
|
|
|
|
|
}
|
|
|
|
|
},
|
|
|
|
|
"outputs": [],
|
|
|
|
|
"source": [
|
|
|
|
|
"collection.drop()"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"source": [
|
|
|
|
|
"## API reference\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"For detailed documentation of all `MilvusCollectionHybridSearchRetriever` features and configurations head to the [API reference](https://python.langchain.com/api_reference/milvus/retrievers/langchain_milvus.retrievers.milvus_hybrid_search.MilvusCollectionHybridSearchRetriever.html)."
|
|
|
|
|
]
|
|
|
|
|
}
|
|
|
|
|
],
|
|
|
|
|
"metadata": {
|
|
|
|
|
"kernelspec": {
|
|
|
|
|
"display_name": "Python 3 (ipykernel)",
|
|
|
|
|
"language": "python",
|
|
|
|
|
"name": "python3"
|
|
|
|
|
},
|
|
|
|
|
"language_info": {
|
|
|
|
|
"codemirror_mode": {
|
|
|
|
|
"name": "ipython",
|
|
|
|
|
"version": 3
|
|
|
|
|
},
|
|
|
|
|
"file_extension": ".py",
|
|
|
|
|
"mimetype": "text/x-python",
|
|
|
|
|
"name": "python",
|
|
|
|
|
"nbconvert_exporter": "python",
|
|
|
|
|
"pygments_lexer": "ipython3",
|
|
|
|
|
"version": "3.10.4"
|
|
|
|
|
}
|
|
|
|
|
},
|
|
|
|
|
"nbformat": 4,
|
|
|
|
|
"nbformat_minor": 4
|
|
|
|
|
}
|