qdrant: Update doc with usage snippets (#25179)

## Description

This PR adds back snippets demonstrating sparse and hybrid retrieval in
the Qdrant notebook.

Without the snippets, it's hard to grok the usage.
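
For reference, the restored snippets have roughly the shape below — a minimal sketch assuming `docs` and a dense `embeddings` instance are already defined earlier in the notebook, as in the diff; dropping `embedding` and using `RetrievalMode.SPARSE` gives the sparse-only variant:

```python
from langchain_qdrant import FastEmbedSparse, QdrantVectorStore, RetrievalMode

# Sparse embeddings via the bundled FastEmbed integration (requires `pip install fastembed`).
sparse_embeddings = FastEmbedSparse(model_name="Qdrant/BM25")

# `docs` and `embeddings` (a dense embeddings instance) are assumed to be
# defined earlier in the notebook.
qdrant = QdrantVectorStore.from_documents(
    docs,
    embedding=embeddings,
    sparse_embedding=sparse_embeddings,
    location=":memory:",
    collection_name="my_documents",
    retrieval_mode=RetrievalMode.HYBRID,
)

found_docs = qdrant.similarity_search(
    "What did the president say about Ketanji Brown Jackson"
)
```

Since the `HYBRID` mode stores both dense and sparse vectors, the notebook also notes you can switch `retrieval_mode` at query time without re-indexing.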

@@ -260,7 +260,7 @@
"outputs": [],
"source": [
"qdrant = QdrantVectorStore.from_existing_collection(\n",
" embeddings=embeddings,\n",
" embedding=embeddings,\n",
" collection_name=\"my_documents\",\n",
" url=\"http://localhost:6333\",\n",
")"
@@ -458,7 +458,7 @@
},
{
"cell_type": "markdown",
"id": "dbd93d85",
"id": "79bcb0ce",
"metadata": {},
"source": [
"`QdrantVectorStore` supports 3 modes for similarity searches. They can be configured using the `retrieval_mode` parameter when setting up the class.\n",
@@ -472,8 +472,35 @@
"To search with only dense vectors,\n",
"\n",
"- The `retrieval_mode` parameter should be set to `RetrievalMode.DENSE`(default).\n",
"- A [dense embeddings](https://python.langchain.com/v0.2/docs/integrations/text_embedding/) value should be provided to the `embedding` parameter.\n",
"- A [dense embeddings](https://python.langchain.com/v0.2/docs/integrations/text_embedding/) value should be provided to the `embedding` parameter."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5e097299",
"metadata": {},
"outputs": [],
"source": [
"from langchain_qdrant import RetrievalMode\n",
"\n",
"qdrant = QdrantVectorStore.from_documents(\n",
" docs,\n",
" embedding=embeddings,\n",
" location=\":memory:\",\n",
" collection_name=\"my_documents\",\n",
" retrieval_mode=RetrievalMode.DENSE,\n",
")\n",
"\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"found_docs = qdrant.similarity_search(query)"
]
},
{
"cell_type": "markdown",
"id": "dbd93d85",
"metadata": {},
"source": [
"### Sparse Vector Search\n",
"\n",
"To search with only sparse vectors,\n",
@@ -483,6 +510,47 @@
"\n",
"The `langchain-qdrant` package provides a [FastEmbed](https://github.com/qdrant/fastembed) based implementation out of the box.\n",
"\n",
"To use it, install the FastEmbed package."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8435c0f1",
"metadata": {},
"outputs": [],
"source": [
"%pip install fastembed"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7cf1e3ef",
"metadata": {},
"outputs": [],
"source": [
"from langchain_qdrant import FastEmbedSparse, RetrievalMode\n",
"\n",
"sparse_embeddings = FastEmbedSparse(model_name=\"Qdrant/BM25\")\n",
"\n",
"qdrant = QdrantVectorStore.from_documents(\n",
" docs,\n",
" sparse_embedding=sparse_embeddings,\n",
" location=\":memory:\",\n",
" collection_name=\"my_documents\",\n",
" retrieval_mode=RetrievalMode.SPARSE,\n",
")\n",
"\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"found_docs = qdrant.similarity_search(query)"
]
},
{
"cell_type": "markdown",
"id": "26e20c61",
"metadata": {},
"source": [
"### Hybrid Vector Search\n",
"\n",
"To perform a hybrid search using dense and sparse vectors with score fusion,\n",
@@ -494,6 +562,30 @@
"Note that if you've added documents with the `HYBRID` mode, you can switch to any retrieval mode when searching. Since both the dense and sparse vectors are available in the collection."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f37c8519",
"metadata": {},
"outputs": [],
"source": [
"from langchain_qdrant import FastEmbedSparse, RetrievalMode\n",
"\n",
"sparse_embeddings = FastEmbedSparse(model_name=\"Qdrant/BM25\")\n",
"\n",
"qdrant = QdrantVectorStore.from_documents(\n",
" docs,\n",
" embedding=embeddings,\n",
" sparse_embedding=sparse_embeddings,\n",
" location=\":memory:\",\n",
" collection_name=\"my_documents\",\n",
" retrieval_mode=RetrievalMode.HYBRID,\n",
")\n",
"\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"found_docs = qdrant.similarity_search(query)"
]
},
{
"attachments": {},
"cell_type": "markdown",
@@ -503,37 +595,6 @@
"If you want to execute a similarity search and receive the corresponding scores you can run:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "cf772328",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"([Record(id='42a580cb-7469-4324-9927-0febab57ce92', payload={'page_content': 'The stock market is down 500 points today due to fears of a recession.', 'metadata': {'source': 'news'}}, vector=None, shard_key=None, order_value=None),\n",
" Record(id='50d8d6ee-69bf-4173-a6a2-b254e9928965', payload={'page_content': 'Robbers broke into the city bank and stole $1 million in cash.', 'metadata': {'source': 'news'}}, vector=None, shard_key=None, order_value=None),\n",
" Record(id='6dae6b37-826d-4f14-8376-da4603b35de3', payload={'page_content': 'Is the new iPhone worth the price? Read this review to find out.', 'metadata': {'source': 'website'}}, vector=None, shard_key=None, order_value=None),\n",
" Record(id='91ed6c56-fe53-49e2-8199-c3bb3c33c3eb', payload={'page_content': 'LangGraph is the best framework for building stateful, agentic applications!', 'metadata': {'source': 'tweet'}}, vector=None, shard_key=None, order_value=None),\n",
" Record(id='9e6ba50c-794f-4b88-94e5-411f15052a02', payload={'page_content': 'The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.', 'metadata': {'source': 'news'}}, vector=None, shard_key=None, order_value=None),\n",
" Record(id='b0964ab5-5a14-47b4-a983-37fa5c5bd154', payload={'page_content': 'The top 10 soccer players in the world right now.', 'metadata': {'source': 'website'}}, vector=None, shard_key=None, order_value=None),\n",
" Record(id='bd2eae02-74b5-43ec-9fcf-09e9d9db6fd3', payload={'page_content': \"Wow! That was an amazing movie. I can't wait to see it again.\", 'metadata': {'source': 'tweet'}}, vector=None, shard_key=None, order_value=None),\n",
" Record(id='c04134c3-273d-4766-949a-eee46052ad32', payload={'page_content': 'I had chocalate chip pancakes and scrambled eggs for breakfast this morning.', 'metadata': {'source': 'tweet'}}, vector=None, shard_key=None, order_value=None),\n",
" Record(id='d3202666-6f2b-4186-ac43-e35389de8166', payload={'page_content': 'Building an exciting new project with LangChain - come check it out!', 'metadata': {'source': 'tweet'}}, vector=None, shard_key=None, order_value=None),\n",
" Record(id='ff774e5c-f158-4d12-94e2-0a0162b22f27', payload={'page_content': 'I have a bad feeling I am going to get deleted :(', 'metadata': {'source': 'tweet'}}, vector=None, shard_key=None, order_value=None)],\n",
" None)"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"client.scroll(collection_name=\"demo_collection\")"
]
},
{
"cell_type": "code",
"execution_count": 12,
@@ -740,12 +801,12 @@
},
"outputs": [],
"source": [
"from langchain_qdrant import RetrievalMode, SparseEmbeddings\n",
"from langchain_qdrant import RetrievalMode\n",
"\n",
"QdrantVectorStore.from_documents(\n",
" docs,\n",
" embedding=embeddings,\n",
" sparse_embedding=SparseEmbeddings(),\n",
" sparse_embedding=sparse_embeddings,\n",
" location=\":memory:\",\n",
" collection_name=\"my_documents_2\",\n",
" retrieval_mode=RetrievalMode.HYBRID,\n",