From 6b32810b68bd8742fa07c389e1542452918fe4df Mon Sep 17 00:00:00 2001
From: Anush
Date: Thu, 8 Aug 2024 22:28:26 +0530
Subject: [PATCH] qdrant: Update doc with usage snippets (#25179)

## Description

This PR adds back snippets demonstrating sparse and hybrid retrieval in the Qdrant notebook. Without the snippets, it's hard to grok the usage.
---
 .../integrations/vectorstores/qdrant.ipynb    | 133 +++++++++++++-----
 1 file changed, 97 insertions(+), 36 deletions(-)

diff --git a/docs/docs/integrations/vectorstores/qdrant.ipynb b/docs/docs/integrations/vectorstores/qdrant.ipynb
index 74edf0d9dc4..c0ccbe4f08e 100644
--- a/docs/docs/integrations/vectorstores/qdrant.ipynb
+++ b/docs/docs/integrations/vectorstores/qdrant.ipynb
@@ -260,7 +260,7 @@
    "outputs": [],
    "source": [
     "qdrant = QdrantVectorStore.from_existing_collection(\n",
-    "    embeddings=embeddings,\n",
+    "    embedding=embeddings,\n",
     "    collection_name=\"my_documents\",\n",
     "    url=\"http://localhost:6333\",\n",
     ")"
@@ -458,7 +458,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "dbd93d85",
+   "id": "79bcb0ce",
    "metadata": {},
    "source": [
     "`QdrantVectorStore` supports 3 modes for similarity searches. They can be configured using the `retrieval_mode` parameter when setting up the class.\n",
@@ -472,8 +472,35 @@
     "To search with only dense vectors,\n",
     "\n",
     "- The `retrieval_mode` parameter should be set to `RetrievalMode.DENSE`(default).\n",
-    "- A [dense embeddings](https://python.langchain.com/v0.2/docs/integrations/text_embedding/) value should be provided to the `embedding` parameter.\n",
+    "- A [dense embeddings](https://python.langchain.com/v0.2/docs/integrations/text_embedding/) value should be provided to the `embedding` parameter."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5e097299",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain_qdrant import RetrievalMode\n",
     "\n",
+    "qdrant = QdrantVectorStore.from_documents(\n",
+    "    docs,\n",
+    "    embedding=embeddings,\n",
+    "    location=\":memory:\",\n",
+    "    collection_name=\"my_documents\",\n",
+    "    retrieval_mode=RetrievalMode.DENSE,\n",
+    ")\n",
+    "\n",
+    "query = \"What did the president say about Ketanji Brown Jackson\"\n",
+    "found_docs = qdrant.similarity_search(query)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "dbd93d85",
+   "metadata": {},
+   "source": [
     "### Sparse Vector Search\n",
     "\n",
     "To search with only sparse vectors,\n",
@@ -483,6 +510,47 @@
     "\n",
     "The `langchain-qdrant` package provides a [FastEmbed](https://github.com/qdrant/fastembed) based implementation out of the box.\n",
     "\n",
+    "To use it, install the FastEmbed package."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8435c0f1",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%pip install fastembed"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "7cf1e3ef",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain_qdrant import FastEmbedSparse, RetrievalMode\n",
+    "\n",
+    "sparse_embeddings = FastEmbedSparse(model_name=\"Qdrant/BM25\")\n",
+    "\n",
+    "qdrant = QdrantVectorStore.from_documents(\n",
+    "    docs,\n",
+    "    sparse_embedding=sparse_embeddings,\n",
+    "    location=\":memory:\",\n",
+    "    collection_name=\"my_documents\",\n",
+    "    retrieval_mode=RetrievalMode.SPARSE,\n",
+    ")\n",
+    "\n",
+    "query = \"What did the president say about Ketanji Brown Jackson\"\n",
+    "found_docs = qdrant.similarity_search(query)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "26e20c61",
+   "metadata": {},
+   "source": [
     "### Hybrid Vector Search\n",
     "\n",
     "To perform a hybrid search using dense and sparse vectors with score fusion,\n",
@@ -494,6 +562,30 @@
     "Note that if you've added documents with the `HYBRID` mode, you can switch to any retrieval mode when searching. Since both the dense and sparse vectors are available in the collection."
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f37c8519",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain_qdrant import FastEmbedSparse, RetrievalMode\n",
+    "\n",
+    "sparse_embeddings = FastEmbedSparse(model_name=\"Qdrant/BM25\")\n",
+    "\n",
+    "qdrant = QdrantVectorStore.from_documents(\n",
+    "    docs,\n",
+    "    embedding=embeddings,\n",
+    "    sparse_embedding=sparse_embeddings,\n",
+    "    location=\":memory:\",\n",
+    "    collection_name=\"my_documents\",\n",
+    "    retrieval_mode=RetrievalMode.HYBRID,\n",
+    ")\n",
+    "\n",
+    "query = \"What did the president say about Ketanji Brown Jackson\"\n",
+    "found_docs = qdrant.similarity_search(query)"
+   ]
+  },
   {
    "attachments": {},
    "cell_type": "markdown",
@@ -503,37 +595,6 @@
     "If you want to execute a similarity search and receive the corresponding scores you can run:"
    ]
   },
-  {
-   "cell_type": "code",
-   "execution_count": 11,
-   "id": "cf772328",
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "([Record(id='42a580cb-7469-4324-9927-0febab57ce92', payload={'page_content': 'The stock market is down 500 points today due to fears of a recession.', 'metadata': {'source': 'news'}}, vector=None, shard_key=None, order_value=None),\n",
-       "  Record(id='50d8d6ee-69bf-4173-a6a2-b254e9928965', payload={'page_content': 'Robbers broke into the city bank and stole $1 million in cash.', 'metadata': {'source': 'news'}}, vector=None, shard_key=None, order_value=None),\n",
-       "  Record(id='6dae6b37-826d-4f14-8376-da4603b35de3', payload={'page_content': 'Is the new iPhone worth the price? Read this review to find out.', 'metadata': {'source': 'website'}}, vector=None, shard_key=None, order_value=None),\n",
-       "  Record(id='91ed6c56-fe53-49e2-8199-c3bb3c33c3eb', payload={'page_content': 'LangGraph is the best framework for building stateful, agentic applications!', 'metadata': {'source': 'tweet'}}, vector=None, shard_key=None, order_value=None),\n",
-       "  Record(id='9e6ba50c-794f-4b88-94e5-411f15052a02', payload={'page_content': 'The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.', 'metadata': {'source': 'news'}}, vector=None, shard_key=None, order_value=None),\n",
-       "  Record(id='b0964ab5-5a14-47b4-a983-37fa5c5bd154', payload={'page_content': 'The top 10 soccer players in the world right now.', 'metadata': {'source': 'website'}}, vector=None, shard_key=None, order_value=None),\n",
-       "  Record(id='bd2eae02-74b5-43ec-9fcf-09e9d9db6fd3', payload={'page_content': \"Wow! That was an amazing movie. I can't wait to see it again.\", 'metadata': {'source': 'tweet'}}, vector=None, shard_key=None, order_value=None),\n",
-       "  Record(id='c04134c3-273d-4766-949a-eee46052ad32', payload={'page_content': 'I had chocalate chip pancakes and scrambled eggs for breakfast this morning.', 'metadata': {'source': 'tweet'}}, vector=None, shard_key=None, order_value=None),\n",
-       "  Record(id='d3202666-6f2b-4186-ac43-e35389de8166', payload={'page_content': 'Building an exciting new project with LangChain - come check it out!', 'metadata': {'source': 'tweet'}}, vector=None, shard_key=None, order_value=None),\n",
-       "  Record(id='ff774e5c-f158-4d12-94e2-0a0162b22f27', payload={'page_content': 'I have a bad feeling I am going to get deleted :(', 'metadata': {'source': 'tweet'}}, vector=None, shard_key=None, order_value=None)],\n",
-       " None)"
-      ]
-     },
-     "execution_count": 11,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "client.scroll(collection_name=\"demo_collection\")"
-   ]
-  },
   {
    "cell_type": "code",
    "execution_count": 12,
@@ -740,12 +801,12 @@
    },
    "outputs": [],
    "source": [
-    "from langchain_qdrant import RetrievalMode, SparseEmbeddings\n",
+    "from langchain_qdrant import RetrievalMode\n",
     "\n",
     "QdrantVectorStore.from_documents(\n",
     "    docs,\n",
     "    embedding=embeddings,\n",
-    "    sparse_embedding=SparseEmbeddings(),\n",
+    "    sparse_embedding=sparse_embeddings,\n",
     "    location=\":memory:\",\n",
     "    collection_name=\"my_documents_2\",\n",
     "    retrieval_mode=RetrievalMode.HYBRID,\n",