mirror of
https://github.com/hwchase17/langchain.git
synced 2026-02-21 06:33:41 +00:00
## Description As the title goes. The current API reference links to the deprecated class.
843 lines
26 KiB
Plaintext
843 lines
26 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"id": "683953b3",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Qdrant\n",
|
|
"\n",
|
|
">[Qdrant](https://qdrant.tech/documentation/) (read: quadrant ) is a vector similarity search engine. It provides a production-ready service with a convenient API to store, search, and manage vectors with additional payload and extended filtering support. It makes it useful for all sorts of neural network or semantic-based matching, faceted search, and other applications.\n",
|
|
"\n",
|
|
"This documentation demonstrates how to use Qdrant with Langchain for dense/sparse and hybrid retrieval.\n",
|
|
"\n",
|
|
"> This page documents the `QdrantVectorStore` class that supports multiple retrieval modes via Qdrant's new [Query API](https://qdrant.tech/blog/qdrant-1.10.x/). It requires you to run Qdrant v1.10.0 or above.\n",
|
|
"\n",
|
|
"\n",
|
|
"## Setup\n",
|
|
"\n",
|
|
"There are various modes of how to run `Qdrant`, and depending on the chosen one, there will be some subtle differences. The options include:\n",
|
|
"- Local mode, no server required\n",
|
|
"- Docker deployments\n",
|
|
"- Qdrant Cloud\n",
|
|
"\n",
|
|
"See the [installation instructions](https://qdrant.tech/documentation/install/)."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "e03e8460-8f32-4d1f-bb93-4f7636a476fa",
|
|
"metadata": {
|
|
"tags": []
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"%pip install -qU langchain-qdrant"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "7d387fea",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Credentials\n",
|
|
"\n",
|
|
"There are no credentials needed to run the code in this notebook.\n",
|
|
"\n",
|
|
"If you want to get best in-class automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "4912937d",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
|
|
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"id": "eeead681",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Initialization\n",
|
|
"\n",
|
|
"### Local mode\n",
|
|
"\n",
|
|
"Python client allows you to run the same code in local mode without running the Qdrant server. That's great for testing things out and debugging or storing just a small amount of vectors. The embeddings might be fully kept in memory or persisted on disk.\n",
|
|
"\n",
|
|
"#### In-memory\n",
|
|
"\n",
|
|
"For some testing scenarios and quick experiments, you may prefer to keep all the data in memory only, so it gets lost when the client is destroyed - usually at the end of your script/notebook.\n",
|
|
"\n",
|
|
"\n",
|
|
"```{=mdx}\n",
|
|
"import EmbeddingTabs from \"@theme/EmbeddingTabs\";\n",
|
|
"\n",
|
|
"<EmbeddingTabs/>\n",
|
|
"```"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 1,
|
|
"id": "1df86797",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# | output: false\n",
|
|
"# | echo: false\n",
|
|
"from langchain_openai import OpenAIEmbeddings\n",
|
|
"\n",
|
|
"embeddings = OpenAIEmbeddings(model=\"text-embedding-3-large\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 6,
|
|
"id": "8429667e",
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2023-04-04T10:51:22.525091Z",
|
|
"start_time": "2023-04-04T10:51:22.522015Z"
|
|
},
|
|
"tags": []
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"from langchain_qdrant import QdrantVectorStore\n",
|
|
"from qdrant_client import QdrantClient\n",
|
|
"from qdrant_client.http.models import Distance, VectorParams\n",
|
|
"\n",
|
|
"client = QdrantClient(\":memory:\")\n",
|
|
"\n",
|
|
"client.create_collection(\n",
|
|
" collection_name=\"demo_collection\",\n",
|
|
" vectors_config=VectorParams(size=3072, distance=Distance.COSINE),\n",
|
|
")\n",
|
|
"\n",
|
|
"vector_store = QdrantVectorStore(\n",
|
|
" client=client,\n",
|
|
" collection_name=\"demo_collection\",\n",
|
|
" embedding=embeddings,\n",
|
|
")"
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"id": "59f0b954",
|
|
"metadata": {},
|
|
"source": [
|
|
"#### On-disk storage\n",
|
|
"\n",
|
|
"Local mode, without using the Qdrant server, may also store your vectors on disk so they persist between runs."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 7,
|
|
"id": "24b370e2",
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2023-04-04T10:51:24.827567Z",
|
|
"start_time": "2023-04-04T10:51:22.529080Z"
|
|
},
|
|
"tags": []
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"client = QdrantClient(path=\"/tmp/langchain_qdrant\")\n",
|
|
"\n",
|
|
"client.create_collection(\n",
|
|
" collection_name=\"demo_collection\",\n",
|
|
" vectors_config=VectorParams(size=3072, distance=Distance.COSINE),\n",
|
|
")\n",
|
|
"\n",
|
|
"vector_store = QdrantVectorStore(\n",
|
|
" client=client,\n",
|
|
" collection_name=\"demo_collection\",\n",
|
|
" embedding=embeddings,\n",
|
|
")"
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"id": "749658ce",
|
|
"metadata": {},
|
|
"source": [
|
|
"### On-premise server deployment\n",
|
|
"\n",
|
|
"No matter if you choose to launch Qdrant locally with [a Docker container](https://qdrant.tech/documentation/install/), or select a Kubernetes deployment with [the official Helm chart](https://github.com/qdrant/qdrant-helm), the way you're going to connect to such an instance will be identical. You'll need to provide a URL pointing to the service."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 5,
|
|
"id": "91e7f5ce",
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2023-04-04T10:51:24.832708Z",
|
|
"start_time": "2023-04-04T10:51:24.829905Z"
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"url = \"<---qdrant url here --->\"\n",
|
|
"docs = [] # put docs here\n",
|
|
"qdrant = QdrantVectorStore.from_documents(\n",
|
|
" docs,\n",
|
|
" embeddings,\n",
|
|
" url=url,\n",
|
|
" prefer_grpc=True,\n",
|
|
" collection_name=\"my_documents\",\n",
|
|
")"
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"id": "c9e21ce9",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Qdrant Cloud\n",
|
|
"\n",
|
|
"If you prefer not to keep yourself busy with managing the infrastructure, you can choose to set up a fully-managed Qdrant cluster on [Qdrant Cloud](https://cloud.qdrant.io/). There is a free forever 1GB cluster included for trying out. The main difference with using a managed version of Qdrant is that you'll need to provide an API key to secure your deployment from being accessed publicly. The value can also be set in a `QDRANT_API_KEY` environment variable."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 6,
|
|
"id": "dcf88bdf",
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2023-04-04T10:51:24.837599Z",
|
|
"start_time": "2023-04-04T10:51:24.834690Z"
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"url = \"<---qdrant cloud cluster url here --->\"\n",
|
|
"api_key = \"<---api key here--->\"\n",
|
|
"qdrant = QdrantVectorStore.from_documents(\n",
|
|
" docs,\n",
|
|
" embeddings,\n",
|
|
" url=url,\n",
|
|
" prefer_grpc=True,\n",
|
|
" api_key=api_key,\n",
|
|
" collection_name=\"my_documents\",\n",
|
|
")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "825c7903",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Using an existing collection"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "3f772575",
|
|
"metadata": {},
|
|
"source": [
|
|
"To get an instance of `langchain_qdrant.Qdrant` without loading any new documents or texts, you can use the `Qdrant.from_existing_collection()` method."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "daf7a6e5",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"qdrant = QdrantVectorStore.from_existing_collection(\n",
|
|
" embedding=embeddings,\n",
|
|
" collection_name=\"my_documents\",\n",
|
|
" url=\"http://localhost:6333\",\n",
|
|
")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "3cddef6e",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Manage vector store\n",
|
|
"\n",
|
|
"Once you have created your vector store, we can interact with it by adding and deleting different items.\n",
|
|
"\n",
|
|
"### Add items to vector store\n",
|
|
"\n",
|
|
"We can add items to our vector store by using the `add_documents` function."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 8,
|
|
"id": "7697a362",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"['c04134c3-273d-4766-949a-eee46052ad32',\n",
|
|
" '9e6ba50c-794f-4b88-94e5-411f15052a02',\n",
|
|
" 'd3202666-6f2b-4186-ac43-e35389de8166',\n",
|
|
" '50d8d6ee-69bf-4173-a6a2-b254e9928965',\n",
|
|
" 'bd2eae02-74b5-43ec-9fcf-09e9d9db6fd3',\n",
|
|
" '6dae6b37-826d-4f14-8376-da4603b35de3',\n",
|
|
" 'b0964ab5-5a14-47b4-a983-37fa5c5bd154',\n",
|
|
" '91ed6c56-fe53-49e2-8199-c3bb3c33c3eb',\n",
|
|
" '42a580cb-7469-4324-9927-0febab57ce92',\n",
|
|
" 'ff774e5c-f158-4d12-94e2-0a0162b22f27']"
|
|
]
|
|
},
|
|
"execution_count": 8,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"from uuid import uuid4\n",
|
|
"\n",
|
|
"from langchain_core.documents import Document\n",
|
|
"\n",
|
|
"document_1 = Document(\n",
|
|
" page_content=\"I had chocalate chip pancakes and scrambled eggs for breakfast this morning.\",\n",
|
|
" metadata={\"source\": \"tweet\"},\n",
|
|
")\n",
|
|
"\n",
|
|
"document_2 = Document(\n",
|
|
" page_content=\"The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.\",\n",
|
|
" metadata={\"source\": \"news\"},\n",
|
|
")\n",
|
|
"\n",
|
|
"document_3 = Document(\n",
|
|
" page_content=\"Building an exciting new project with LangChain - come check it out!\",\n",
|
|
" metadata={\"source\": \"tweet\"},\n",
|
|
")\n",
|
|
"\n",
|
|
"document_4 = Document(\n",
|
|
" page_content=\"Robbers broke into the city bank and stole $1 million in cash.\",\n",
|
|
" metadata={\"source\": \"news\"},\n",
|
|
")\n",
|
|
"\n",
|
|
"document_5 = Document(\n",
|
|
" page_content=\"Wow! That was an amazing movie. I can't wait to see it again.\",\n",
|
|
" metadata={\"source\": \"tweet\"},\n",
|
|
")\n",
|
|
"\n",
|
|
"document_6 = Document(\n",
|
|
" page_content=\"Is the new iPhone worth the price? Read this review to find out.\",\n",
|
|
" metadata={\"source\": \"website\"},\n",
|
|
")\n",
|
|
"\n",
|
|
"document_7 = Document(\n",
|
|
" page_content=\"The top 10 soccer players in the world right now.\",\n",
|
|
" metadata={\"source\": \"website\"},\n",
|
|
")\n",
|
|
"\n",
|
|
"document_8 = Document(\n",
|
|
" page_content=\"LangGraph is the best framework for building stateful, agentic applications!\",\n",
|
|
" metadata={\"source\": \"tweet\"},\n",
|
|
")\n",
|
|
"\n",
|
|
"document_9 = Document(\n",
|
|
" page_content=\"The stock market is down 500 points today due to fears of a recession.\",\n",
|
|
" metadata={\"source\": \"news\"},\n",
|
|
")\n",
|
|
"\n",
|
|
"document_10 = Document(\n",
|
|
" page_content=\"I have a bad feeling I am going to get deleted :(\",\n",
|
|
" metadata={\"source\": \"tweet\"},\n",
|
|
")\n",
|
|
"\n",
|
|
"documents = [\n",
|
|
" document_1,\n",
|
|
" document_2,\n",
|
|
" document_3,\n",
|
|
" document_4,\n",
|
|
" document_5,\n",
|
|
" document_6,\n",
|
|
" document_7,\n",
|
|
" document_8,\n",
|
|
" document_9,\n",
|
|
" document_10,\n",
|
|
"]\n",
|
|
"uuids = [str(uuid4()) for _ in range(len(documents))]\n",
|
|
"\n",
|
|
"vector_store.add_documents(documents=documents, ids=uuids)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "5fd23102",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Delete items from vector store"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 37,
|
|
"id": "999cafcc",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"True"
|
|
]
|
|
},
|
|
"execution_count": 37,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"vector_store.delete(ids=[uuids[-1]])"
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"id": "1f9215c8",
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2023-04-04T09:27:29.920258Z",
|
|
"start_time": "2023-04-04T09:27:29.913714Z"
|
|
}
|
|
},
|
|
"source": [
|
|
"## Query vector store\n",
|
|
"\n",
|
|
"Once your vector store has been created and the relevant documents have been added you will most likely wish to query it during the running of your chain or agent. \n",
|
|
"\n",
|
|
"### Query directly\n",
|
|
"\n",
|
|
"The simplest scenario for using Qdrant vector store is to perform a similarity search. Under the hood, our query will be encoded into vector embeddings and used to find similar documents in Qdrant collection."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 10,
|
|
"id": "a8c513ab",
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2023-04-04T10:51:25.204469Z",
|
|
"start_time": "2023-04-04T10:51:24.855618Z"
|
|
},
|
|
"tags": []
|
|
},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"* Building an exciting new project with LangChain - come check it out! [{'source': 'tweet', '_id': 'd3202666-6f2b-4186-ac43-e35389de8166', '_collection_name': 'demo_collection'}]\n",
|
|
"* LangGraph is the best framework for building stateful, agentic applications! [{'source': 'tweet', '_id': '91ed6c56-fe53-49e2-8199-c3bb3c33c3eb', '_collection_name': 'demo_collection'}]\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"results = vector_store.similarity_search(\n",
|
|
" \"LangChain provides abstractions to make working with LLMs easy\", k=2\n",
|
|
")\n",
|
|
"for res in results:\n",
|
|
" print(f\"* {res.page_content} [{res.metadata}]\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "79bcb0ce",
|
|
"metadata": {},
|
|
"source": [
|
|
"`QdrantVectorStore` supports 3 modes for similarity searches. They can be configured using the `retrieval_mode` parameter when setting up the class.\n",
|
|
"\n",
|
|
"- Dense Vector Search(Default)\n",
|
|
"- Sparse Vector Search\n",
|
|
"- Hybrid Search\n",
|
|
"\n",
|
|
"### Dense Vector Search\n",
|
|
"\n",
|
|
"To search with only dense vectors,\n",
|
|
"\n",
|
|
"- The `retrieval_mode` parameter should be set to `RetrievalMode.DENSE`(default).\n",
|
|
"- A [dense embeddings](https://python.langchain.com/v0.2/docs/integrations/text_embedding/) value should be provided to the `embedding` parameter."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "5e097299",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from langchain_qdrant import RetrievalMode\n",
|
|
"\n",
|
|
"qdrant = QdrantVectorStore.from_documents(\n",
|
|
" docs,\n",
|
|
" embedding=embeddings,\n",
|
|
" location=\":memory:\",\n",
|
|
" collection_name=\"my_documents\",\n",
|
|
" retrieval_mode=RetrievalMode.DENSE,\n",
|
|
")\n",
|
|
"\n",
|
|
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
|
"found_docs = qdrant.similarity_search(query)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "dbd93d85",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Sparse Vector Search\n",
|
|
"\n",
|
|
"To search with only sparse vectors,\n",
|
|
"\n",
|
|
"- The `retrieval_mode` parameter should be set to `RetrievalMode.SPARSE`.\n",
|
|
"- An implementation of the [`SparseEmbeddings`](https://github.com/langchain-ai/langchain/blob/master/libs/partners/qdrant/langchain_qdrant/sparse_embeddings.py) interface using any sparse embeddings provider has to be provided as value to the `sparse_embedding` parameter.\n",
|
|
"\n",
|
|
"The `langchain-qdrant` package provides a [FastEmbed](https://github.com/qdrant/fastembed) based implementation out of the box.\n",
|
|
"\n",
|
|
"To use it, install the FastEmbed package."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "8435c0f1",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"%pip install fastembed"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "7cf1e3ef",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from langchain_qdrant import FastEmbedSparse, RetrievalMode\n",
|
|
"\n",
|
|
"sparse_embeddings = FastEmbedSparse(model_name=\"Qdrant/BM25\")\n",
|
|
"\n",
|
|
"qdrant = QdrantVectorStore.from_documents(\n",
|
|
" docs,\n",
|
|
" sparse_embedding=sparse_embeddings,\n",
|
|
" location=\":memory:\",\n",
|
|
" collection_name=\"my_documents\",\n",
|
|
" retrieval_mode=RetrievalMode.SPARSE,\n",
|
|
")\n",
|
|
"\n",
|
|
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
|
"found_docs = qdrant.similarity_search(query)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "26e20c61",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Hybrid Vector Search\n",
|
|
"\n",
|
|
"To perform a hybrid search using dense and sparse vectors with score fusion,\n",
|
|
"\n",
|
|
"- The `retrieval_mode` parameter should be set to `RetrievalMode.HYBRID`.\n",
|
|
"- A [dense embeddings](https://python.langchain.com/v0.2/docs/integrations/text_embedding/) value should be provided to the `embedding` parameter.\n",
|
|
"- An implementation of the [`SparseEmbeddings`](https://github.com/langchain-ai/langchain/blob/master/libs/partners/qdrant/langchain_qdrant/sparse_embeddings.py) interface using any sparse embeddings provider has to be provided as value to the `sparse_embedding` parameter.\n",
|
|
"\n",
|
|
"Note that if you've added documents with the `HYBRID` mode, you can switch to any retrieval mode when searching. Since both the dense and sparse vectors are available in the collection."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "f37c8519",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from langchain_qdrant import FastEmbedSparse, RetrievalMode\n",
|
|
"\n",
|
|
"sparse_embeddings = FastEmbedSparse(model_name=\"Qdrant/BM25\")\n",
|
|
"\n",
|
|
"qdrant = QdrantVectorStore.from_documents(\n",
|
|
" docs,\n",
|
|
" embedding=embeddings,\n",
|
|
" sparse_embedding=sparse_embeddings,\n",
|
|
" location=\":memory:\",\n",
|
|
" collection_name=\"my_documents\",\n",
|
|
" retrieval_mode=RetrievalMode.HYBRID,\n",
|
|
")\n",
|
|
"\n",
|
|
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
|
"found_docs = qdrant.similarity_search(query)"
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"id": "1bda9bf5",
|
|
"metadata": {},
|
|
"source": [
|
|
"If you want to execute a similarity search and receive the corresponding scores you can run:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 12,
|
|
"id": "8804a21d",
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2023-04-04T10:51:25.631585Z",
|
|
"start_time": "2023-04-04T10:51:25.227384Z"
|
|
}
|
|
},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"* [SIM=0.531834] The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees. [{'source': 'news', '_id': '9e6ba50c-794f-4b88-94e5-411f15052a02', '_collection_name': 'demo_collection'}]\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"results = vector_store.similarity_search_with_score(\n",
|
|
" query=\"Will it be hot tomorrow\", k=1\n",
|
|
")\n",
|
|
"for doc, score in results:\n",
|
|
" print(f\"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]\")"
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"id": "525e3582",
|
|
"metadata": {},
|
|
"source": [
|
|
"For a full list of all the search functions available for a `QdrantVectorStore`, read the [API reference](https://api.python.langchain.com/en/latest/qdrant/langchain_qdrant.qdrant.QdrantVectorStore.html)\n",
|
|
"\n",
|
|
"### Metadata filtering\n",
|
|
"\n",
|
|
"Qdrant has an [extensive filtering system](https://qdrant.tech/documentation/concepts/filtering/) with rich type support. It is also possible to use the filters in Langchain, by passing an additional param to both the `similarity_search_with_score` and `similarity_search` methods."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 14,
|
|
"id": "dc7cffc8",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"* The top 10 soccer players in the world right now. [{'source': 'website', '_id': 'b0964ab5-5a14-47b4-a983-37fa5c5bd154', '_collection_name': 'demo_collection'}]\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"from qdrant_client.http import models\n",
|
|
"\n",
|
|
"results = vector_store.similarity_search(\n",
|
|
" query=\"Who are the best soccer players in the world?\",\n",
|
|
" k=1,\n",
|
|
" filter=models.Filter(\n",
|
|
" should=[\n",
|
|
" models.FieldCondition(\n",
|
|
" key=\"page_content\",\n",
|
|
" match=models.MatchValue(\n",
|
|
" value=\"The top 10 soccer players in the world right now.\"\n",
|
|
" ),\n",
|
|
" ),\n",
|
|
" ]\n",
|
|
" ),\n",
|
|
")\n",
|
|
"for doc in results:\n",
|
|
" print(f\"* {doc.page_content} [{doc.metadata}]\")"
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"id": "691a82d6",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Query by turning into retriever\n",
|
|
"\n",
|
|
"You can also transform the vector store into a retriever for easier usage in your chains. "
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 15,
|
|
"id": "9427195f",
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2023-04-04T10:51:26.031451Z",
|
|
"start_time": "2023-04-04T10:51:26.018763Z"
|
|
}
|
|
},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"[Document(metadata={'source': 'news', '_id': '50d8d6ee-69bf-4173-a6a2-b254e9928965', '_collection_name': 'demo_collection'}, page_content='Robbers broke into the city bank and stole $1 million in cash.')]"
|
|
]
|
|
},
|
|
"execution_count": 15,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"retriever = vector_store.as_retriever(search_type=\"mmr\", search_kwargs={\"k\": 1})\n",
|
|
"retriever.invoke(\"Stealing from the bank is a crime\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "6ac07288",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Usage for retrieval-augmented generation\n",
|
|
"\n",
|
|
"For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:\n",
|
|
"\n",
|
|
"- [Tutorials: working with external knowledge](https://python.langchain.com/v0.2/docs/tutorials/#working-with-external-knowledge)\n",
|
|
"- [How-to: Question and answer with RAG](https://python.langchain.com/v0.2/docs/how_to/#qa-with-rag)\n",
|
|
"- [Retrieval conceptual docs](https://python.langchain.com/v0.2/docs/concepts/#retrieval)"
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"id": "0358ecde",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Customizing Qdrant\n",
|
|
"\n",
|
|
"There are options to use an existing Qdrant collection within your Langchain application. In such cases, you may need to define how to map Qdrant point into the Langchain `Document`.\n",
|
|
"\n",
|
|
"### Named vectors\n",
|
|
"\n",
|
|
"Qdrant supports [multiple vectors per point](https://qdrant.tech/documentation/concepts/collections/#collection-with-multiple-vectors) by named vectors. If you work with a collection created externally or want to have the differently named vector used, you can configure it by providing its name.\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "1f11adf8",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"from langchain_qdrant import RetrievalMode\n",
|
|
"\n",
|
|
"QdrantVectorStore.from_documents(\n",
|
|
" docs,\n",
|
|
" embedding=embeddings,\n",
|
|
" sparse_embedding=sparse_embeddings,\n",
|
|
" location=\":memory:\",\n",
|
|
" collection_name=\"my_documents_2\",\n",
|
|
" retrieval_mode=RetrievalMode.HYBRID,\n",
|
|
" vector_name=\"custom_vector\",\n",
|
|
" sparse_vector_name=\"custom_sparse_vector\",\n",
|
|
")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "b2350093",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"### Metadata\n",
|
|
"\n",
|
|
"Qdrant stores your vector embeddings along with the optional JSON-like payload. Payloads are optional, but since LangChain assumes the embeddings are generated from the documents, we keep the context data, so you can extract the original texts as well.\n",
|
|
"\n",
|
|
"By default, your document is going to be stored in the following payload structure:\n",
|
|
"\n",
|
|
"```json\n",
|
|
"{\n",
|
|
" \"page_content\": \"Lorem ipsum dolor sit amet\",\n",
|
|
" \"metadata\": {\n",
|
|
" \"foo\": \"bar\"\n",
|
|
" }\n",
|
|
"}\n",
|
|
"```\n",
|
|
"\n",
|
|
"You can, however, decide to use different keys for the page content and metadata. That's useful if you already have a collection that you'd like to reuse."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "e4d6baf9",
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2023-04-04T11:08:31.739141Z",
|
|
"start_time": "2023-04-04T11:08:30.229748Z"
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"QdrantVectorStore.from_documents(\n",
|
|
" docs,\n",
|
|
" embeddings,\n",
|
|
" location=\":memory:\",\n",
|
|
" collection_name=\"my_documents_2\",\n",
|
|
" content_payload_key=\"my_page_content_key\",\n",
|
|
" metadata_payload_key=\"my_meta\",\n",
|
|
")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "2300e785",
|
|
"metadata": {},
|
|
"source": [
|
|
"## API reference\n",
|
|
"\n",
|
|
"For detailed documentation of all `QdrantVectorStore` features and configurations head to the API reference: https://api.python.langchain.com/en/latest/qdrant/langchain_qdrant.qdrant.QdrantVectorStore.html"
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "Python 3 (ipykernel)",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.11.9"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 5
|
|
}
|