mirror of
https://github.com/hwchase17/langchain.git
synced 2025-06-22 06:39:52 +00:00
Add template for self-query-qdrant (#12795)
This PR adds a self-querying template using Qdrant as a vector store. The template uses an artificial dataset and was implemented in a way that simplifies passing different components and choosing LLM and embedding providers. --------- Co-authored-by: Erick Friis <erick@langchain.dev>
This commit is contained in:
parent f41f4c5e37
commit 66c41c0dbf
2
templates/self-query-qdrant/.gitignore
vendored
Normal file
@@ -0,0 +1,2 @@
.idea
tests
161
templates/self-query-qdrant/README.md
Normal file
@@ -0,0 +1,161 @@
# self-query-qdrant

This template performs [self-querying](https://python.langchain.com/docs/modules/data_connection/retrievers/self_query/)
using Qdrant and OpenAI. By default, it uses an artificial dataset of 10 documents, but you can replace it with your own dataset.

## Environment Setup

Set the `OPENAI_API_KEY` environment variable to access the OpenAI models.

Set the `QDRANT_URL` to the URL of your Qdrant instance. If you use [Qdrant Cloud](https://cloud.qdrant.io),
you have to set the `QDRANT_API_KEY` environment variable as well. If you do not set either of them,
the template will try to connect to a local Qdrant instance at `http://localhost:6333`.

```shell
export QDRANT_URL=
export QDRANT_API_KEY=

export OPENAI_API_KEY=
```

## Usage

To use this package, install the LangChain CLI first:

```shell
pip install -U "langchain-cli[serve]"
```

Create a new LangChain project and install this package as the only one:

```shell
langchain app new my-app --package self-query-qdrant
```

To add this to an existing project, run:

```shell
langchain app add self-query-qdrant
```

### Defaults

Before you launch the server, you need to create a Qdrant collection and index the documents.
It can be done by running the following command:

```python
from self_query_qdrant.chain import initialize

initialize()
```

Add the following code to your `app/server.py` file:

```python
from self_query_qdrant.chain import chain

add_routes(app, chain, path="/self-query-qdrant")
```

The default dataset consists of 10 documents about dishes, along with their price and restaurant information.
You can find the documents in the `packages/self-query-qdrant/self_query_qdrant/defaults.py` file.
Here is one of the documents:

```python
from langchain.schema import Document

Document(
    page_content="Spaghetti with meatballs and tomato sauce",
    metadata={
        "price": 12.99,
        "restaurant": {
            "name": "Olive Garden",
            "location": ["New York", "Chicago", "Los Angeles"],
        },
    },
)
```

Self-querying allows performing semantic search over the documents, with some additional filtering
based on the metadata. For example, you can search for the dishes that cost less than $15 and are served in New York.
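Conceptually, the retriever turns such a natural-language query into a structured filter over the metadata. A minimal, hypothetical sketch in plain Python (the `dishes` list and `matches` name are illustrative, not part of the template) of the filter that query implies:

```python
# Hypothetical sketch: the structured filter a self-query retriever would
# derive from "dishes under $15 served in New York", applied with plain
# Python over dicts shaped like the template's default documents.
dishes = [
    {"page_content": "Spaghetti with meatballs and tomato sauce",
     "metadata": {"price": 12.99,
                  "restaurant": {"name": "Olive Garden",
                                 "location": ["New York", "Chicago", "Los Angeles"]}}},
    {"page_content": "Scabbard fish with banana and passion fruit sauce",
     "metadata": {"price": 19.99,
                  "restaurant": {"name": "A Concha",
                                 "location": ["San Francisco"]}}},
]

matches = [
    dish["page_content"]
    for dish in dishes
    if dish["metadata"]["price"] < 15
    and "New York" in dish["metadata"]["restaurant"]["location"]
]
```

Only the spaghetti dish satisfies both the price and location constraints; the real retriever performs the semantic search first and applies an equivalent filter inside Qdrant.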

### Customization

All the examples above assume that you want to launch the template with just the defaults.
If you want to customize the template, you can do it by passing the parameters to the `create_chain` function
in the `app/server.py` file:

```python
from langchain.llms import Cohere
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.chains.query_constructor.schema import AttributeInfo

from self_query_qdrant.chain import create_chain

chain = create_chain(
    llm=Cohere(),
    embeddings=HuggingFaceEmbeddings(),
    document_contents="Descriptions of cats, along with their names and breeds.",
    metadata_field_info=[
        AttributeInfo(name="name", description="Name of the cat", type="string"),
        AttributeInfo(name="breed", description="Cat's breed", type="string"),
    ],
    collection_name="cats",
)
```

The same goes for the `initialize` function that creates a Qdrant collection and indexes the documents:

```python
from langchain.schema import Document
from langchain.embeddings import HuggingFaceEmbeddings

from self_query_qdrant.chain import initialize

initialize(
    embeddings=HuggingFaceEmbeddings(),
    collection_name="cats",
    documents=[
        Document(
            page_content="A mean lazy old cat who destroys furniture and eats lasagna",
            metadata={"name": "Garfield", "breed": "Tabby"},
        ),
        ...
    ]
)
```

The template is flexible and can easily be used for different sets of documents.

### LangSmith

(Optional) If you have access to LangSmith, configure it to help trace, monitor and debug LangChain applications. If you don't have access, skip this section.

```shell
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=<your-api-key>
export LANGCHAIN_PROJECT=<your-project>  # if not specified, defaults to "default"
```

If you are inside this directory, then you can spin up a LangServe instance directly by:

```shell
langchain serve
```

### Local Server

This will start the FastAPI app with a server running locally at
[http://localhost:8000](http://localhost:8000)

You can see all templates at [http://127.0.0.1:8000/docs](http://127.0.0.1:8000/docs)
Access the playground at [http://127.0.0.1:8000/self-query-qdrant/playground](http://127.0.0.1:8000/self-query-qdrant/playground)

Access the template from code with:

```python
from langserve.client import RemoteRunnable

runnable = RemoteRunnable("http://localhost:8000/self-query-qdrant")
```
1927
templates/self-query-qdrant/poetry.lock
generated
Normal file
File diff suppressed because it is too large
32
templates/self-query-qdrant/pyproject.toml
Normal file
@@ -0,0 +1,32 @@
[tool.poetry]
name = "self-query-qdrant"
version = "0.1.0"
description = "Self-querying retriever using Qdrant"
authors = ["Kacper Łukawski <lukawski.kacper@gmail.com>"]
license = "Apache 2.0"
readme = "README.md"
packages = [{include = "self_query_qdrant"}]

[tool.poetry.dependencies]
python = ">=3.9,<3.13"
langchain = ">=0.0.325"
openai = "^0.28.1"
qdrant-client = ">=1.6"
lark = "^1.1.8"
tiktoken = "^0.5.1"

[tool.poetry.group.dev.dependencies]
langchain-cli = ">=0.0.15"

[tool.poetry.group.dev.dependencies.python-dotenv]
extras = [
    "cli",
]
version = "^1.0.0"

[tool.langserve]
export_module = "self_query_qdrant"
export_attr = "chain"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
@@ -0,0 +1,3 @@
from self_query_qdrant.chain import chain

__all__ = ["chain"]
92
templates/self-query-qdrant/self_query_qdrant/chain.py
Normal file
@@ -0,0 +1,92 @@
import os
from typing import List, Optional

from langchain.chains.query_constructor.schema import AttributeInfo
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import BaseLLM
from langchain.llms.openai import OpenAI
from langchain.pydantic_v1 import BaseModel
from langchain.retrievers import SelfQueryRetriever
from langchain.schema import Document, StrOutputParser
from langchain.schema.embeddings import Embeddings
from langchain.schema.runnable import RunnableParallel, RunnablePassthrough
from langchain.vectorstores.qdrant import Qdrant
from qdrant_client import QdrantClient

from self_query_qdrant import defaults, helper, prompts


class Query(BaseModel):
    __root__: str


def create_chain(
    llm: Optional[BaseLLM] = None,
    embeddings: Optional[Embeddings] = None,
    document_contents: str = defaults.DEFAULT_DOCUMENT_CONTENTS,
    metadata_field_info: List[AttributeInfo] = defaults.DEFAULT_METADATA_FIELD_INFO,
    collection_name: str = defaults.DEFAULT_COLLECTION_NAME,
):
    """
    Create a chain that can be used to query a Qdrant vector store with a self-querying
    capability. By default, this chain will use the OpenAI LLM and OpenAIEmbeddings, and
    work with the default document contents and metadata field info. You can override
    these defaults by passing in your own values.
    :param llm: an LLM to use for generating text
    :param embeddings: an Embeddings to use for generating queries
    :param document_contents: a description of the document set
    :param metadata_field_info: list of metadata attributes
    :param collection_name: name of the Qdrant collection to use
    :return:
    """
    llm = llm or OpenAI()
    embeddings = embeddings or OpenAIEmbeddings()

    # Set up a vector store to store your vectors and metadata
    client = QdrantClient(
        url=os.environ.get("QDRANT_URL", "http://localhost:6333"),
        api_key=os.environ.get("QDRANT_API_KEY"),
    )
    vectorstore = Qdrant(
        client=client,
        collection_name=collection_name,
        embeddings=embeddings,
    )

    # Set up a retriever to query your vector store with self-querying capabilities
    retriever = SelfQueryRetriever.from_llm(
        llm, vectorstore, document_contents, metadata_field_info, verbose=True
    )

    context = RunnableParallel(
        context=retriever | helper.combine_documents,
        query=RunnablePassthrough(),
    )
    pipeline = context | prompts.LLM_CONTEXT_PROMPT | llm | StrOutputParser()
    return pipeline.with_types(input_type=Query)


def initialize(
    embeddings: Optional[Embeddings] = None,
    collection_name: str = defaults.DEFAULT_COLLECTION_NAME,
    documents: List[Document] = defaults.DEFAULT_DOCUMENTS,
):
    """
    Initialize a vector store with a set of documents. By default, the documents will be
    compatible with the default metadata field info. You can override these defaults by
    passing in your own values.
    :param embeddings: an Embeddings to use for generating queries
    :param collection_name: name of the Qdrant collection to use
    :param documents: a list of documents to initialize the vector store with
    :return:
    """
    embeddings = embeddings or OpenAIEmbeddings()

    # Set up a vector store to store your vectors and metadata
    Qdrant.from_documents(
        documents, embedding=embeddings, collection_name=collection_name
    )


# Create the default chain
chain = create_chain()
134
templates/self-query-qdrant/self_query_qdrant/defaults.py
Normal file
@@ -0,0 +1,134 @@
from langchain.chains.query_constructor.schema import AttributeInfo
from langchain.schema import Document

# Qdrant collection name
DEFAULT_COLLECTION_NAME = "restaurants"

# Here is a description of the dataset and metadata attributes. Metadata attributes will
# be used to filter the results of the query beyond the semantic search.
DEFAULT_DOCUMENT_CONTENTS = (
    "Dishes served at different restaurants, along with the restaurant information"
)
DEFAULT_METADATA_FIELD_INFO = [
    AttributeInfo(
        name="price",
        description="The price of the dish",
        type="float",
    ),
    AttributeInfo(
        name="restaurant.name",
        description="The name of the restaurant",
        type="string",
    ),
    AttributeInfo(
        name="restaurant.location",
        description="Name of the city where the restaurant is located",
        type="string or list[string]",
    ),
]

# A default set of documents to use for the vector store. This is a list of Document
# objects, which have a page_content field and a metadata field. The metadata field is a
# dictionary of metadata attributes compatible with the metadata field info above.
DEFAULT_DOCUMENTS = [
    Document(
        page_content="Pepperoni pizza with extra cheese, crispy crust",
        metadata={
            "price": 10.99,
            "restaurant": {
                "name": "Pizza Hut",
                "location": ["New York", "Chicago"],
            },
        },
    ),
    Document(
        page_content="Spaghetti with meatballs and tomato sauce",
        metadata={
            "price": 12.99,
            "restaurant": {
                "name": "Olive Garden",
                "location": ["New York", "Chicago", "Los Angeles"],
            },
        },
    ),
    Document(
        page_content="Chicken tikka masala with naan",
        metadata={
            "price": 14.99,
            "restaurant": {
                "name": "Indian Oven",
                "location": ["New York", "Los Angeles"],
            },
        },
    ),
    Document(
        page_content="Chicken teriyaki with rice",
        metadata={
            "price": 11.99,
            "restaurant": {
                "name": "Sakura",
                "location": ["New York", "Chicago", "Los Angeles"],
            },
        },
    ),
    Document(
        page_content="Scabbard fish with banana and passion fruit sauce",
        metadata={
            "price": 19.99,
            "restaurant": {
                "name": "A Concha",
                "location": ["San Francisco"],
            },
        },
    ),
    Document(
        page_content="Pielmieni with sour cream",
        metadata={
            "price": 13.99,
            "restaurant": {
                "name": "Russian House",
                "location": ["New York", "Chicago"],
            },
        },
    ),
    Document(
        page_content="Chicken biryani with raita",
        metadata={
            "price": 14.99,
            "restaurant": {
                "name": "Indian Oven",
                "location": ["Los Angeles"],
            },
        },
    ),
    Document(
        page_content="Tomato soup with croutons",
        metadata={
            "price": 7.99,
            "restaurant": {
                "name": "Olive Garden",
                "location": ["New York", "Chicago", "Los Angeles"],
            },
        },
    ),
    Document(
        page_content="Vegan burger with sweet potato fries",
        metadata={
            "price": 12.99,
            "restaurant": {
                "name": "Burger King",
                "location": ["New York", "Los Angeles"],
            },
        },
    ),
    Document(
        page_content="Chicken nuggets with french fries",
        metadata={
            "price": 9.99,
            "restaurant": {
                "name": "McDonald's",
                "location": ["San Francisco", "New York", "Los Angeles"],
            },
        },
    ),
]
27
templates/self-query-qdrant/self_query_qdrant/helper.py
Normal file
@@ -0,0 +1,27 @@
from string import Formatter
from typing import List

from langchain.schema import Document

document_template = """
PASSAGE: {page_content}
METADATA: {metadata}
"""


def combine_documents(documents: List[Document]) -> str:
    """
    Combine a list of documents into a single string that might be passed further down
    to a language model.
    :param documents: list of documents to combine
    :return:
    """
    formatter = Formatter()
    return "\n\n".join(
        formatter.format(
            document_template,
            page_content=document.page_content,
            metadata=document.metadata,
        )
        for document in documents
    )
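As a rough standalone illustration of what `combine_documents` produces (plain dicts stand in for `Document` objects; the `combine` helper below is illustrative, not part of the package), each passage is rendered through the template and the results are joined with blank lines:

```python
from string import Formatter

# Standalone sketch mirroring combine_documents: each passage is rendered
# through document_template and the results are joined with blank lines.
document_template = """
PASSAGE: {page_content}
METADATA: {metadata}
"""


def combine(docs: list) -> str:
    formatter = Formatter()
    return "\n\n".join(
        formatter.format(
            document_template,
            page_content=doc["page_content"],
            metadata=doc["metadata"],
        )
        for doc in docs
    )


combined = combine([
    {"page_content": "Tomato soup with croutons", "metadata": {"price": 7.99}},
])
```

The resulting string interleaves each passage with its metadata dictionary, which is what the prompt in `prompts.py` receives as `{context}`.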
16
templates/self-query-qdrant/self_query_qdrant/prompts.py
Normal file
@@ -0,0 +1,16 @@
from langchain.prompts import PromptTemplate

llm_context_prompt_template = """
Answer the user query using provided passages. Each passage has metadata given as
a nested JSON object you can also use. When answering, cite source name of the passages
you are answering from below the answer in a unique bullet point list.

If you don't know the answer, just say that you don't know, don't try to make up an answer.

----
{context}
----
Query: {query}
"""  # noqa: E501

LLM_CONTEXT_PROMPT = PromptTemplate.from_template(llm_context_prompt_template)