mirror of
https://github.com/hwchase17/langchain.git
synced 2025-06-22 06:39:52 +00:00
Add template for self-query-qdrant (#12795)
This PR adds a self-querying template using Qdrant as a vector store. The template uses an artificial dataset and was implemented in a way that simplifies passing different components and choosing LLM and embedding providers. --------- Co-authored-by: Erick Friis <erick@langchain.dev>
This commit is contained in:
parent f41f4c5e37
commit 66c41c0dbf
2
templates/self-query-qdrant/.gitignore
vendored
Normal file
@@ -0,0 +1,2 @@
.idea
tests
161
templates/self-query-qdrant/README.md
Normal file
@@ -0,0 +1,161 @@
# self-query-qdrant

This template performs [self-querying](https://python.langchain.com/docs/modules/data_connection/retrievers/self_query/)
using Qdrant and OpenAI. By default, it uses an artificial dataset of 10 documents, but you can replace it with your own dataset.

## Environment Setup

Set the `OPENAI_API_KEY` environment variable to access the OpenAI models.

Set the `QDRANT_URL` to the URL of your Qdrant instance. If you use [Qdrant Cloud](https://cloud.qdrant.io),
you have to set the `QDRANT_API_KEY` environment variable as well. If you do not set either of them,
the template will try to connect to a local Qdrant instance at `http://localhost:6333`.

```shell
export QDRANT_URL=
export QDRANT_API_KEY=

export OPENAI_API_KEY=
```

## Usage

To use this package, install the LangChain CLI first:

```shell
pip install -U "langchain-cli[serve]"
```

Create a new LangChain project and install this package as the only one:

```shell
langchain app new my-app --package self-query-qdrant
```

To add this to an existing project, run:

```shell
langchain app add self-query-qdrant
```

### Defaults

Before you launch the server, you need to create a Qdrant collection and index the documents.
It can be done by running the following command:

```python
from self_query_qdrant.chain import initialize

initialize()
```

Add the following code to your `app/server.py` file:

```python
from self_query_qdrant.chain import chain

add_routes(app, chain, path="/self-query-qdrant")
```

The default dataset consists of 10 documents about dishes, along with their price and restaurant information.
You can find the documents in the `packages/self-query-qdrant/self_query_qdrant/defaults.py` file.
Here is one of the documents:

```python
from langchain.schema import Document

Document(
    page_content="Spaghetti with meatballs and tomato sauce",
    metadata={
        "price": 12.99,
        "restaurant": {
            "name": "Olive Garden",
            "location": ["New York", "Chicago", "Los Angeles"],
        },
    },
)
```

Self-querying allows performing semantic search over the documents, with some additional filtering
based on the metadata. For example, you can search for the dishes that cost less than $15 and are served in New York.
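Conceptually, the retriever turns such a natural-language query into a structured filter over the metadata. A minimal, hypothetical sketch in plain Python (the `dishes` list and `matches` name are illustrative, not part of the template) of the filter that query implies:

```python
# Hypothetical sketch: the structured filter a self-query retriever would
# derive from "dishes under $15 served in New York", applied with plain
# Python over dicts shaped like the template's default documents.
dishes = [
    {"page_content": "Spaghetti with meatballs and tomato sauce",
     "metadata": {"price": 12.99,
                  "restaurant": {"name": "Olive Garden",
                                 "location": ["New York", "Chicago", "Los Angeles"]}}},
    {"page_content": "Scabbard fish with banana and passion fruit sauce",
     "metadata": {"price": 19.99,
                  "restaurant": {"name": "A Concha",
                                 "location": ["San Francisco"]}}},
]

matches = [
    dish["page_content"]
    for dish in dishes
    if dish["metadata"]["price"] < 15
    and "New York" in dish["metadata"]["restaurant"]["location"]
]
```

Only the spaghetti dish satisfies both the price and location constraints; the real retriever performs the semantic search first and applies an equivalent filter inside Qdrant.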

### Customization

All the examples above assume that you want to launch the template with just the defaults.
If you want to customize the template, you can do it by passing the parameters to the `create_chain` function
in the `app/server.py` file:

```python
from langchain.llms import Cohere
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.chains.query_constructor.schema import AttributeInfo

from self_query_qdrant.chain import create_chain

chain = create_chain(
    llm=Cohere(),
    embeddings=HuggingFaceEmbeddings(),
    document_contents="Descriptions of cats, along with their names and breeds.",
    metadata_field_info=[
        AttributeInfo(name="name", description="Name of the cat", type="string"),
        AttributeInfo(name="breed", description="Cat's breed", type="string"),
    ],
    collection_name="cats",
)
```

The same goes for the `initialize` function that creates a Qdrant collection and indexes the documents:

```python
from langchain.schema import Document
from langchain.embeddings import HuggingFaceEmbeddings

from self_query_qdrant.chain import initialize

initialize(
    embeddings=HuggingFaceEmbeddings(),
    collection_name="cats",
    documents=[
        Document(
            page_content="A mean lazy old cat who destroys furniture and eats lasagna",
            metadata={"name": "Garfield", "breed": "Tabby"},
        ),
        ...
    ]
)
```

The template is flexible and can easily be used for different sets of documents.

### LangSmith

(Optional) If you have access to LangSmith, configure it to help trace, monitor and debug LangChain applications. If you don't have access, skip this section.

```shell
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=<your-api-key>
export LANGCHAIN_PROJECT=<your-project>  # if not specified, defaults to "default"
```

If you are inside this directory, then you can spin up a LangServe instance directly by:

```shell
langchain serve
```

### Local Server

This will start the FastAPI app with a server running locally at
[http://localhost:8000](http://localhost:8000)

You can see all templates at [http://127.0.0.1:8000/docs](http://127.0.0.1:8000/docs)
Access the playground at [http://127.0.0.1:8000/self-query-qdrant/playground](http://127.0.0.1:8000/self-query-qdrant/playground)

Access the template from code with:

```python
from langserve.client import RemoteRunnable

runnable = RemoteRunnable("http://localhost:8000/self-query-qdrant")
```
1927
templates/self-query-qdrant/poetry.lock
generated
Normal file
File diff suppressed because it is too large
32
templates/self-query-qdrant/pyproject.toml
Normal file
@@ -0,0 +1,32 @@
[tool.poetry]
name = "self-query-qdrant"
version = "0.1.0"
description = "Self-querying retriever using Qdrant"
authors = ["Kacper Łukawski <lukawski.kacper@gmail.com>"]
license = "Apache 2.0"
readme = "README.md"
packages = [{include = "self_query_qdrant"}]

[tool.poetry.dependencies]
python = ">=3.9,<3.13"
langchain = ">=0.0.325"
openai = "^0.28.1"
qdrant-client = ">=1.6"
lark = "^1.1.8"
tiktoken = "^0.5.1"

[tool.poetry.group.dev.dependencies]
langchain-cli = ">=0.0.15"

[tool.poetry.group.dev.dependencies.python-dotenv]
extras = [
    "cli",
]
version = "^1.0.0"

[tool.langserve]
export_module = "self_query_qdrant"
export_attr = "chain"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
@@ -0,0 +1,3 @@
from self_query_qdrant.chain import chain

__all__ = ["chain"]
92
templates/self-query-qdrant/self_query_qdrant/chain.py
Normal file
@@ -0,0 +1,92 @@
import os
from typing import List, Optional

from langchain.chains.query_constructor.schema import AttributeInfo
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import BaseLLM
from langchain.llms.openai import OpenAI
from langchain.pydantic_v1 import BaseModel
from langchain.retrievers import SelfQueryRetriever
from langchain.schema import Document, StrOutputParser
from langchain.schema.embeddings import Embeddings
from langchain.schema.runnable import RunnableParallel, RunnablePassthrough
from langchain.vectorstores.qdrant import Qdrant
from qdrant_client import QdrantClient

from self_query_qdrant import defaults, helper, prompts


class Query(BaseModel):
    __root__: str


def create_chain(
    llm: Optional[BaseLLM] = None,
    embeddings: Optional[Embeddings] = None,
    document_contents: str = defaults.DEFAULT_DOCUMENT_CONTENTS,
    metadata_field_info: List[AttributeInfo] = defaults.DEFAULT_METADATA_FIELD_INFO,
    collection_name: str = defaults.DEFAULT_COLLECTION_NAME,
):
    """
    Create a chain that can be used to query a Qdrant vector store with a self-querying
    capability. By default, this chain will use the OpenAI LLM and OpenAIEmbeddings, and
    work with the default document contents and metadata field info. You can override
    these defaults by passing in your own values.
    :param llm: an LLM to use for generating text
    :param embeddings: an Embeddings to use for generating queries
    :param document_contents: a description of the document set
    :param metadata_field_info: list of metadata attributes
    :param collection_name: name of the Qdrant collection to use
    :return:
    """
    llm = llm or OpenAI()
    embeddings = embeddings or OpenAIEmbeddings()

    # Set up a vector store to store your vectors and metadata
    client = QdrantClient(
        url=os.environ.get("QDRANT_URL", "http://localhost:6333"),
        api_key=os.environ.get("QDRANT_API_KEY"),
    )
    vectorstore = Qdrant(
        client=client,
        collection_name=collection_name,
        embeddings=embeddings,
    )

    # Set up a retriever to query your vector store with self-querying capabilities
    retriever = SelfQueryRetriever.from_llm(
        llm, vectorstore, document_contents, metadata_field_info, verbose=True
    )

    context = RunnableParallel(
        context=retriever | helper.combine_documents,
        query=RunnablePassthrough(),
    )
    pipeline = context | prompts.LLM_CONTEXT_PROMPT | llm | StrOutputParser()
    return pipeline.with_types(input_type=Query)


def initialize(
    embeddings: Optional[Embeddings] = None,
    collection_name: str = defaults.DEFAULT_COLLECTION_NAME,
    documents: List[Document] = defaults.DEFAULT_DOCUMENTS,
):
    """
    Initialize a vector store with a set of documents. By default, the documents will be
    compatible with the default metadata field info. You can override these defaults by
    passing in your own values.
    :param embeddings: an Embeddings to use for generating queries
    :param collection_name: name of the Qdrant collection to use
    :param documents: a list of documents to initialize the vector store with
    :return:
    """
    embeddings = embeddings or OpenAIEmbeddings()

    # Set up a vector store to store your vectors and metadata
    Qdrant.from_documents(
        documents, embedding=embeddings, collection_name=collection_name
    )


# Create the default chain
chain = create_chain()
134
templates/self-query-qdrant/self_query_qdrant/defaults.py
Normal file
@@ -0,0 +1,134 @@
from langchain.chains.query_constructor.schema import AttributeInfo
from langchain.schema import Document

# Qdrant collection name
DEFAULT_COLLECTION_NAME = "restaurants"

# Here is a description of the dataset and metadata attributes. Metadata attributes will
# be used to filter the results of the query beyond the semantic search.
DEFAULT_DOCUMENT_CONTENTS = (
    "Dishes served at different restaurants, along with the restaurant information"
)
DEFAULT_METADATA_FIELD_INFO = [
    AttributeInfo(
        name="price",
        description="The price of the dish",
        type="float",
    ),
    AttributeInfo(
        name="restaurant.name",
        description="The name of the restaurant",
        type="string",
    ),
    AttributeInfo(
        name="restaurant.location",
        description="Name of the city where the restaurant is located",
        type="string or list[string]",
    ),
]

# A default set of documents to use for the vector store. This is a list of Document
# objects, which have a page_content field and a metadata field. The metadata field is a
# dictionary of metadata attributes compatible with the metadata field info above.
DEFAULT_DOCUMENTS = [
    Document(
        page_content="Pepperoni pizza with extra cheese, crispy crust",
        metadata={
            "price": 10.99,
            "restaurant": {
                "name": "Pizza Hut",
                "location": ["New York", "Chicago"],
            },
        },
    ),
    Document(
        page_content="Spaghetti with meatballs and tomato sauce",
        metadata={
            "price": 12.99,
            "restaurant": {
                "name": "Olive Garden",
                "location": ["New York", "Chicago", "Los Angeles"],
            },
        },
    ),
    Document(
        page_content="Chicken tikka masala with naan",
        metadata={
            "price": 14.99,
            "restaurant": {
                "name": "Indian Oven",
                "location": ["New York", "Los Angeles"],
            },
        },
    ),
    Document(
        page_content="Chicken teriyaki with rice",
        metadata={
            "price": 11.99,
            "restaurant": {
                "name": "Sakura",
                "location": ["New York", "Chicago", "Los Angeles"],
            },
        },
    ),
    Document(
        page_content="Scabbard fish with banana and passion fruit sauce",
        metadata={
            "price": 19.99,
            "restaurant": {
                "name": "A Concha",
                "location": ["San Francisco"],
            },
        },
    ),
    Document(
        page_content="Pielmieni with sour cream",
        metadata={
            "price": 13.99,
            "restaurant": {
                "name": "Russian House",
                "location": ["New York", "Chicago"],
            },
        },
    ),
    Document(
        page_content="Chicken biryani with raita",
        metadata={
            "price": 14.99,
            "restaurant": {
                "name": "Indian Oven",
                "location": ["Los Angeles"],
            },
        },
    ),
    Document(
        page_content="Tomato soup with croutons",
        metadata={
            "price": 7.99,
            "restaurant": {
                "name": "Olive Garden",
                "location": ["New York", "Chicago", "Los Angeles"],
            },
        },
    ),
    Document(
        page_content="Vegan burger with sweet potato fries",
        metadata={
            "price": 12.99,
            "restaurant": {
                "name": "Burger King",
                "location": ["New York", "Los Angeles"],
            },
        },
    ),
    Document(
        page_content="Chicken nuggets with french fries",
        metadata={
            "price": 9.99,
            "restaurant": {
                "name": "McDonald's",
                "location": ["San Francisco", "New York", "Los Angeles"],
            },
        },
    ),
]
27
templates/self-query-qdrant/self_query_qdrant/helper.py
Normal file
@@ -0,0 +1,27 @@
from string import Formatter
from typing import List

from langchain.schema import Document

document_template = """
PASSAGE: {page_content}
METADATA: {metadata}
"""


def combine_documents(documents: List[Document]) -> str:
    """
    Combine a list of documents into a single string that might be passed further down
    to a language model.
    :param documents: list of documents to combine
    :return:
    """
    formatter = Formatter()
    return "\n\n".join(
        formatter.format(
            document_template,
            page_content=document.page_content,
            metadata=document.metadata,
        )
        for document in documents
    )
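As a rough standalone illustration of what `combine_documents` produces (plain dicts stand in for `Document` objects; the `combine` helper below is illustrative, not part of the package), each passage is rendered through the template and the results are joined with blank lines:

```python
from string import Formatter

# Standalone sketch mirroring combine_documents: each passage is rendered
# through document_template and the results are joined with blank lines.
document_template = """
PASSAGE: {page_content}
METADATA: {metadata}
"""


def combine(docs: list) -> str:
    formatter = Formatter()
    return "\n\n".join(
        formatter.format(
            document_template,
            page_content=doc["page_content"],
            metadata=doc["metadata"],
        )
        for doc in docs
    )


combined = combine([
    {"page_content": "Tomato soup with croutons", "metadata": {"price": 7.99}},
])
```

The resulting string interleaves each passage with its metadata dictionary, which is what the prompt in `prompts.py` receives as `{context}`.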
16
templates/self-query-qdrant/self_query_qdrant/prompts.py
Normal file
@@ -0,0 +1,16 @@
from langchain.prompts import PromptTemplate

llm_context_prompt_template = """
Answer the user query using provided passages. Each passage has metadata given as
a nested JSON object you can also use. When answering, cite source name of the passages
you are answering from below the answer in a unique bullet point list.

If you don't know the answer, just say that you don't know, don't try to make up an answer.

----
{context}
----
Query: {query}
"""  # noqa: E501

LLM_CONTEXT_PROMPT = PromptTemplate.from_template(llm_context_prompt_template)