mirror of
https://github.com/hwchase17/langchain.git
synced 2026-02-21 06:33:41 +00:00
## Description

This PR adds documentation for the new ZeusDB vector store integration with LangChain.

## Motivation

ZeusDB is a high-performance vector database (Python/Rust backend) designed for AI applications that need fast similarity search and real-time vector operations. This integration brings ZeusDB's capabilities to the LangChain ecosystem, giving developers another production-oriented option for vector storage and retrieval.

**Key Features:**

- **User-Friendly Python API**: Intuitive interface that integrates seamlessly with Python ML workflows
- **High Performance**: Powered by a robust Rust backend for lightning-fast vector operations
- **Enterprise Logging**: Comprehensive logging capabilities for monitoring and debugging production systems
- **Advanced Features**: Includes product quantization and persistence capabilities
- **AI-Optimized**: Purpose-built for modern AI applications and RAG pipelines

## Changes

- Added provider documentation: `docs/docs/integrations/providers/zeusdb.mdx` (installation, setup).
- Added vector store documentation: `docs/docs/integrations/vectorstores/zeusdb.ipynb` (quickstart for creating/querying a ZeusDBVectorStore).
- Registered langchain-zeusdb in `libs/packages.yml` for discovery.

## Target users

- AI/ML engineers building RAG pipelines
- Data scientists working with large document collections
- Developers needing high-throughput vector search
- Teams requiring near real-time vector operations

## Testing

- Followed LangChain's "How to add standard tests to an integration" guidance.
- Code passes format, lint, and test checks locally.
- Tested with LangChain Core 0.3.74
- Works with Python 3.10 to 3.13

## Package Information

**PyPI:** https://pypi.org/project/langchain-zeusdb
**GitHub:** https://github.com/ZeusDB/langchain-zeusdb
618 lines
14 KiB
Plaintext
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "ef1f0986",
   "metadata": {},
   "source": [
    "# ⚡ ZeusDB Vector Store\n",
    "\n",
    "ZeusDB is a high-performance, Rust-powered vector database with enterprise features such as quantization, persistence, and logging.\n",
    "\n",
    "This notebook covers how to get started with the ZeusDB vector store and use ZeusDB efficiently with LangChain."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "107c485d-13a3-4309-9fda-5a0440862d3c",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "36fdc060",
   "metadata": {},
   "source": [
    "## Setup"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d978e3fd-d130-436f-841d-d133c0fae8fb",
   "metadata": {},
   "source": [
    "Install the ZeusDB LangChain integration package from PyPI:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "42ca8320-b866-4f37-944e-96eda54231d2",
   "metadata": {},
   "outputs": [],
   "source": [
    "pip install -qU langchain-zeusdb"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2a0e518a-ae8a-464b-8b47-9deb9d4ab063",
   "metadata": {},
   "source": [
    "*Setup in Jupyter Notebooks*"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1d092ea6-8553-4686-9563-b8318225a04a",
   "metadata": {},
   "source": [
    "> 💡 Tip: If you’re working inside Jupyter or Google Colab, use the `%pip` magic command so the package is installed into the active kernel:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "64e28aa6",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install -qU langchain-zeusdb"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c12fe175-a299-47d3-869f-9367b6aa572d",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "31554e69-40b2-4201-9f92-57e73ac66d33",
   "metadata": {},
   "source": [
    "## Getting Started"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b696b3dd-0fed-4ed2-a79a-5b32598508c0",
   "metadata": {},
   "source": [
    "This example uses OpenAIEmbeddings, which requires an OpenAI API key – [Get your OpenAI API key here](https://platform.openai.com/api-keys)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2b79766e-7725-4be0-a183-4947b56892c5",
   "metadata": {},
   "source": [
    "If you prefer, you can also use this package with any other embedding provider (Hugging Face, Cohere, custom functions, etc.)."
   ]
  },
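  {
   "cell_type": "markdown",
   "id": "alt-embeddings-note",
   "metadata": {},
   "source": [
    "For example, a local Hugging Face model could be swapped in via the langchain-huggingface package (a sketch, not required for this notebook; note that the index `dim` must match the model's embedding size):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "alt-embeddings-example",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical alternative embedding provider\n",
    "# (requires: pip install langchain-huggingface)\n",
    "# from langchain_huggingface import HuggingFaceEmbeddings\n",
    "#\n",
    "# embeddings = HuggingFaceEmbeddings(model_name=\"sentence-transformers/all-MiniLM-L6-v2\")\n",
    "# # This model produces 384-dimensional vectors, so create the ZeusDB\n",
    "# # index with dim=384 instead of 1536."
   ]
  },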
  {
   "cell_type": "markdown",
   "id": "b5266cc7-28da-459e-a28d-128382ed5a20",
   "metadata": {},
   "source": [
    "Install the LangChain OpenAI integration package from PyPI:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1ed941cd-5e06-4c61-9235-90bd0b0b0452",
   "metadata": {},
   "outputs": [],
   "source": [
    "pip install -qU langchain-openai\n",
    "\n",
    "# Use this command if inside Jupyter Notebooks\n",
    "# %pip install -qU langchain-openai"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0f49b2ec-d047-455d-8c05-da041112dd8a",
   "metadata": {},
   "source": [
    "#### Please choose an option below for providing your OpenAI API key"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ed2d9bf6-be53-4fc1-9611-158f03fd71b7",
   "metadata": {},
   "source": [
    "*Option 1: 🔑 Enter your API key each time*"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eff5b6a5-4c57-4531-896e-54bcb2b1dec2",
   "metadata": {},
   "source": [
    "Use getpass in Jupyter to securely input your key for the current session:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "08a50da9-5ed1-40dc-a390-07b031369761",
   "metadata": {},
   "outputs": [],
   "source": [
    "import getpass\n",
    "import os\n",
    "\n",
    "os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7321917e-8586-42e4-9822-b68cfd74f233",
   "metadata": {},
   "source": [
    "*Option 2: 🗂️ Use a .env file*"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b9297b6b-bd7e-457f-95af-5b41c7ab9b41",
   "metadata": {},
   "source": [
    "Keep your key in a local .env file and load it automatically with python-dotenv:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "85a139dc-f439-4e4e-bc46-76d9478c304d",
   "metadata": {},
   "outputs": [],
   "source": [
    "from dotenv import load_dotenv\n",
    "\n",
    "load_dotenv()  # reads .env and sets OPENAI_API_KEY"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1af364e3-df59-4963-aaaa-0e83f6ec5e32",
   "metadata": {},
   "source": [
    "🎉🎉 That's it! You are good to go."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3146180e-026e-4421-a490-ffd14ceabac3",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "93df377e",
   "metadata": {},
   "source": [
    "## Initialization"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fb55dfe8-2c98-45b6-ba90-7a3667ceee0c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import required packages and classes\n",
    "from langchain_openai import OpenAIEmbeddings\n",
    "from langchain_zeusdb import ZeusDBVectorStore\n",
    "from zeusdb import VectorDatabase"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dc37144c-208d-4ab3-9f3a-0407a69fe052",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Initialize embeddings\n",
    "embeddings = OpenAIEmbeddings(model=\"text-embedding-3-small\")\n",
    "\n",
    "# Create ZeusDB index\n",
    "vdb = VectorDatabase()\n",
    "index = vdb.create(index_type=\"hnsw\", dim=1536, space=\"cosine\")\n",
    "\n",
    "# Create vector store\n",
    "vector_store = ZeusDBVectorStore(zeusdb_index=index, embedding=embeddings)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f45fa43c-8b54-4a75-b7b0-92ac0ac506c6",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ac6071d4",
   "metadata": {},
   "source": [
    "## Manage vector store"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "edf53787-ebda-4306-afc3-f7d440dcb1ff",
   "metadata": {},
   "source": [
    "### 2.1 Add items to vector store"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "17f5efc0",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_core.documents import Document\n",
    "\n",
    "document_1 = Document(\n",
    "    page_content=\"ZeusDB is a high-performance vector database\",\n",
    "    metadata={\"source\": \"https://docs.zeusdb.com\"},\n",
    ")\n",
    "\n",
    "document_2 = Document(\n",
    "    page_content=\"Product Quantization reduces memory usage significantly\",\n",
    "    metadata={\"source\": \"https://docs.zeusdb.com\"},\n",
    ")\n",
    "\n",
    "document_3 = Document(\n",
    "    page_content=\"ZeusDB integrates seamlessly with LangChain\",\n",
    "    metadata={\"source\": \"https://docs.zeusdb.com\"},\n",
    ")\n",
    "\n",
    "documents = [document_1, document_2, document_3]\n",
    "\n",
    "vector_store.add_documents(documents=documents, ids=[\"1\", \"2\", \"3\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c738c3e0",
   "metadata": {},
   "source": [
    "### 2.2 Update items in vector store"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f0aa8b71",
   "metadata": {},
   "outputs": [],
   "source": [
    "updated_document = Document(\n",
    "    page_content=\"ZeusDB now supports advanced Product Quantization with 4x-256x compression\",\n",
    "    metadata={\"source\": \"https://docs.zeusdb.com\", \"updated\": True},\n",
    ")\n",
    "\n",
    "vector_store.add_documents([updated_document], ids=[\"1\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dcf1b905",
   "metadata": {},
   "source": [
    "### 2.3 Delete items from vector store"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ef61e188",
   "metadata": {},
   "outputs": [],
   "source": [
    "vector_store.delete(ids=[\"3\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1a0091af-777d-4651-888a-3b346d7990f5",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c3620501",
   "metadata": {},
   "source": [
    "## Query vector store"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4ba3fdb2-b7d6-4f0f-b8c9-91f63596018b",
   "metadata": {},
   "source": [
    "### 3.1 Query directly"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "400a9b25-9587-4116-ab59-6888602ec2b1",
   "metadata": {},
   "source": [
    "Performing a simple similarity search:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "aa0a16fa",
   "metadata": {},
   "outputs": [],
   "source": [
    "results = vector_store.similarity_search(query=\"high performance database\", k=2)\n",
    "\n",
    "for doc in results:\n",
    "    print(f\"* {doc.page_content} [{doc.metadata}]\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3ed9d733",
   "metadata": {},
   "source": [
    "If you want to execute a similarity search and receive the corresponding scores:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5efd2eaa",
   "metadata": {},
   "outputs": [],
   "source": [
    "results = vector_store.similarity_search_with_score(query=\"memory optimization\", k=2)\n",
    "\n",
    "for doc, score in results:\n",
    "    print(f\"* [SIM={score:.3f}] {doc.page_content} [{doc.metadata}]\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0c235cdc",
   "metadata": {},
   "source": [
    "### 3.2 Query by turning into retriever"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "59292cb5-5dc8-4158-9137-89d0f6ca711d",
   "metadata": {},
   "source": [
    "You can also transform the vector store into a retriever for easier usage in your chains:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f3460093",
   "metadata": {},
   "outputs": [],
   "source": [
    "retriever = vector_store.as_retriever(search_type=\"mmr\", search_kwargs={\"k\": 2})\n",
    "\n",
    "retriever.invoke(\"vector database features\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cc2d2b63-99d8-45c4-85e6-6a9409551ada",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "persistence_section",
   "metadata": {},
   "source": [
    "## ZeusDB-Specific Features"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "memory_section",
   "metadata": {},
   "source": [
    "### 4.1 Memory-Efficient Setup with Product Quantization"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "12832d02-d9ea-4c35-a20f-05c85d1d7723",
   "metadata": {},
   "source": [
    "For large datasets, use Product Quantization to reduce memory usage:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "quantization_example",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create memory-optimized vector store\n",
    "quantization_config = {\n",
    "    \"type\": \"pq\",\n",
    "    \"subvectors\": 8,\n",
    "    \"bits\": 8,\n",
    "    \"training_size\": 10000,\n",
    "}\n",
    "\n",
    "vdb_quantized = VectorDatabase()\n",
    "quantized_index = vdb_quantized.create(\n",
    "    index_type=\"hnsw\", dim=1536, quantization_config=quantization_config\n",
    ")\n",
    "\n",
    "quantized_vector_store = ZeusDBVectorStore(\n",
    "    zeusdb_index=quantized_index, embedding=embeddings\n",
    ")\n",
    "\n",
    "print(f\"Created quantized store: {quantized_index.info()}\")"
   ]
  },
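  {
   "cell_type": "markdown",
   "id": "quantized-usage-note",
   "metadata": {},
   "source": [
    "The quantized store exposes the same LangChain interface as the regular store, so the add/query cells above work unchanged against it. A quick sanity check (a sketch reusing the `documents` list defined earlier):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "quantized-usage-example",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Same API as the regular store (reuses `documents` from section 2.1)\n",
    "quantized_vector_store.add_documents(documents=documents)\n",
    "quantized_vector_store.similarity_search(\"memory usage\", k=1)"
   ]
  },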
  {
   "cell_type": "markdown",
   "id": "6ffe0613-b2a7-484e-9219-1166b65c49c5",
   "metadata": {},
   "source": [
    "### 4.2 Persistence"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fbc323ee-4c6c-43fc-beba-675d820ca078",
   "metadata": {},
   "source": [
    "Save and load your vector store to disk:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "834354d1-55ad-48fe-84e1-a5eacff3f6bb",
   "metadata": {},
   "source": [
    "How to save your vector store:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f9d1332b-a7ac-4a4b-a060-f2061599d3f1",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Save the vector store\n",
    "vector_store.save_index(\"my_zeusdb_index.zdb\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "23370621-5b51-4313-800f-3a2fb9de52d2",
   "metadata": {},
   "source": [
    "How to load your vector store:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f9ed5778-58e4-4724-b69d-3c7b48cda429",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load the vector store\n",
    "loaded_store = ZeusDBVectorStore.load_index(\n",
    "    path=\"my_zeusdb_index.zdb\", embedding=embeddings\n",
    ")\n",
    "\n",
    "print(f\"Loaded store with {loaded_store.get_vector_count()} vectors\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "610cfe63-d4a8-4ef0-88a8-cf9cc3cbbfce",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "901c75dc",
   "metadata": {},
   "source": [
    "## Usage for retrieval-augmented generation\n",
    "\n",
    "For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:\n",
    "\n",
    "- [How-to: Question and answer with RAG](https://python.langchain.com/docs/how_to/#qa-with-rag)\n",
    "- [Retrieval conceptual docs](https://python.langchain.com/docs/concepts/retrieval/)"
   ]
  },
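  {
   "cell_type": "markdown",
   "id": "rag-sketch-note",
   "metadata": {},
   "source": [
    "As a quick illustration, a minimal RAG chain over this store might look like the sketch below (assumes `langchain-openai` is installed, `OPENAI_API_KEY` is set, and the `retriever` from section 3.2 exists; the model choice is illustrative):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "rag-sketch-example",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Minimal RAG sketch (illustrative, not part of the official ZeusDB docs)\n",
    "from langchain_core.output_parsers import StrOutputParser\n",
    "from langchain_core.prompts import ChatPromptTemplate\n",
    "from langchain_openai import ChatOpenAI\n",
    "\n",
    "prompt = ChatPromptTemplate.from_template(\n",
    "    \"Answer using only this context:\\n{context}\\n\\nQuestion: {question}\"\n",
    ")\n",
    "\n",
    "\n",
    "def format_docs(docs):\n",
    "    return \"\\n\\n\".join(doc.page_content for doc in docs)\n",
    "\n",
    "\n",
    "chain = (\n",
    "    {\"context\": retriever | format_docs, \"question\": lambda q: q}\n",
    "    | prompt\n",
    "    | ChatOpenAI(model=\"gpt-4o-mini\")\n",
    "    | StrOutputParser()\n",
    ")\n",
    "\n",
    "chain.invoke(\"What is ZeusDB?\")"
   ]
  },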
  {
   "cell_type": "markdown",
   "id": "1d9d9d51-3798-410f-b1b3-f9736ea8c238",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "25b08eb0-99ab-4919-a201-5243fdfa39e9",
   "metadata": {},
   "source": [
    "## API reference"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "77fdca8b-f75e-4100-9f1d-7a017567dc59",
   "metadata": {},
   "source": [
    "For detailed documentation of all ZeusDBVectorStore features and configurations, see the API reference: https://docs.zeusdb.com/en/latest/vector_database/integrations/langchain.html"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}