docs: add ZeusDB vector store integration (#32822)

## Description

This PR adds documentation for the new ZeusDB vector store integration
with LangChain.

## Motivation

ZeusDB is a high-performance vector database (Python/Rust backend)
designed for AI applications that need fast similarity search and
real-time vector operations. This integration brings ZeusDB's capabilities to
the LangChain ecosystem, giving developers another production-oriented
option for vector storage and retrieval.

**Key Features:**
- **User-Friendly Python API**: Intuitive interface that integrates
seamlessly with Python ML workflows
- **High Performance**: Powered by a robust Rust backend for
lightning-fast vector operations
- **Enterprise Logging**: Comprehensive logging capabilities for
monitoring and debugging production systems
- **Advanced Features**: Includes product quantization and persistence
capabilities
- **AI-Optimized**: Purpose-built for modern AI applications and RAG
pipelines

## Changes

- Added provider documentation:
`docs/docs/integrations/providers/zeusdb.mdx` (installation, setup).

- Added vector store documentation:
`docs/docs/integrations/vectorstores/zeusdb.ipynb` (quickstart for
creating/querying a ZeusDBVectorStore).

- Registered langchain-zeusdb in `libs/packages.yml` for discovery.

## Target users

- AI/ML engineers building RAG pipelines

- Data scientists working with large document collections

- Developers needing high-throughput vector search

- Teams requiring near real-time vector operations

## Testing

- Followed LangChain's "How to add standard tests to an integration"
guidance.
- Code passes format, lint, and test checks locally.
- Tested with LangChain Core 0.3.74
- Works with Python 3.10 to 3.13

## Package Information
**PyPI:** https://pypi.org/project/langchain-zeusdb
**GitHub:** https://github.com/ZeusDB/langchain-zeusdb
Author: doubleinfinity
Date: 2025-09-15 23:55:14 +10:00 (committed by GitHub)
Commit: b944bbc766 (parent 0be7515abc)
3 changed files with 1227 additions and 4 deletions

docs/docs/integrations/providers/zeusdb.mdx

@@ -0,0 +1,603 @@
<p align="center" width="100%">
<h1 align="center">LangChain ZeusDB Integration</h1>
</p>
A high-performance LangChain integration for ZeusDB, bringing enterprise-grade vector search capabilities to your LangChain applications.
## Features
🚀 **High Performance**
- Rust-powered vector database backend
- Advanced HNSW indexing for sub-millisecond search
- Product Quantization for 4x-256x memory compression
- Concurrent search with automatic parallelization
🎯 **LangChain Native**
- Full VectorStore API compliance
- Async/await support for all operations
- Seamless integration with LangChain retrievers
- Maximal Marginal Relevance (MMR) search
🏢 **Enterprise Ready**
- Structured logging with performance monitoring
- Index persistence with complete state preservation
- Advanced metadata filtering
- Graceful error handling and fallback mechanisms
## Quick Start
### Installation
```bash
pip install -qU langchain-zeusdb
```
### Getting Started
This example uses *OpenAIEmbeddings*, which requires an OpenAI API key - [Get your OpenAI API key here](https://platform.openai.com/api-keys)
If you prefer, you can also use this package with any other embedding provider (Hugging Face, Cohere, custom functions, etc.).
```bash
pip install langchain-openai
```
```python
import os
import getpass
os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')
```
### Basic Usage
```python
from langchain_zeusdb import ZeusDBVectorStore
from langchain_openai import OpenAIEmbeddings
from zeusdb import VectorDatabase
# Initialize embeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
# Create ZeusDB index
vdb = VectorDatabase()
index = vdb.create(
    index_type="hnsw",
    dim=1536,
    space="cosine"
)
# Create vector store
vector_store = ZeusDBVectorStore(
    zeusdb_index=index,
    embedding=embeddings
)
# Add documents
from langchain_core.documents import Document
docs = [
    Document(page_content="ZeusDB is fast", metadata={"source": "docs"}),
    Document(page_content="LangChain is powerful", metadata={"source": "docs"}),
]
vector_store.add_documents(docs)
# Search
results = vector_store.similarity_search("fast database", k=2)
print(f"Found the following {len(results)} results:")
print(results)
```
**Expected results:**
```
Found the following 2 results:
[Document(id='ea2b4f13-b0b7-4cef-bb91-0fc4f4c41295', metadata={'source': 'docs'}, page_content='ZeusDB is fast'), Document(id='33dc1e87-a18a-4827-a0df-6ee47eabc7b2', metadata={'source': 'docs'}, page_content='LangChain is powerful')]
```
<br />
### Factory Methods
For convenience, you can create and populate a vector store in a single step:
**Example 1: Create from texts (creates index and adds texts in one step)**
```python
vector_store_texts = ZeusDBVectorStore.from_texts(
    texts=["Hello world", "Goodbye world"],
    embedding=embeddings,
    metadatas=[{"source": "text1"}, {"source": "text2"}]
)
print("texts store count:", vector_store_texts.get_vector_count()) # -> 2
print("texts store peek:", vector_store_texts.zeusdb_index.list(2)) # [('id1', {...}), ('id2', {...})]
# Search the texts-based store
results = vector_store_texts.similarity_search("Hello", k=1)
print(f"Found in texts store: {results[0].page_content}") # -> "Hello world"
```
**Expected results:**
```
texts store count: 2
texts store peek: [('e9c39b44-b610-4e00-91f3-bf652e9989ac', {'source': 'text1', 'text': 'Hello world'}), ('d33f210c-ed53-4006-a64a-a9eee397fec9', {'source': 'text2', 'text': 'Goodbye world'})]
Found in texts store: Hello world
```
<br />
**Example 2: Create from documents (creates index and adds documents in one step)**
```python
new_docs = [
    Document(page_content="Python is great", metadata={"source": "python"}),
    Document(page_content="JavaScript is flexible", metadata={"source": "js"}),
]
vector_store_docs = ZeusDBVectorStore.from_documents(
    documents=new_docs,
    embedding=embeddings
)
print("docs store count:", vector_store_docs.get_vector_count()) # -> 2
print("docs store peek:", vector_store_docs.zeusdb_index.list(2)) # [('id3', {...}), ('id4', {...})]
# Search the documents-based store
results = vector_store_docs.similarity_search("Python", k=1)
print(f"Found in docs store: {results[0].page_content}") # -> "Python is great"
```
**Expected results:**
```
docs store count: 2
docs store peek: [('aab2d1c1-7e02-4817-8dd8-6fb03570bb6f', {'text': 'Python is great', 'source': 'python'}), ('9a8a82cb-0e70-456c-9db2-556e464de14e', {'text': 'JavaScript is flexible', 'source': 'js'})]
Found in docs store: Python is great
```
<br />
## Advanced Features
ZeusDB's enterprise-grade capabilities are fully integrated into the LangChain ecosystem, including quantization, persistence, and advanced search options.
### Memory-Efficient Setup with Quantization
For large datasets, use Product Quantization to reduce memory usage:
```python
# Create quantized index for memory efficiency
quantization_config = {
    'type': 'pq',
    'subvectors': 8,
    'bits': 8,
    'training_size': 10000
}

vdb = VectorDatabase()
index = vdb.create(
    index_type="hnsw",
    dim=1536,
    space="cosine",
    quantization_config=quantization_config
)

vector_store = ZeusDBVectorStore(
    zeusdb_index=index,
    embedding=embeddings
)
```
Please refer to our [documentation](https://docs.zeusdb.com/en/latest/vector_database/product_quantization.html) for helpful configuration guidelines and recommendations for setting up quantization.
<br />
### Persistence
ZeusDB persistence lets you save a fully populated index to disk and load it later with complete state restoration. This includes vectors, metadata, HNSW graph, and (if enabled) Product Quantization models.
What gets saved:
- Vectors & IDs
- Metadata
- HNSW graph structure
- Quantization config, centroids, and training state (if PQ is enabled)
**How to Save your vector store**
```python
# Save index
vector_store.save_index("my_index.zdb")
```
**How to Load your vector store**
```python
# Load index
loaded_store = ZeusDBVectorStore.load_index(
    path="my_index.zdb",
    embedding=embeddings
)
# Verify after load
print("vector count:", loaded_store.get_vector_count())
print("index info:", loaded_store.info())
print("store peek:", loaded_store.zeusdb_index.list(2))
```
**Notes**
- The path is a directory, not a single file. Ensure the target is writable.
- Saved indexes are cross-platform and include format/version info for compatibility checks.
- If you used PQ, both the compression model and state are preserved; there is no need to retrain after loading.
- You can continue to use all vector store APIs (similarity_search, retrievers, etc.) on the loaded_store.
For further details (including the file structure and more comprehensive examples), see the [documentation](https://docs.zeusdb.com/en/latest/vector_database/persistence.html).
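For example, a loaded store can be queried immediately (a minimal sketch reusing the `loaded_store` from above):
```python
# The restored index serves searches right away; no retraining or re-adding needed
results = loaded_store.similarity_search("fast database", k=2)
for doc in results:
    print(doc.page_content)

# It also plugs into chains as a retriever
retriever = loaded_store.as_retriever(search_kwargs={"k": 2})
```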
<br />
### Advanced Search Options
Use these to control scoring, diversity, metadata filtering, and retriever integration for your searches.
#### Similarity search with scores
Returns `(Document, raw_distance)` pairs from ZeusDB — **lower distance = more similar**.
If you prefer normalized relevance in `[0, 1]`, use `similarity_search_with_relevance_scores`.
```python
# Similarity search with scores
results_with_scores = vector_store.similarity_search_with_score(
    query="machine learning",
    k=5
)
print(results_with_scores)
```
**Expected results:**
```
[
(Document(id='ac0eaf5b-9f02-4ce2-8957-c369a7262c61', metadata={'source': 'docs'}, page_content='LangChain is powerful'), 0.8218843340873718),
(Document(id='faae3adf-7cf3-463c-b282-3790b096fa23', metadata={'source': 'docs'}, page_content='ZeusDB is fast'), 0.9140053391456604)
]
```
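If you want the normalized `[0, 1]` relevance scores mentioned above, here is a minimal sketch using LangChain's standard `similarity_search_with_relevance_scores` method (higher score = more similar):
```python
# Normalized relevance scores in [0, 1]; higher means more similar
results = vector_store.similarity_search_with_relevance_scores(
    query="machine learning",
    k=5,
)
for doc, score in results:
    print(f"[relevance={score:.3f}] {doc.page_content}")
```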
#### MMR search for diversity
MMR (Maximal Marginal Relevance) balances two forces: relevance to the query and diversity among selected results, reducing near-duplicate answers. Control the trade-off with `lambda_mult` (1.0 = all relevance, 0.0 = all diversity).
```python
# MMR search for diversity
mmr_results = vector_store.max_marginal_relevance_search(
    query="AI applications",
    k=5,
    fetch_k=20,
    lambda_mult=0.7  # Balance relevance vs diversity
)
print(mmr_results)
```
#### Search with metadata filtering
Filter results using the document metadata you stored when adding documents:
```python
# Search with metadata filtering
results = vector_store.similarity_search(
    query="database performance",
    k=3,
    filter={"source": "documentation"}
)
```
For supported metadata query types and operators, please refer to the [documentation](https://docs.zeusdb.com/en/latest/vector_database/metadata_filtering.html).
#### As a Retriever
Turning the vector store into a retriever gives you a standard LangChain interface that chains (e.g., RetrievalQA) can call to fetch context. Under the hood it uses your chosen search type (`similarity` or `mmr`) and `search_kwargs`.
```python
# Convert to retriever for use in chains
retriever = vector_store.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 3, "lambda_mult": 0.8}
)
# Use with LangChain Expression Language (LCEL) - requires only langchain-core
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
def format_docs(docs):
    return "\n\n".join([d.page_content for d in docs])
template = """Answer the question based only on the following context:
{context}
Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
llm = ChatOpenAI()
# Create a chain using LCEL
chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
# Use the chain
answer = chain.invoke("What is ZeusDB?")
print(answer)
```
**Expected results:**
```
ZeusDB is a fast database management system.
```
<br />
## Async Support
ZeusDB supports asynchronous operations for non-blocking, concurrent vector operations.
**When to use async:** web servers (FastAPI/Starlette), agents/pipelines doing parallel searches, or notebooks where you want non-blocking/concurrent retrieval. If you're writing simple scripts, the sync methods are fine.
These are **asynchronous operations**: the async/await versions of the regular synchronous methods. Here's what each one does:
1. `await vector_store.aadd_documents(documents)` - Asynchronously adds documents to the vector store (async version of `add_documents()`)
2. `await vector_store.asimilarity_search("query", k=5)` - Asynchronously performs similarity search (async version of `similarity_search()`)
3. `await vector_store.adelete(ids=["doc1", "doc2"])` - Asynchronously deletes documents by their IDs (async version of `delete()`)
The async versions are useful when:
- You're building async applications (using `asyncio`, FastAPI, etc.)
- You want non-blocking operations that can run concurrently
- You're handling multiple requests simultaneously
- You want better performance in I/O-bound applications
For example, instead of blocking while adding documents:
```python
# Synchronous (blocking)
vector_store.add_documents(docs) # Blocks until complete
# Asynchronous (non-blocking)
await vector_store.aadd_documents(docs) # Can do other work while this runs
```
All operations support async/await:
**Script version (`python my_script.py`):**
```python
import asyncio
from langchain_zeusdb import ZeusDBVectorStore
from langchain_openai import OpenAIEmbeddings
from langchain_core.documents import Document
from zeusdb import VectorDatabase
# Setup
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vdb = VectorDatabase()
index = vdb.create(index_type="hnsw", dim=1536, space="cosine")
vector_store = ZeusDBVectorStore(zeusdb_index=index, embedding=embeddings)
docs = [
    Document(page_content="ZeusDB is fast", metadata={"source": "docs"}),
    Document(page_content="LangChain is powerful", metadata={"source": "docs"}),
]

async def main():
    # Add documents asynchronously
    ids = await vector_store.aadd_documents(docs)
    print("Added IDs:", ids)

    # Run multiple searches concurrently
    results_fast, results_powerful = await asyncio.gather(
        vector_store.asimilarity_search("fast", k=2),
        vector_store.asimilarity_search("powerful", k=2),
    )
    print("Fast results:", [d.page_content for d in results_fast])
    print("Powerful results:", [d.page_content for d in results_powerful])

    # Delete documents asynchronously
    deleted = await vector_store.adelete(ids=ids[:1])
    print("Deleted first doc:", deleted)

if __name__ == "__main__":
    asyncio.run(main())
```
**Colab/Notebook/Jupyter version (top-level `await`):**
```python
from langchain_zeusdb import ZeusDBVectorStore
from langchain_openai import OpenAIEmbeddings
from langchain_core.documents import Document
from zeusdb import VectorDatabase
import asyncio
# Setup
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vdb = VectorDatabase()
index = vdb.create(index_type="hnsw", dim=1536, space="cosine")
vector_store = ZeusDBVectorStore(zeusdb_index=index, embedding=embeddings)
docs = [
    Document(page_content="ZeusDB is fast", metadata={"source": "docs"}),
    Document(page_content="LangChain is powerful", metadata={"source": "docs"}),
]
# Add documents asynchronously
ids = await vector_store.aadd_documents(docs)
print("Added IDs:", ids)
# Run multiple searches concurrently
results_fast, results_powerful = await asyncio.gather(
    vector_store.asimilarity_search("fast", k=2),
    vector_store.asimilarity_search("powerful", k=2),
)
print("Fast results:", [d.page_content for d in results_fast])
print("Powerful results:", [d.page_content for d in results_powerful])
# Delete documents asynchronously
deleted = await vector_store.adelete(ids=ids[:1])
print("Deleted first doc:", deleted)
```
**Expected results:**
```
Added IDs: ['9c440918-715f-49ba-9b97-0d991d29e997', 'ad59c645-d3ba-4a4a-a016-49ed39514123']
Fast results: ['ZeusDB is fast', 'LangChain is powerful']
Powerful results: ['LangChain is powerful', 'ZeusDB is fast']
Deleted first doc: True
```
<br />
## Monitoring and Observability
### Performance Monitoring
```python
# Get index statistics
stats = vector_store.get_zeusdb_stats()
print(f"Index size: {stats.get('total_vectors', 0)} vectors")
print(f"Dimension: {stats.get('dimension')} | Space: {stats.get('space')} | Index type: {stats.get('index_type')}")

# Benchmark search performance
performance = vector_store.benchmark_search_performance(
    query_count=100,
    max_threads=4
)
print(f"Search QPS: {performance.get('parallel_qps', 0):.0f}")

# Check quantization status
if vector_store.is_quantized():
    progress = vector_store.get_training_progress()
    print(f"Quantization training: {progress:.1f}% complete")
else:
    print("Index is not quantized")
```
**Expected results:**
```
Index size: 2 vectors
Dimension: 1536 | Space: cosine | Index type: HNSW
Search QPS: 53807
Index is not quantized
```
### Enterprise Logging
ZeusDB includes enterprise-grade structured logging that works automatically with smart environment detection:
```python
import logging
# ZeusDB automatically detects your environment and applies appropriate logging:
# - Development: Human-readable logs, WARNING level
# - Production: JSON structured logs, ERROR level
# - Testing: Minimal output, CRITICAL level
# - Jupyter: Clean readable logs, INFO level
# Operations are automatically logged with performance metrics
vector_store.add_documents(docs)
# Logs: {"operation":"vector_addition","total_inserted":2,"duration_ms":45}
# Control logging with environment variables if needed
# ZEUSDB_LOG_LEVEL=debug ZEUSDB_LOG_FORMAT=json python your_app.py
```
To learn more about ZeusDB's enterprise logging capabilities, please read the [documentation](https://docs.zeusdb.com/en/latest/vector_database/logging.html).
<br />
## Configuration Options
### Index Parameters
```python
vdb = VectorDatabase()
index = vdb.create(
    index_type="hnsw",          # Index algorithm
    dim=1536,                   # Vector dimension
    space="cosine",             # Distance metric: cosine, l2, l1
    m=16,                       # HNSW connectivity
    ef_construction=200,        # Build-time search width
    expected_size=100000,       # Expected number of vectors
    quantization_config=None    # Optional quantization
)
```
### Search Parameters
```python
results = vector_store.similarity_search(
    query="search query",
    k=5,                      # Number of results
    ef_search=None,           # Runtime search width (auto if None)
    filter={"key": "value"}   # Metadata filter
)
```
## Error Handling
The integration includes comprehensive error handling:
```python
try:
    results = vector_store.similarity_search("query")
    print(results)
except Exception as e:
    # Graceful degradation with logging
    print(f"Search failed: {e}")
    # Fallback logic here
```
## Requirements
- **Python**: 3.10 or higher
- **ZeusDB**: 0.0.8 or higher
- **LangChain Core**: 0.3.74 or higher
## Installation from Source
```bash
git clone https://github.com/zeusdb/langchain-zeusdb.git
cd langchain-zeusdb/libs/zeusdb
pip install -e .
```
## Use Cases
- **RAG Applications**: High-performance retrieval for question answering
- **Semantic Search**: Fast similarity search across large document collections
- **Recommendation Systems**: Vector-based content and collaborative filtering
- **Embeddings Analytics**: Analysis of high-dimensional embedding spaces
- **Real-time Applications**: Low-latency vector search for production systems
## Compatibility
### LangChain Versions
- **LangChain Core**: 0.3.74+
### Distance Metrics
- **Cosine**: Default, normalized similarity
- **Euclidean (L2)**: Geometric distance
- **Manhattan (L1)**: City-block distance
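The metric is selected via the `space` parameter at index creation, as shown in the Configuration Options above (a minimal sketch):
```python
from zeusdb import VectorDatabase

vdb = VectorDatabase()

# Same create() call as above; only the distance metric changes
cosine_index = vdb.create(index_type="hnsw", dim=1536, space="cosine")  # normalized similarity
l2_index = vdb.create(index_type="hnsw", dim=1536, space="l2")  # Euclidean distance
l1_index = vdb.create(index_type="hnsw", dim=1536, space="l1")  # Manhattan distance
```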
### Embedding Models
Compatible with any embedding provider:
- OpenAI (`text-embedding-3-small`, `text-embedding-3-large`)
- Hugging Face Transformers
- Cohere Embeddings
- Custom embedding functions
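As an illustration, here is a minimal sketch swapping in a Hugging Face model; it assumes the separate `langchain-huggingface` package is installed, and the index `dim` must match the embedding size of whichever model you pick:
```python
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_zeusdb import ZeusDBVectorStore
from zeusdb import VectorDatabase

# sentence-transformers/all-MiniLM-L6-v2 produces 384-dimensional vectors
hf_embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

vdb = VectorDatabase()
index = vdb.create(index_type="hnsw", dim=384, space="cosine")
vector_store = ZeusDBVectorStore(zeusdb_index=index, embedding=hf_embeddings)
```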
## Support
- **Documentation**: [docs.zeusdb.com](https://docs.zeusdb.com)
- **Issues**: [GitHub Issues](https://github.com/zeusdb/langchain-zeusdb/issues)
- **Email**: contact@zeusdb.com
---
*Making vector search fast, scalable, and developer-friendly.*

docs/docs/integrations/vectorstores/zeusdb.ipynb

@@ -0,0 +1,617 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "ef1f0986",
"metadata": {},
"source": [
"# ⚡ ZeusDB Vector Store\n",
"\n",
"ZeusDB is a high-performance, Rust-powered vector database with enterprise features like quantization, persistence and logging.\n",
"\n",
"This notebook covers how to get started with the ZeusDB Vector Store to efficiently use ZeusDB with LangChain."
]
},
{
"cell_type": "markdown",
"id": "107c485d-13a3-4309-9fda-5a0440862d3c",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"id": "36fdc060",
"metadata": {},
"source": [
"## Setup"
]
},
{
"cell_type": "markdown",
"id": "d978e3fd-d130-436f-841d-d133c0fae8fb",
"metadata": {},
"source": [
"Install the ZeusDB LangChain integration package from PyPi:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "42ca8320-b866-4f37-944e-96eda54231d2",
"metadata": {},
"outputs": [],
"source": [
"pip install -qU langchain-zeusdb"
]
},
{
"cell_type": "markdown",
"id": "2a0e518a-ae8a-464b-8b47-9deb9d4ab063",
"metadata": {},
"source": [
"*Setup in Jupyter Notebooks*"
]
},
{
"cell_type": "markdown",
"id": "1d092ea6-8553-4686-9563-b8318225a04a",
"metadata": {},
"source": [
"> 💡 Tip: If youre working inside Jupyter or Google Colab, use the %pip magic command so the package is installed into the active kernel:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "64e28aa6",
"metadata": {},
"outputs": [],
"source": [
"%pip install -qU langchain-zeusdb"
]
},
{
"cell_type": "markdown",
"id": "c12fe175-a299-47d3-869f-9367b6aa572d",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"id": "31554e69-40b2-4201-9f92-57e73ac66d33",
"metadata": {},
"source": [
"## Getting Started"
]
},
{
"cell_type": "markdown",
"id": "b696b3dd-0fed-4ed2-a79a-5b32598508c0",
"metadata": {},
"source": [
"This example uses OpenAIEmbeddings, which requires an OpenAI API key [Get your OpenAI API key here](https://platform.openai.com/api-keys)"
]
},
{
"cell_type": "markdown",
"id": "2b79766e-7725-4be0-a183-4947b56892c5",
"metadata": {},
"source": [
"If you prefer, you can also use this package with any other embedding provider (Hugging Face, Cohere, custom functions, etc.)."
]
},
{
"cell_type": "markdown",
"id": "b5266cc7-28da-459e-a28d-128382ed5a20",
"metadata": {},
"source": [
"Install the LangChain OpenAI integration package from PyPi:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1ed941cd-5e06-4c61-9235-90bd0b0b0452",
"metadata": {},
"outputs": [],
"source": [
"pip install -qU langchain-openai\n",
"\n",
"# Use this command if inside Jupyter Notebooks\n",
"#%pip install -qU langchain-openai"
]
},
{
"cell_type": "markdown",
"id": "0f49b2ec-d047-455d-8c05-da041112dd8a",
"metadata": {},
"source": [
"#### Please choose an option below for your OpenAI key integration"
]
},
{
"cell_type": "markdown",
"id": "ed2d9bf6-be53-4fc1-9611-158f03fd71b7",
"metadata": {},
"source": [
"*Option 1: 🔑 Enter your API key each time* "
]
},
{
"cell_type": "markdown",
"id": "eff5b6a5-4c57-4531-896e-54bcb2b1dec2",
"metadata": {},
"source": [
"Use getpass in Jupyter to securely input your key for the current session:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "08a50da9-5ed1-40dc-a390-07b031369761",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import getpass\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
]
},
{
"cell_type": "markdown",
"id": "7321917e-8586-42e4-9822-b68cfd74f233",
"metadata": {},
"source": [
"*Option 2: 🗂️ Use a .env file*"
]
},
{
"cell_type": "markdown",
"id": "b9297b6b-bd7e-457f-95af-5b41c7ab9b41",
"metadata": {},
"source": [
"Keep your key in a local .env file and load it automatically with python-dotenv"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "85a139dc-f439-4e4e-bc46-76d9478c304d",
"metadata": {},
"outputs": [],
"source": [
"from dotenv import load_dotenv\n",
"\n",
"load_dotenv() # reads .env and sets OPENAI_API_KEY"
]
},
{
"cell_type": "markdown",
"id": "1af364e3-df59-4963-aaaa-0e83f6ec5e32",
"metadata": {},
"source": [
"🎉🎉 That's it! You are good to go."
]
},
{
"cell_type": "markdown",
"id": "3146180e-026e-4421-a490-ffd14ceabac3",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"id": "93df377e",
"metadata": {},
"source": [
"## Initialization"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fb55dfe8-2c98-45b6-ba90-7a3667ceee0c",
"metadata": {},
"outputs": [],
"source": [
"# Import required Packages and Classes\n",
"from langchain_zeusdb import ZeusDBVectorStore\n",
"from langchain_openai import OpenAIEmbeddings\n",
"from zeusdb import VectorDatabase"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dc37144c-208d-4ab3-9f3a-0407a69fe052",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Initialize embeddings\n",
"embeddings = OpenAIEmbeddings(model=\"text-embedding-3-small\")\n",
"\n",
"# Create ZeusDB index\n",
"vdb = VectorDatabase()\n",
"index = vdb.create(index_type=\"hnsw\", dim=1536, space=\"cosine\")\n",
"\n",
"# Create vector store\n",
"vector_store = ZeusDBVectorStore(zeusdb_index=index, embedding=embeddings)"
]
},
{
"cell_type": "markdown",
"id": "f45fa43c-8b54-4a75-b7b0-92ac0ac506c6",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"id": "ac6071d4",
"metadata": {},
"source": [
"## Manage vector store"
]
},
{
"cell_type": "markdown",
"id": "edf53787-ebda-4306-afc3-f7d440dcb1ff",
"metadata": {},
"source": [
"### 2.1 Add items to vector store"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "17f5efc0",
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.documents import Document\n",
"\n",
"document_1 = Document(\n",
" page_content=\"ZeusDB is a high-performance vector database\",\n",
" metadata={\"source\": \"https://docs.zeusdb.com\"},\n",
")\n",
"\n",
"document_2 = Document(\n",
" page_content=\"Product Quantization reduces memory usage significantly\",\n",
" metadata={\"source\": \"https://docs.zeusdb.com\"},\n",
")\n",
"\n",
"document_3 = Document(\n",
" page_content=\"ZeusDB integrates seamlessly with LangChain\",\n",
" metadata={\"source\": \"https://docs.zeusdb.com\"},\n",
")\n",
"\n",
"documents = [document_1, document_2, document_3]\n",
"\n",
"vector_store.add_documents(documents=documents, ids=[\"1\", \"2\", \"3\"])"
]
},
{
"cell_type": "markdown",
"id": "c738c3e0",
"metadata": {},
"source": [
"### 2.2 Update items in vector store"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f0aa8b71",
"metadata": {},
"outputs": [],
"source": [
"updated_document = Document(\n",
" page_content=\"ZeusDB now supports advanced Product Quantization with 4x-256x compression\",\n",
" metadata={\"source\": \"https://docs.zeusdb.com\", \"updated\": True},\n",
")\n",
"\n",
"vector_store.add_documents([updated_document], ids=[\"1\"])"
]
},
{
"cell_type": "markdown",
"id": "dcf1b905",
"metadata": {},
"source": [
"### 2.3 Delete items from vector store"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ef61e188",
"metadata": {},
"outputs": [],
"source": [
"vector_store.delete(ids=[\"3\"])"
]
},
{
"cell_type": "markdown",
"id": "1a0091af-777d-4651-888a-3b346d7990f5",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"id": "c3620501",
"metadata": {},
"source": [
"## Query vector store"
]
},
{
"cell_type": "markdown",
"id": "4ba3fdb2-b7d6-4f0f-b8c9-91f63596018b",
"metadata": {},
"source": [
"### 3.1 Query directly"
]
},
{
"cell_type": "markdown",
"id": "400a9b25-9587-4116-ab59-6888602ec2b1",
"metadata": {},
"source": [
"Performing a simple similarity search:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "aa0a16fa",
"metadata": {},
"outputs": [],
"source": [
"results = vector_store.similarity_search(query=\"high performance database\", k=2)\n",
"\n",
"for doc in results:\n",
" print(f\"* {doc.page_content} [{doc.metadata}]\")"
]
},
{
"cell_type": "markdown",
"id": "3ed9d733",
"metadata": {},
"source": [
"If you want to execute a similarity search and receive the corresponding scores:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5efd2eaa",
"metadata": {},
"outputs": [],
"source": [
"results = vector_store.similarity_search_with_score(query=\"memory optimization\", k=2)\n",
"\n",
"for doc, score in results:\n",
" print(f\"* [SIM={score:.3f}] {doc.page_content} [{doc.metadata}]\")"
]
},
{
"cell_type": "markdown",
"id": "0c235cdc",
"metadata": {},
"source": [
"### 3.2 Query by turning into retriever"
]
},
{
"cell_type": "markdown",
"id": "59292cb5-5dc8-4158-9137-89d0f6ca711d",
"metadata": {},
"source": [
"You can also transform the vector store into a retriever for easier usage in your chains:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f3460093",
"metadata": {},
"outputs": [],
"source": [
"retriever = vector_store.as_retriever(search_type=\"mmr\", search_kwargs={\"k\": 2})\n",
"\n",
"retriever.invoke(\"vector database features\")"
]
},
{
"cell_type": "markdown",
"id": "cc2d2b63-99d8-45c4-85e6-6a9409551ada",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"id": "persistence_section",
"metadata": {},
"source": [
"## ZeusDB-Specific Features"
]
},
{
"cell_type": "markdown",
"id": "memory_section",
"metadata": {},
"source": [
"### 4.1 Memory-Efficient Setup with Product Quantization"
]
},
{
"cell_type": "markdown",
"id": "12832d02-d9ea-4c35-a20f-05c85d1d7723",
"metadata": {},
"source": [
"For large datasets, use Product Quantization to reduce memory usage:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "quantization_example",
"metadata": {},
"outputs": [],
"source": [
"# Create memory-optimized vector store\n",
"quantization_config = {\"type\": \"pq\", \"subvectors\": 8, \"bits\": 8, \"training_size\": 10000}\n",
"\n",
"vdb_quantized = VectorDatabase()\n",
"quantized_index = vdb_quantized.create(\n",
" index_type=\"hnsw\", dim=1536, quantization_config=quantization_config\n",
")\n",
"\n",
"quantized_vector_store = ZeusDBVectorStore(\n",
" zeusdb_index=quantized_index, embedding=embeddings\n",
")\n",
"\n",
"print(f\"Created quantized store: {quantized_index.info()}\")"
]
},
{
"cell_type": "markdown",
"id": "6ffe0613-b2a7-484e-9219-1166b65c49c5",
"metadata": {},
"source": [
"### 4.2 Persistence"
]
},
{
"cell_type": "markdown",
"id": "fbc323ee-4c6c-43fc-beba-675d820ca078",
"metadata": {},
"source": [
"Save and load your vector store to disk:"
]
},
{
"cell_type": "markdown",
"id": "834354d1-55ad-48fe-84e1-a5eacff3f6bb",
"metadata": {},
"source": [
"How to Save your vector store"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f9d1332b-a7ac-4a4b-a060-f2061599d3f1",
"metadata": {},
"outputs": [],
"source": [
"# Save the vector store\n",
"vector_store.save_index(\"my_zeusdb_index.zdb\")"
]
},
{
"cell_type": "markdown",
"id": "23370621-5b51-4313-800f-3a2fb9de52d2",
"metadata": {},
"source": [
"How to Load your vector store"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f9ed5778-58e4-4724-b69d-3c7b48cda429",
"metadata": {},
"outputs": [],
"source": [
"# Load the vector store\n",
"loaded_store = ZeusDBVectorStore.load_index(\n",
" path=\"my_zeusdb_index.zdb\", embedding=embeddings\n",
")\n",
"\n",
"print(f\"Loaded store with {loaded_store.get_vector_count()} vectors\")"
]
},
{
"cell_type": "markdown",
"id": "610cfe63-d4a8-4ef0-88a8-cf9cc3cbbfce",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"id": "901c75dc",
"metadata": {},
"source": [
"## Usage for retrieval-augmented generation\n",
"\n",
"For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:\n",
"\n",
"- [How-to: Question and answer with RAG](https://python.langchain.com/docs/how_to/#qa-with-rag)\n",
"- [Retrieval conceptual docs](https://python.langchain.com/docs/concepts/retrieval/)"
]
},
{
"cell_type": "markdown",
"id": "1d9d9d51-3798-410f-b1b3-f9736ea8c238",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"id": "25b08eb0-99ab-4919-a201-5243fdfa39e9",
"metadata": {},
"source": [
"## API reference"
]
},
{
"cell_type": "markdown",
"id": "77fdca8b-f75e-4100-9f1d-7a017567dc59",
"metadata": {},
"source": [
"For detailed documentation of all ZeusDBVectorStore features and configurations head to the Doc reference: https://docs.zeusdb.com/en/latest/vector_database/integrations/langchain.html"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

libs/packages.yml

@@ -735,6 +735,10 @@ packages:
   - name: langchain-google-bigtable
     name_title: Bigtable
     repo: googleapis/langchain-google-bigtable-python
+  - name: langchain-oci
+    name_title: Oracle Cloud Infrastructure (OCI)
+    repo: oracle/langchain-oracle
+    path: .
   - name: langchain-timbr
     provider_page: timbr
     path: .
@@ -742,7 +746,6 @@ packages:
   - name: langchain-zenrows
     path: .
     repo: ZenRows-Hub/langchain-zenrows
-  - name: langchain-oci
-    name_title: Oracle Cloud Infrastructure (OCI)
-    repo: oracle/langchain-oracle
-    path: .
+  - name: langchain-zeusdb
+    repo: zeusdb/langchain-zeusdb
+    path: libs/zeusdb