docs: add ZeusDB vector store integration (#32822)

## Description

This PR adds documentation for the new ZeusDB vector store integration
with LangChain.

## Motivation

ZeusDB is a high-performance vector database (Python/Rust backend)
designed for AI applications that need fast similarity search and
real-time vector operations. This integration brings ZeusDB's capabilities to
the LangChain ecosystem, giving developers another production-oriented
option for vector storage and retrieval.

**Key Features:**
- **User-Friendly Python API**: Intuitive interface that integrates
seamlessly with Python ML workflows
- **High Performance**: Powered by a robust Rust backend for
lightning-fast vector operations
- **Enterprise Logging**: Comprehensive logging capabilities for
monitoring and debugging production systems
- **Advanced Features**: Includes product quantization and persistence
capabilities
- **AI-Optimized**: Purpose-built for modern AI applications and RAG
pipelines

## Changes

- Added provider documentation:
`docs/docs/integrations/providers/zeusdb.mdx` (installation, setup).

- Added vector store documentation:
`docs/docs/integrations/vectorstores/zeusdb.ipynb` (quickstart for
creating/querying a ZeusDBVectorStore).

- Registered langchain-zeusdb in `libs/packages.yml` for discovery.

## Target users

- AI/ML engineers building RAG pipelines

- Data scientists working with large document collections

- Developers needing high-throughput vector search

- Teams requiring near real-time vector operations

## Testing

- Followed LangChain's "How to add standard tests to an integration"
guidance.
- Code passes format, lint, and test checks locally.
- Tested with LangChain Core 0.3.74
- Works with Python 3.10 to 3.13

## Package Information
**PyPI:** https://pypi.org/project/langchain-zeusdb
**GitHub:** https://github.com/ZeusDB/langchain-zeusdb
Author: doubleinfinity
Date: 2025-09-15 23:55:14 +10:00 (committed by GitHub)
Commit: b944bbc766 (parent 0be7515abc)
3 changed files with 1227 additions and 4 deletions

docs/docs/integrations/providers/zeusdb.mdx

@@ -0,0 +1,603 @@
<p align="center" width="100%">
<h1 align="center">LangChain ZeusDB Integration</h1>
</p>
A high-performance LangChain integration for ZeusDB, bringing enterprise-grade vector search capabilities to your LangChain applications.
## Features
🚀 **High Performance**
- Rust-powered vector database backend
- Advanced HNSW indexing for sub-millisecond search
- Product Quantization for 4x-256x memory compression
- Concurrent search with automatic parallelization
🎯 **LangChain Native**
- Full VectorStore API compliance
- Async/await support for all operations
- Seamless integration with LangChain retrievers
- Maximal Marginal Relevance (MMR) search
🏢 **Enterprise Ready**
- Structured logging with performance monitoring
- Index persistence with complete state preservation
- Advanced metadata filtering
- Graceful error handling and fallback mechanisms
## Quick Start
### Installation
```bash
pip install -qU langchain-zeusdb
```
### Getting Started
This example uses *OpenAIEmbeddings*, which requires an OpenAI API key - [Get your OpenAI API key here](https://platform.openai.com/api-keys)
If you prefer, you can also use this package with any other embedding provider (Hugging Face, Cohere, custom functions, etc.).
```bash
pip install langchain-openai
```
```python
import os
import getpass
os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')
```
### Basic Usage
```python
from langchain_zeusdb import ZeusDBVectorStore
from langchain_openai import OpenAIEmbeddings
from zeusdb import VectorDatabase
# Initialize embeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
# Create ZeusDB index
vdb = VectorDatabase()
index = vdb.create(
    index_type="hnsw",
    dim=1536,
    space="cosine"
)
# Create vector store
vector_store = ZeusDBVectorStore(
    zeusdb_index=index,
    embedding=embeddings
)
# Add documents
from langchain_core.documents import Document
docs = [
    Document(page_content="ZeusDB is fast", metadata={"source": "docs"}),
    Document(page_content="LangChain is powerful", metadata={"source": "docs"}),
]
vector_store.add_documents(docs)
# Search
results = vector_store.similarity_search("fast database", k=2)
print(f"Found the following {len(results)} results:")
print(results)
```
**Expected results:**
```
Found the following 2 results:
[Document(id='ea2b4f13-b0b7-4cef-bb91-0fc4f4c41295', metadata={'source': 'docs'}, page_content='ZeusDB is fast'), Document(id='33dc1e87-a18a-4827-a0df-6ee47eabc7b2', metadata={'source': 'docs'}, page_content='LangChain is powerful')]
```
<br />
### Factory Methods
For convenience, you can create and populate a vector store in a single step:
**Example 1: Create from texts (creates index and adds texts in one step)**
```python
vector_store_texts = ZeusDBVectorStore.from_texts(
    texts=["Hello world", "Goodbye world"],
    embedding=embeddings,
    metadatas=[{"source": "text1"}, {"source": "text2"}]
)
print("texts store count:", vector_store_texts.get_vector_count()) # -> 2
print("texts store peek:", vector_store_texts.zeusdb_index.list(2)) # [('id1', {...}), ('id2', {...})]
# Search the texts-based store
results = vector_store_texts.similarity_search("Hello", k=1)
print(f"Found in texts store: {results[0].page_content}") # -> "Hello world"
```
**Expected results:**
```
texts store count: 2
texts store peek: [('e9c39b44-b610-4e00-91f3-bf652e9989ac', {'source': 'text1', 'text': 'Hello world'}), ('d33f210c-ed53-4006-a64a-a9eee397fec9', {'source': 'text2', 'text': 'Goodbye world'})]
Found in texts store: Hello world
```
<br />
**Example 2: Create from documents (creates index and adds documents in one step)**
```python
new_docs = [
    Document(page_content="Python is great", metadata={"source": "python"}),
    Document(page_content="JavaScript is flexible", metadata={"source": "js"}),
]
vector_store_docs = ZeusDBVectorStore.from_documents(
    documents=new_docs,
    embedding=embeddings
)
print("docs store count:", vector_store_docs.get_vector_count()) # -> 2
print("docs store peek:", vector_store_docs.zeusdb_index.list(2)) # [('id3', {...}), ('id4', {...})]
# Search the documents-based store
results = vector_store_docs.similarity_search("Python", k=1)
print(f"Found in docs store: {results[0].page_content}") # -> "Python is great"
```
**Expected results:**
```
docs store count: 2
docs store peek: [('aab2d1c1-7e02-4817-8dd8-6fb03570bb6f', {'text': 'Python is great', 'source': 'python'}), ('9a8a82cb-0e70-456c-9db2-556e464de14e', {'text': 'JavaScript is flexible', 'source': 'js'})]
Found in docs store: Python is great
```
<br />
## Advanced Features
ZeusDB's enterprise-grade capabilities are fully integrated into the LangChain ecosystem, including quantization, persistence, and advanced search options.
### Memory-Efficient Setup with Quantization
For large datasets, use Product Quantization to reduce memory usage:
```python
# Create quantized index for memory efficiency
quantization_config = {
    'type': 'pq',
    'subvectors': 8,
    'bits': 8,
    'training_size': 10000
}

vdb = VectorDatabase()
index = vdb.create(
    index_type="hnsw",
    dim=1536,
    space="cosine",
    quantization_config=quantization_config
)

vector_store = ZeusDBVectorStore(
    zeusdb_index=index,
    embedding=embeddings
)
```
Please refer to our [documentation](https://docs.zeusdb.com/en/latest/vector_database/product_quantization.html) for helpful configuration guidelines and recommendations for setting up quantization.
<br />
### Persistence
ZeusDB persistence lets you save a fully populated index to disk and load it later with complete state restoration. This includes vectors, metadata, HNSW graph, and (if enabled) Product Quantization models.
What gets saved:
- Vectors & IDs
- Metadata
- HNSW graph structure
- Quantization config, centroids, and training state (if PQ is enabled)
**How to Save your vector store**
```python
# Save index
vector_store.save_index("my_index.zdb")
```
**How to Load your vector store**
```python
# Load index
loaded_store = ZeusDBVectorStore.load_index(
    path="my_index.zdb",
    embedding=embeddings
)
# Verify after load
print("vector count:", loaded_store.get_vector_count())
print("index info:", loaded_store.info())
print("store peek:", loaded_store.zeusdb_index.list(2))
```
**Notes**
- The path is a directory, not a single file. Ensure the target is writable.
- Saved indexes are cross-platform and include format/version info for compatibility checks.
- If you used PQ, both the compression model and state are preserved; there is no need to retrain after loading.
- You can continue to use all vector store APIs (similarity_search, retrievers, etc.) on the loaded_store.
For further details (including the file structure and more comprehensive examples), see the [documentation](https://docs.zeusdb.com/en/latest/vector_database/persistence.html).
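For example, a loaded store can be queried immediately (a minimal sketch reusing the `loaded_store` from above):
```python
# The restored index serves searches right away; no retraining or re-adding needed
results = loaded_store.similarity_search("fast database", k=2)
for doc in results:
    print(doc.page_content)

# It also plugs into chains as a retriever
retriever = loaded_store.as_retriever(search_kwargs={"k": 2})
```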
<br />
### Advanced Search Options
Use these to control scoring, diversity, metadata filtering, and retriever integration for your searches.
#### Similarity search with scores
Returns `(Document, raw_distance)` pairs from ZeusDB — **lower distance = more similar**.
If you prefer normalized relevance in `[0, 1]`, use `similarity_search_with_relevance_scores`.
```python
# Similarity search with scores
results_with_scores = vector_store.similarity_search_with_score(
    query="machine learning",
    k=5
)
print(results_with_scores)
```
**Expected results:**
```
[
(Document(id='ac0eaf5b-9f02-4ce2-8957-c369a7262c61', metadata={'source': 'docs'}, page_content='LangChain is powerful'), 0.8218843340873718),
(Document(id='faae3adf-7cf3-463c-b282-3790b096fa23', metadata={'source': 'docs'}, page_content='ZeusDB is fast'), 0.9140053391456604)
]
```
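If you want the normalized `[0, 1]` relevance scores mentioned above, here is a minimal sketch using LangChain's standard `similarity_search_with_relevance_scores` method (higher score = more similar):
```python
# Normalized relevance scores in [0, 1]; higher means more similar
results = vector_store.similarity_search_with_relevance_scores(
    query="machine learning",
    k=5,
)
for doc, score in results:
    print(f"[relevance={score:.3f}] {doc.page_content}")
```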
#### MMR search for diversity
MMR (Maximal Marginal Relevance) balances two forces: relevance to the query and diversity among selected results, reducing near-duplicate answers. Control the trade-off with `lambda_mult` (1.0 = all relevance, 0.0 = all diversity).
```python
# MMR search for diversity
mmr_results = vector_store.max_marginal_relevance_search(
    query="AI applications",
    k=5,
    fetch_k=20,
    lambda_mult=0.7  # Balance relevance vs diversity
)
print(mmr_results)
```
#### Search with metadata filtering
Filter results using the document metadata you stored when adding documents:
```python
# Search with metadata filtering
results = vector_store.similarity_search(
    query="database performance",
    k=3,
    filter={"source": "documentation"}
)
```
For supported metadata query types and operators, please refer to the [documentation](https://docs.zeusdb.com/en/latest/vector_database/metadata_filtering.html).
#### As a Retriever
Turning the vector store into a retriever gives you a standard LangChain interface that chains (e.g., RetrievalQA) can call to fetch context. Under the hood it uses your chosen search type (`similarity` or `mmr`) and `search_kwargs`.
```python
# Convert to retriever for use in chains
retriever = vector_store.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 3, "lambda_mult": 0.8}
)
# Use with LangChain Expression Language (LCEL) - requires only langchain-core
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
def format_docs(docs):
    return "\n\n".join([d.page_content for d in docs])
template = """Answer the question based only on the following context:
{context}
Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
llm = ChatOpenAI()
# Create a chain using LCEL
chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
# Use the chain
answer = chain.invoke("What is ZeusDB?")
print(answer)
```
**Expected results:**
```
ZeusDB is a fast database management system.
```
<br />
## Async Support
ZeusDB supports asynchronous operations for non-blocking, concurrent vector operations.
**When to use async:** web servers (FastAPI/Starlette), agents/pipelines doing parallel searches, or notebooks where you want non-blocking/concurrent retrieval. If you're writing simple scripts, the sync methods are fine.
These are **asynchronous operations**: the async/await versions of the regular synchronous methods. Here's what each one does:
1. `await vector_store.aadd_documents(documents)` - Asynchronously adds documents to the vector store (async version of `add_documents()`)
2. `await vector_store.asimilarity_search("query", k=5)` - Asynchronously performs similarity search (async version of `similarity_search()`)
3. `await vector_store.adelete(ids=["doc1", "doc2"])` - Asynchronously deletes documents by their IDs (async version of `delete()`)
The async versions are useful when:
- You're building async applications (using `asyncio`, FastAPI, etc.)
- You want non-blocking operations that can run concurrently
- You're handling multiple requests simultaneously
- You want better performance in I/O-bound applications
For example, instead of blocking while adding documents:
```python
# Synchronous (blocking)
vector_store.add_documents(docs) # Blocks until complete
# Asynchronous (non-blocking)
await vector_store.aadd_documents(docs) # Can do other work while this runs
```
All operations support async/await:
**Script version (`python my_script.py`):**
```python
import asyncio
from langchain_zeusdb import ZeusDBVectorStore
from langchain_openai import OpenAIEmbeddings
from langchain_core.documents import Document
from zeusdb import VectorDatabase
# Setup
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vdb = VectorDatabase()
index = vdb.create(index_type="hnsw", dim=1536, space="cosine")
vector_store = ZeusDBVectorStore(zeusdb_index=index, embedding=embeddings)
docs = [
    Document(page_content="ZeusDB is fast", metadata={"source": "docs"}),
    Document(page_content="LangChain is powerful", metadata={"source": "docs"}),
]

async def main():
    # Add documents asynchronously
    ids = await vector_store.aadd_documents(docs)
    print("Added IDs:", ids)

    # Run multiple searches concurrently
    results_fast, results_powerful = await asyncio.gather(
        vector_store.asimilarity_search("fast", k=2),
        vector_store.asimilarity_search("powerful", k=2),
    )
    print("Fast results:", [d.page_content for d in results_fast])
    print("Powerful results:", [d.page_content for d in results_powerful])

    # Delete documents asynchronously
    deleted = await vector_store.adelete(ids=ids[:1])
    print("Deleted first doc:", deleted)

if __name__ == "__main__":
    asyncio.run(main())
```
**Colab/Notebook/Jupyter version (top-level `await`):**
```python
from langchain_zeusdb import ZeusDBVectorStore
from langchain_openai import OpenAIEmbeddings
from langchain_core.documents import Document
from zeusdb import VectorDatabase
import asyncio
# Setup
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vdb = VectorDatabase()
index = vdb.create(index_type="hnsw", dim=1536, space="cosine")
vector_store = ZeusDBVectorStore(zeusdb_index=index, embedding=embeddings)
docs = [
    Document(page_content="ZeusDB is fast", metadata={"source": "docs"}),
    Document(page_content="LangChain is powerful", metadata={"source": "docs"}),
]
# Add documents asynchronously
ids = await vector_store.aadd_documents(docs)
print("Added IDs:", ids)
# Run multiple searches concurrently
results_fast, results_powerful = await asyncio.gather(
    vector_store.asimilarity_search("fast", k=2),
    vector_store.asimilarity_search("powerful", k=2),
)
print("Fast results:", [d.page_content for d in results_fast])
print("Powerful results:", [d.page_content for d in results_powerful])
# Delete documents asynchronously
deleted = await vector_store.adelete(ids=ids[:1])
print("Deleted first doc:", deleted)
```
**Expected results:**
```
Added IDs: ['9c440918-715f-49ba-9b97-0d991d29e997', 'ad59c645-d3ba-4a4a-a016-49ed39514123']
Fast results: ['ZeusDB is fast', 'LangChain is powerful']
Powerful results: ['LangChain is powerful', 'ZeusDB is fast']
Deleted first doc: True
```
<br />
## Monitoring and Observability
### Performance Monitoring
```python
# Get index statistics
stats = vector_store.get_zeusdb_stats()
print(f"Index size: {stats.get('total_vectors', 0)} vectors")
print(f"Dimension: {stats.get('dimension')} | Space: {stats.get('space')} | Index type: {stats.get('index_type')}")

# Benchmark search performance
performance = vector_store.benchmark_search_performance(
    query_count=100,
    max_threads=4
)
print(f"Search QPS: {performance.get('parallel_qps', 0):.0f}")

# Check quantization status
if vector_store.is_quantized():
    progress = vector_store.get_training_progress()
    print(f"Quantization training: {progress:.1f}% complete")
else:
    print("Index is not quantized")
```
**Expected results:**
```
Index size: 2 vectors
Dimension: 1536 | Space: cosine | Index type: HNSW
Search QPS: 53807
Index is not quantized
```
### Enterprise Logging
ZeusDB includes enterprise-grade structured logging that works automatically with smart environment detection:
```python
import logging
# ZeusDB automatically detects your environment and applies appropriate logging:
# - Development: Human-readable logs, WARNING level
# - Production: JSON structured logs, ERROR level
# - Testing: Minimal output, CRITICAL level
# - Jupyter: Clean readable logs, INFO level
# Operations are automatically logged with performance metrics
vector_store.add_documents(docs)
# Logs: {"operation":"vector_addition","total_inserted":2,"duration_ms":45}
# Control logging with environment variables if needed
# ZEUSDB_LOG_LEVEL=debug ZEUSDB_LOG_FORMAT=json python your_app.py
```
To learn more about ZeusDB's enterprise logging capabilities, please read the [documentation](https://docs.zeusdb.com/en/latest/vector_database/logging.html).
<br />
## Configuration Options
### Index Parameters
```python
vdb = VectorDatabase()
index = vdb.create(
    index_type="hnsw",          # Index algorithm
    dim=1536,                   # Vector dimension
    space="cosine",             # Distance metric: cosine, l2, l1
    m=16,                       # HNSW connectivity
    ef_construction=200,        # Build-time search width
    expected_size=100000,       # Expected number of vectors
    quantization_config=None    # Optional quantization
)
```
### Search Parameters
```python
results = vector_store.similarity_search(
    query="search query",
    k=5,                      # Number of results
    ef_search=None,           # Runtime search width (auto if None)
    filter={"key": "value"}   # Metadata filter
)
```
## Error Handling
The integration includes comprehensive error handling:
```python
try:
    results = vector_store.similarity_search("query")
    print(results)
except Exception as e:
    # Graceful degradation with logging
    print(f"Search failed: {e}")
    # Fallback logic here
```
## Requirements
- **Python**: 3.10 or higher
- **ZeusDB**: 0.0.8 or higher
- **LangChain Core**: 0.3.74 or higher
## Installation from Source
```bash
git clone https://github.com/zeusdb/langchain-zeusdb.git
cd langchain-zeusdb/libs/zeusdb
pip install -e .
```
## Use Cases
- **RAG Applications**: High-performance retrieval for question answering
- **Semantic Search**: Fast similarity search across large document collections
- **Recommendation Systems**: Vector-based content and collaborative filtering
- **Embeddings Analytics**: Analysis of high-dimensional embedding spaces
- **Real-time Applications**: Low-latency vector search for production systems
## Compatibility
### LangChain Versions
- **LangChain Core**: 0.3.74+
### Distance Metrics
- **Cosine**: Default, normalized similarity
- **Euclidean (L2)**: Geometric distance
- **Manhattan (L1)**: City-block distance
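The metric is selected via the `space` parameter at index creation, as shown in the Configuration Options above (a minimal sketch):
```python
from zeusdb import VectorDatabase

vdb = VectorDatabase()

# Same create() call as above; only the distance metric changes
cosine_index = vdb.create(index_type="hnsw", dim=1536, space="cosine")  # normalized similarity
l2_index = vdb.create(index_type="hnsw", dim=1536, space="l2")  # Euclidean distance
l1_index = vdb.create(index_type="hnsw", dim=1536, space="l1")  # Manhattan distance
```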
### Embedding Models
Compatible with any embedding provider:
- OpenAI (`text-embedding-3-small`, `text-embedding-3-large`)
- Hugging Face Transformers
- Cohere Embeddings
- Custom embedding functions
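As an illustration, here is a minimal sketch swapping in a Hugging Face model; it assumes the separate `langchain-huggingface` package is installed, and the index `dim` must match the embedding size of whichever model you pick:
```python
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_zeusdb import ZeusDBVectorStore
from zeusdb import VectorDatabase

# sentence-transformers/all-MiniLM-L6-v2 produces 384-dimensional vectors
hf_embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

vdb = VectorDatabase()
index = vdb.create(index_type="hnsw", dim=384, space="cosine")
vector_store = ZeusDBVectorStore(zeusdb_index=index, embedding=hf_embeddings)
```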
## Support
- **Documentation**: [docs.zeusdb.com](https://docs.zeusdb.com)
- **Issues**: [GitHub Issues](https://github.com/zeusdb/langchain-zeusdb/issues)
- **Email**: contact@zeusdb.com
---
*Making vector search fast, scalable, and developer-friendly.*

docs/docs/integrations/vectorstores/zeusdb.ipynb

@@ -0,0 +1,617 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "ef1f0986",
"metadata": {},
"source": [
"# ⚡ ZeusDB Vector Store\n",
"\n",
"ZeusDB is a high-performance, Rust-powered vector database with enterprise features like quantization, persistence and logging.\n",
"\n",
"This notebook covers how to get started with the ZeusDB Vector Store to efficiently use ZeusDB with LangChain."
]
},
{
"cell_type": "markdown",
"id": "107c485d-13a3-4309-9fda-5a0440862d3c",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"id": "36fdc060",
"metadata": {},
"source": [
"## Setup"
]
},
{
"cell_type": "markdown",
"id": "d978e3fd-d130-436f-841d-d133c0fae8fb",
"metadata": {},
"source": [
"Install the ZeusDB LangChain integration package from PyPi:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "42ca8320-b866-4f37-944e-96eda54231d2",
"metadata": {},
"outputs": [],
"source": [
"pip install -qU langchain-zeusdb"
]
},
{
"cell_type": "markdown",
"id": "2a0e518a-ae8a-464b-8b47-9deb9d4ab063",
"metadata": {},
"source": [
"*Setup in Jupyter Notebooks*"
]
},
{
"cell_type": "markdown",
"id": "1d092ea6-8553-4686-9563-b8318225a04a",
"metadata": {},
"source": [
"> 💡 Tip: If youre working inside Jupyter or Google Colab, use the %pip magic command so the package is installed into the active kernel:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "64e28aa6",
"metadata": {},
"outputs": [],
"source": [
"%pip install -qU langchain-zeusdb"
]
},
{
"cell_type": "markdown",
"id": "c12fe175-a299-47d3-869f-9367b6aa572d",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"id": "31554e69-40b2-4201-9f92-57e73ac66d33",
"metadata": {},
"source": [
"## Getting Started"
]
},
{
"cell_type": "markdown",
"id": "b696b3dd-0fed-4ed2-a79a-5b32598508c0",
"metadata": {},
"source": [
"This example uses OpenAIEmbeddings, which requires an OpenAI API key [Get your OpenAI API key here](https://platform.openai.com/api-keys)"
]
},
{
"cell_type": "markdown",
"id": "2b79766e-7725-4be0-a183-4947b56892c5",
"metadata": {},
"source": [
"If you prefer, you can also use this package with any other embedding provider (Hugging Face, Cohere, custom functions, etc.)."
]
},
{
"cell_type": "markdown",
"id": "b5266cc7-28da-459e-a28d-128382ed5a20",
"metadata": {},
"source": [
"Install the LangChain OpenAI integration package from PyPi:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1ed941cd-5e06-4c61-9235-90bd0b0b0452",
"metadata": {},
"outputs": [],
"source": [
"pip install -qU langchain-openai\n",
"\n",
"# Use this command if inside Jupyter Notebooks\n",
"#%pip install -qU langchain-openai"
]
},
{
"cell_type": "markdown",
"id": "0f49b2ec-d047-455d-8c05-da041112dd8a",
"metadata": {},
"source": [
"#### Please choose an option below for your OpenAI key integration"
]
},
{
"cell_type": "markdown",
"id": "ed2d9bf6-be53-4fc1-9611-158f03fd71b7",
"metadata": {},
"source": [
"*Option 1: 🔑 Enter your API key each time* "
]
},
{
"cell_type": "markdown",
"id": "eff5b6a5-4c57-4531-896e-54bcb2b1dec2",
"metadata": {},
"source": [
"Use getpass in Jupyter to securely input your key for the current session:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "08a50da9-5ed1-40dc-a390-07b031369761",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import getpass\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
]
},
{
"cell_type": "markdown",
"id": "7321917e-8586-42e4-9822-b68cfd74f233",
"metadata": {},
"source": [
"*Option 2: 🗂️ Use a .env file*"
]
},
{
"cell_type": "markdown",
"id": "b9297b6b-bd7e-457f-95af-5b41c7ab9b41",
"metadata": {},
"source": [
"Keep your key in a local .env file and load it automatically with python-dotenv"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "85a139dc-f439-4e4e-bc46-76d9478c304d",
"metadata": {},
"outputs": [],
"source": [
"from dotenv import load_dotenv\n",
"\n",
"load_dotenv() # reads .env and sets OPENAI_API_KEY"
]
},
{
"cell_type": "markdown",
"id": "1af364e3-df59-4963-aaaa-0e83f6ec5e32",
"metadata": {},
"source": [
"🎉🎉 That's it! You are good to go."
]
},
{
"cell_type": "markdown",
"id": "3146180e-026e-4421-a490-ffd14ceabac3",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"id": "93df377e",
"metadata": {},
"source": [
"## Initialization"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fb55dfe8-2c98-45b6-ba90-7a3667ceee0c",
"metadata": {},
"outputs": [],
"source": [
"# Import required Packages and Classes\n",
"from langchain_zeusdb import ZeusDBVectorStore\n",
"from langchain_openai import OpenAIEmbeddings\n",
"from zeusdb import VectorDatabase"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dc37144c-208d-4ab3-9f3a-0407a69fe052",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Initialize embeddings\n",
"embeddings = OpenAIEmbeddings(model=\"text-embedding-3-small\")\n",
"\n",
"# Create ZeusDB index\n",
"vdb = VectorDatabase()\n",
"index = vdb.create(index_type=\"hnsw\", dim=1536, space=\"cosine\")\n",
"\n",
"# Create vector store\n",
"vector_store = ZeusDBVectorStore(zeusdb_index=index, embedding=embeddings)"
]
},
{
"cell_type": "markdown",
"id": "f45fa43c-8b54-4a75-b7b0-92ac0ac506c6",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"id": "ac6071d4",
"metadata": {},
"source": [
"## Manage vector store"
]
},
{
"cell_type": "markdown",
"id": "edf53787-ebda-4306-afc3-f7d440dcb1ff",
"metadata": {},
"source": [
"### 2.1 Add items to vector store"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "17f5efc0",
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.documents import Document\n",
"\n",
"document_1 = Document(\n",
" page_content=\"ZeusDB is a high-performance vector database\",\n",
" metadata={\"source\": \"https://docs.zeusdb.com\"},\n",
")\n",
"\n",
"document_2 = Document(\n",
" page_content=\"Product Quantization reduces memory usage significantly\",\n",
" metadata={\"source\": \"https://docs.zeusdb.com\"},\n",
")\n",
"\n",
"document_3 = Document(\n",
" page_content=\"ZeusDB integrates seamlessly with LangChain\",\n",
" metadata={\"source\": \"https://docs.zeusdb.com\"},\n",
")\n",
"\n",
"documents = [document_1, document_2, document_3]\n",
"\n",
"vector_store.add_documents(documents=documents, ids=[\"1\", \"2\", \"3\"])"
]
},
{
"cell_type": "markdown",
"id": "c738c3e0",
"metadata": {},
"source": [
"### 2.2 Update items in vector store"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f0aa8b71",
"metadata": {},
"outputs": [],
"source": [
"updated_document = Document(\n",
" page_content=\"ZeusDB now supports advanced Product Quantization with 4x-256x compression\",\n",
" metadata={\"source\": \"https://docs.zeusdb.com\", \"updated\": True},\n",
")\n",
"\n",
"vector_store.add_documents([updated_document], ids=[\"1\"])"
]
},
{
"cell_type": "markdown",
"id": "dcf1b905",
"metadata": {},
"source": [
"### 2.3 Delete items from vector store"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ef61e188",
"metadata": {},
"outputs": [],
"source": [
"vector_store.delete(ids=[\"3\"])"
]
},
{
"cell_type": "markdown",
"id": "1a0091af-777d-4651-888a-3b346d7990f5",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"id": "c3620501",
"metadata": {},
"source": [
"## Query vector store"
]
},
{
"cell_type": "markdown",
"id": "4ba3fdb2-b7d6-4f0f-b8c9-91f63596018b",
"metadata": {},
"source": [
"### 3.1 Query directly"
]
},
{
"cell_type": "markdown",
"id": "400a9b25-9587-4116-ab59-6888602ec2b1",
"metadata": {},
"source": [
"Performing a simple similarity search:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "aa0a16fa",
"metadata": {},
"outputs": [],
"source": [
"results = vector_store.similarity_search(query=\"high performance database\", k=2)\n",
"\n",
"for doc in results:\n",
" print(f\"* {doc.page_content} [{doc.metadata}]\")"
]
},
{
"cell_type": "markdown",
"id": "3ed9d733",
"metadata": {},
"source": [
"If you want to execute a similarity search and receive the corresponding scores:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5efd2eaa",
"metadata": {},
"outputs": [],
"source": [
"results = vector_store.similarity_search_with_score(query=\"memory optimization\", k=2)\n",
"\n",
"for doc, score in results:\n",
" print(f\"* [SIM={score:.3f}] {doc.page_content} [{doc.metadata}]\")"
]
},
{
"cell_type": "markdown",
"id": "0c235cdc",
"metadata": {},
"source": [
"### 3.2 Query by turning into retriever"
]
},
{
"cell_type": "markdown",
"id": "59292cb5-5dc8-4158-9137-89d0f6ca711d",
"metadata": {},
"source": [
"You can also transform the vector store into a retriever for easier usage in your chains:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f3460093",
"metadata": {},
"outputs": [],
"source": [
"retriever = vector_store.as_retriever(search_type=\"mmr\", search_kwargs={\"k\": 2})\n",
"\n",
"retriever.invoke(\"vector database features\")"
]
},
{
"cell_type": "markdown",
"id": "cc2d2b63-99d8-45c4-85e6-6a9409551ada",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"id": "persistence_section",
"metadata": {},
"source": [
"## ZeusDB-Specific Features"
]
},
{
"cell_type": "markdown",
"id": "memory_section",
"metadata": {},
"source": [
"### 4.1 Memory-Efficient Setup with Product Quantization"
]
},
{
"cell_type": "markdown",
"id": "12832d02-d9ea-4c35-a20f-05c85d1d7723",
"metadata": {},
"source": [
"For large datasets, use Product Quantization to reduce memory usage:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "quantization_example",
"metadata": {},
"outputs": [],
"source": [
"# Create memory-optimized vector store\n",
"quantization_config = {\"type\": \"pq\", \"subvectors\": 8, \"bits\": 8, \"training_size\": 10000}\n",
"\n",
"vdb_quantized = VectorDatabase()\n",
"quantized_index = vdb_quantized.create(\n",
" index_type=\"hnsw\", dim=1536, quantization_config=quantization_config\n",
")\n",
"\n",
"quantized_vector_store = ZeusDBVectorStore(\n",
" zeusdb_index=quantized_index, embedding=embeddings\n",
")\n",
"\n",
"print(f\"Created quantized store: {quantized_index.info()}\")"
]
},
{
"cell_type": "markdown",
"id": "6ffe0613-b2a7-484e-9219-1166b65c49c5",
"metadata": {},
"source": [
"### 4.2 Persistence"
]
},
{
"cell_type": "markdown",
"id": "fbc323ee-4c6c-43fc-beba-675d820ca078",
"metadata": {},
"source": [
"Save and load your vector store to disk:"
]
},
{
"cell_type": "markdown",
"id": "834354d1-55ad-48fe-84e1-a5eacff3f6bb",
"metadata": {},
"source": [
"How to Save your vector store"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f9d1332b-a7ac-4a4b-a060-f2061599d3f1",
"metadata": {},
"outputs": [],
"source": [
"# Save the vector store\n",
"vector_store.save_index(\"my_zeusdb_index.zdb\")"
]
},
{
"cell_type": "markdown",
"id": "23370621-5b51-4313-800f-3a2fb9de52d2",
"metadata": {},
"source": [
"How to Load your vector store"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f9ed5778-58e4-4724-b69d-3c7b48cda429",
"metadata": {},
"outputs": [],
"source": [
"# Load the vector store\n",
"loaded_store = ZeusDBVectorStore.load_index(\n",
" path=\"my_zeusdb_index.zdb\", embedding=embeddings\n",
")\n",
"\n",
"print(f\"Loaded store with {loaded_store.get_vector_count()} vectors\")"
]
},
{
"cell_type": "markdown",
"id": "610cfe63-d4a8-4ef0-88a8-cf9cc3cbbfce",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"id": "901c75dc",
"metadata": {},
"source": [
"## Usage for retrieval-augmented generation\n",
"\n",
"For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:\n",
"\n",
"- [How-to: Question and answer with RAG](https://python.langchain.com/docs/how_to/#qa-with-rag)\n",
"- [Retrieval conceptual docs](https://python.langchain.com/docs/concepts/retrieval/)"
]
},
{
"cell_type": "markdown",
"id": "1d9d9d51-3798-410f-b1b3-f9736ea8c238",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"id": "25b08eb0-99ab-4919-a201-5243fdfa39e9",
"metadata": {},
"source": [
"## API reference"
]
},
{
"cell_type": "markdown",
"id": "77fdca8b-f75e-4100-9f1d-7a017567dc59",
"metadata": {},
"source": [
"For detailed documentation of all ZeusDBVectorStore features and configurations head to the Doc reference: https://docs.zeusdb.com/en/latest/vector_database/integrations/langchain.html"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

libs/packages.yml

@@ -735,6 +735,10 @@ packages:
   - name: langchain-google-bigtable
     name_title: Bigtable
     repo: googleapis/langchain-google-bigtable-python
+  - name: langchain-oci
+    name_title: Oracle Cloud Infrastructure (OCI)
+    repo: oracle/langchain-oracle
+    path: .
   - name: langchain-timbr
     provider_page: timbr
     path: .
@@ -742,7 +746,6 @@ packages:
   - name: langchain-zenrows
     path: .
     repo: ZenRows-Hub/langchain-zenrows
-  - name: langchain-oci
-    name_title: Oracle Cloud Infrastructure (OCI)
-    repo: oracle/langchain-oracle
-    path: .
+  - name: langchain-zeusdb
+    repo: zeusdb/langchain-zeusdb
+    path: libs/zeusdb