diff --git a/docs/docs/integrations/providers/zeusdb.mdx b/docs/docs/integrations/providers/zeusdb.mdx
new file mode 100644
index 00000000000..597c4d4f57c
--- /dev/null
+++ b/docs/docs/integrations/providers/zeusdb.mdx
@@ -0,0 +1,603 @@
+# LangChain ZeusDB Integration
+
+A high-performance LangChain integration for ZeusDB, bringing enterprise-grade vector search capabilities to your LangChain applications.
+
+## Features
+
+🚀 **High Performance**
+- Rust-powered vector database backend
+- Advanced HNSW indexing for sub-millisecond search
+- Product Quantization for 4x-256x memory compression
+- Concurrent search with automatic parallelization
+
+🎯 **LangChain Native**
+- Full VectorStore API compliance
+- Async/await support for all operations
+- Seamless integration with LangChain retrievers
+- Maximal Marginal Relevance (MMR) search
+
+🏢 **Enterprise Ready**
+- Structured logging with performance monitoring
+- Index persistence with complete state preservation
+- Advanced metadata filtering
+- Graceful error handling and fallback mechanisms
+
+## Quick Start
+
+### Installation
+
+```bash
+pip install -qU langchain-zeusdb
+```
+
+### Getting Started
+
+This example uses *OpenAIEmbeddings*, which requires an OpenAI API key - [Get your OpenAI API key here](https://platform.openai.com/api-keys).
+
+If you prefer, you can also use this package with any other embedding provider (Hugging Face, Cohere, custom functions, etc.).
+
+```bash
+pip install -qU langchain-openai
+```
+
+```python
+import os
+import getpass
+
+os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')
+```
+
+### Basic Usage
+
+```python
+from langchain_core.documents import Document
+from langchain_openai import OpenAIEmbeddings
+from langchain_zeusdb import ZeusDBVectorStore
+from zeusdb import VectorDatabase
+
+# Initialize embeddings
+embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
+
+# Create ZeusDB index
+vdb = VectorDatabase()
+index = vdb.create(
+    index_type="hnsw",
+    dim=1536,
+    space="cosine"
+)
+
+# Create vector store
+vector_store = ZeusDBVectorStore(
+    zeusdb_index=index,
+    embedding=embeddings
+)
+
+# Add documents
+docs = [
+    Document(page_content="ZeusDB is fast", metadata={"source": "docs"}),
+    Document(page_content="LangChain is powerful", metadata={"source": "docs"}),
+]
+
+vector_store.add_documents(docs)
+
+# Search
+results = vector_store.similarity_search("fast database", k=2)
+print(f"Found the following {len(results)} results:")
+print(results)
+```
+
+**Expected results:**
+```
+Found the following 2 results:
+[Document(id='ea2b4f13-b0b7-4cef-bb91-0fc4f4c41295', metadata={'source': 'docs'}, page_content='ZeusDB is fast'), Document(id='33dc1e87-a18a-4827-a0df-6ee47eabc7b2', metadata={'source': 'docs'}, page_content='LangChain is powerful')]
+```
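+
+The store is embedding-agnostic: any LangChain `Embeddings` implementation works. For a quick local test without an API key, here is a minimal sketch using `DeterministicFakeEmbedding` from `langchain-core` in place of OpenAI (the `size` must match the index `dim`; a real provider plugs in the same way):
+
+```python
+from langchain_core.embeddings import DeterministicFakeEmbedding
+from langchain_zeusdb import ZeusDBVectorStore
+from zeusdb import VectorDatabase
+
+# Deterministic fake embeddings: stable vectors, no API key required
+fake_embeddings = DeterministicFakeEmbedding(size=1536)
+
+index = VectorDatabase().create(index_type="hnsw", dim=1536, space="cosine")
+store = ZeusDBVectorStore(zeusdb_index=index, embedding=fake_embeddings)
+
+store.add_texts(["ZeusDB is fast", "LangChain is powerful"])
+print(store.similarity_search("fast", k=1)[0].page_content)
+```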
+
+### Factory Methods
+
+For convenience, you can create and populate a vector store in a single step:
+
+**Example 1: Create from texts (creates the index and adds texts in one step)**
+```python
+vector_store_texts = ZeusDBVectorStore.from_texts(
+    texts=["Hello world", "Goodbye world"],
+    embedding=embeddings,
+    metadatas=[{"source": "text1"}, {"source": "text2"}]
+)
+
+print("texts store count:", vector_store_texts.get_vector_count())  # -> 2
+print("texts store peek:", vector_store_texts.zeusdb_index.list(2))  # [('id1', {...}), ('id2', {...})]
+
+# Search the texts-based store
+results = vector_store_texts.similarity_search("Hello", k=1)
+print(f"Found in texts store: {results[0].page_content}")  # -> "Hello world"
+```
+
+**Expected results:**
+```
+texts store count: 2
+texts store peek: [('e9c39b44-b610-4e00-91f3-bf652e9989ac', {'source': 'text1', 'text': 'Hello world'}), ('d33f210c-ed53-4006-a64a-a9eee397fec9', {'source': 'text2', 'text': 'Goodbye world'})]
+Found in texts store: Hello world
+```
+
+**Example 2: Create from documents (creates the index and adds documents in one step)**
+```python
+new_docs = [
+    Document(page_content="Python is great", metadata={"source": "python"}),
+    Document(page_content="JavaScript is flexible", metadata={"source": "js"}),
+]
+
+vector_store_docs = ZeusDBVectorStore.from_documents(
+    documents=new_docs,
+    embedding=embeddings
+)
+
+print("docs store count:", vector_store_docs.get_vector_count())  # -> 2
+print("docs store peek:", vector_store_docs.zeusdb_index.list(2))  # [('id3', {...}), ('id4', {...})]
+
+# Search the documents-based store
+results = vector_store_docs.similarity_search("Python", k=1)
+print(f"Found in docs store: {results[0].page_content}")  # -> "Python is great"
+```
+
+**Expected results:**
+```
+docs store count: 2
+docs store peek: [('aab2d1c1-7e02-4817-8dd8-6fb03570bb6f', {'text': 'Python is great', 'source': 'python'}), ('9a8a82cb-0e70-456c-9db2-556e464de14e', {'text': 'JavaScript is flexible', 'source': 'js'})]
+Found in docs store: Python is great
+```
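+
+Stores created with either factory support the rest of the standard VectorStore API as well. For instance, you can pass explicit IDs when adding documents and remove entries by ID later; a minimal sketch:
+
+```python
+# Add a document with an explicit ID, then remove it by ID
+added = vector_store_docs.add_documents(
+    [Document(page_content="Rust is safe", metadata={"source": "rust"})],
+    ids=["rust-1"],
+)
+print("added:", added)  # -> ['rust-1']
+
+vector_store_docs.delete(ids=["rust-1"])
+print("count after delete:", vector_store_docs.get_vector_count())  # -> 2
+```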
+
+## Advanced Features
+
+ZeusDB's enterprise-grade capabilities, including quantization, persistence, and advanced search options, are fully available through the LangChain integration.
+
+### Memory-Efficient Setup with Quantization
+
+For large datasets, use Product Quantization to reduce memory usage:
+
+```python
+# Create quantized index for memory efficiency
+quantization_config = {
+    'type': 'pq',
+    'subvectors': 8,
+    'bits': 8,
+    'training_size': 10000
+}
+
+vdb = VectorDatabase()
+index = vdb.create(
+    index_type="hnsw",
+    dim=1536,
+    space="cosine",
+    quantization_config=quantization_config
+)
+
+vector_store = ZeusDBVectorStore(
+    zeusdb_index=index,
+    embedding=embeddings
+)
+```
+
+Please refer to our [documentation](https://docs.zeusdb.com/en/latest/vector_database/product_quantization.html) for configuration guidelines and recommendations when setting up quantization.
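+
+Quantization trains automatically once roughly `training_size` vectors have been ingested. A minimal sketch for checking where training stands, using the monitoring helpers covered later on this page:
+
+```python
+# Check quantization status during ingestion
+if vector_store.is_quantized():
+    print("PQ model trained; compressed storage active")
+else:
+    # Training kicks in once ~training_size vectors have been added
+    print(f"Quantization training: {vector_store.get_training_progress():.1f}% complete")
+```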
+
+### Persistence
+
+ZeusDB persistence lets you save a fully populated index to disk and load it later with complete state restoration. This includes vectors, metadata, the HNSW graph, and (if enabled) Product Quantization models.
+
+What gets saved:
+- Vectors & IDs
+- Metadata
+- HNSW graph structure
+- Quantization config, centroids, and training state (if PQ is enabled)
+
+**How to save your vector store**
+```python
+# Save index
+vector_store.save_index("my_index.zdb")
+```
+
+**How to load your vector store**
+```python
+# Load index
+loaded_store = ZeusDBVectorStore.load_index(
+    path="my_index.zdb",
+    embedding=embeddings
+)
+
+# Verify after load
+print("vector count:", loaded_store.get_vector_count())
+print("index info:", loaded_store.info())
+print("store peek:", loaded_store.zeusdb_index.list(2))
+```
+
+**Notes**
+- The path is a directory, not a single file. Ensure the target is writable.
+- Saved indexes are cross-platform and include format/version info for compatibility checks.
+- If you used PQ, both the compression model and its state are preserved, so there is no need to retrain after loading.
+- You can continue to use all vector store APIs (similarity_search, retrievers, etc.) on the loaded store, as the sketch below shows.
+
+For further details, including the on-disk file structure and more comprehensive examples, see the [documentation](https://docs.zeusdb.com/en/latest/vector_database/persistence.html).
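+
+For example, a freshly loaded store can back a retriever straight away; a minimal sketch:
+
+```python
+# The loaded store supports the full vector store API
+retriever = loaded_store.as_retriever(search_kwargs={"k": 2})
+docs = retriever.invoke("fast database")
+print([d.page_content for d in docs])
+```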
+
+### Advanced Search Options
+
+Use these options to control scoring, diversity, metadata filtering, and retriever integration for your searches.
+
+#### Similarity search with scores
+
+Returns `(Document, raw_distance)` pairs from ZeusDB, where **lower distance = more similar**.
+If you prefer normalized relevance in `[0, 1]`, use `similarity_search_with_relevance_scores` (see the sketch at the end of this section).
+
+```python
+# Similarity search with scores
+results_with_scores = vector_store.similarity_search_with_score(
+    query="machine learning",
+    k=5
+)
+
+print(results_with_scores)
+```
+
+**Expected results:**
+```
+[
+    (Document(id='ac0eaf5b-9f02-4ce2-8957-c369a7262c61', metadata={'source': 'docs'}, page_content='LangChain is powerful'), 0.8218843340873718),
+    (Document(id='faae3adf-7cf3-463c-b282-3790b096fa23', metadata={'source': 'docs'}, page_content='ZeusDB is fast'), 0.9140053391456604)
+]
+```
+
+#### MMR search for diversity
+
+MMR (Maximal Marginal Relevance) balances two forces, relevance to the query and diversity among the selected results, reducing near-duplicate answers. Control the trade-off with `lambda_mult` (1.0 = all relevance, 0.0 = all diversity).
+
+```python
+# MMR search for diversity
+mmr_results = vector_store.max_marginal_relevance_search(
+    query="AI applications",
+    k=5,
+    fetch_k=20,
+    lambda_mult=0.7  # Balance relevance vs diversity
+)
+
+print(mmr_results)
+```
+
+#### Search with metadata filtering
+
+Filter results using the document metadata you stored when adding docs:
+
+```python
+# Search with metadata filtering
+results = vector_store.similarity_search(
+    query="database performance",
+    k=3,
+    filter={"source": "documentation"}
+)
+```
+
+For supported metadata query types and operators, please refer to the [documentation](https://docs.zeusdb.com/en/latest/vector_database/metadata_filtering.html).
+
+#### As a Retriever
+
+Turning the vector store into a retriever gives you a standard LangChain interface that chains (e.g., RetrievalQA) can call to fetch context. Under the hood it uses your chosen search type (`similarity` or `mmr`) and `search_kwargs`.
+
+```python
+# Convert to retriever for use in chains
+retriever = vector_store.as_retriever(
+    search_type="mmr",
+    search_kwargs={"k": 3, "lambda_mult": 0.8}
+)
+
+# Use with LangChain Expression Language (LCEL) - requires only langchain-core
+from langchain_core.prompts import ChatPromptTemplate
+from langchain_core.output_parsers import StrOutputParser
+from langchain_core.runnables import RunnablePassthrough
+from langchain_openai import ChatOpenAI
+
+def format_docs(docs):
+    return "\n\n".join([d.page_content for d in docs])
+
+template = """Answer the question based only on the following context:
+{context}
+
+Question: {question}
+"""
+
+prompt = ChatPromptTemplate.from_template(template)
+llm = ChatOpenAI()
+
+# Create a chain using LCEL
+chain = (
+    {"context": retriever | format_docs, "question": RunnablePassthrough()}
+    | prompt
+    | llm
+    | StrOutputParser()
+)
+
+# Use the chain
+answer = chain.invoke("What is ZeusDB?")
+print(answer)
+```
+
+**Expected results:**
+```
+ZeusDB is a fast database management system.
+```
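+
+As promised above, here is the normalized-score variant: `similarity_search_with_relevance_scores` returns scores in `[0, 1]` (higher = more similar), which are often easier to threshold than raw distances. A minimal sketch:
+
+```python
+# Normalized relevance scores in [0, 1] (higher = more similar)
+scored = vector_store.similarity_search_with_relevance_scores(
+    query="machine learning",
+    k=2,
+)
+for doc, score in scored:
+    print(f"{score:.3f}  {doc.page_content}")
+```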
+
+## Async Support
+
+ZeusDB supports asynchronous operations for non-blocking, concurrent vector operations.
+
+**When to use async:** web servers (FastAPI/Starlette), agents/pipelines doing parallel searches, or notebooks where you want non-blocking/concurrent retrieval. If you're writing simple scripts, the sync methods are fine.
+
+The key methods are the **async/await counterparts** of the regular synchronous methods. Here's what each one does:
+
+1. `await vector_store.aadd_documents(documents)` - Asynchronously adds documents to the vector store (async version of `add_documents()`)
+2. `await vector_store.asimilarity_search("query", k=5)` - Asynchronously performs similarity search (async version of `similarity_search()`)
+3. `await vector_store.adelete(ids=["doc1", "doc2"])` - Asynchronously deletes documents by their IDs (async version of `delete()`)
+
+The async versions are useful when:
+- You're building async applications (using `asyncio`, FastAPI, etc.)
+- You want non-blocking operations that can run concurrently
+- You're handling multiple requests simultaneously
+- You want better performance in I/O-bound applications
+
+For example, instead of blocking while adding documents:
+
+```python
+# Synchronous (blocking)
+vector_store.add_documents(docs)  # Blocks until complete
+
+# Asynchronous (non-blocking)
+await vector_store.aadd_documents(docs)  # Can do other work while this runs
+```
+
+All operations support async/await:
+
+**Script version (`python my_script.py`):**
+```python
+import asyncio
+from langchain_zeusdb import ZeusDBVectorStore
+from langchain_openai import OpenAIEmbeddings
+from langchain_core.documents import Document
+from zeusdb import VectorDatabase
+
+# Setup
+embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
+vdb = VectorDatabase()
+index = vdb.create(index_type="hnsw", dim=1536, space="cosine")
+vector_store = ZeusDBVectorStore(zeusdb_index=index, embedding=embeddings)
+
+docs = [
+    Document(page_content="ZeusDB is fast", metadata={"source": "docs"}),
+    Document(page_content="LangChain is powerful", metadata={"source": "docs"}),
+]
+
+async def main():
+    # Add documents asynchronously
+    ids = await vector_store.aadd_documents(docs)
+    print("Added IDs:", ids)
+
+    # Run multiple searches concurrently
+    results_fast, results_powerful = await asyncio.gather(
+        vector_store.asimilarity_search("fast", k=2),
+        vector_store.asimilarity_search("powerful", k=2),
+    )
+    print("Fast results:", [d.page_content for d in results_fast])
+    print("Powerful results:", [d.page_content for d in results_powerful])
+
+    # Delete documents asynchronously
+    deleted = await vector_store.adelete(ids=ids[:1])
+    print("Deleted first doc:", deleted)
+
+if __name__ == "__main__":
+    asyncio.run(main())
+```
+
+**Colab/Notebook/Jupyter version (top-level `await`):**
+```python
+import asyncio
+from langchain_zeusdb import ZeusDBVectorStore
+from langchain_openai import OpenAIEmbeddings
+from langchain_core.documents import Document
+from zeusdb import VectorDatabase
+
+# Setup
+embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
+vdb = VectorDatabase()
+index = vdb.create(index_type="hnsw", dim=1536, space="cosine")
+vector_store = ZeusDBVectorStore(zeusdb_index=index, embedding=embeddings)
+
+docs = [
+    Document(page_content="ZeusDB is fast", metadata={"source": "docs"}),
+    Document(page_content="LangChain is powerful", metadata={"source": "docs"}),
+]
+
+# Add documents asynchronously
+ids = await vector_store.aadd_documents(docs)
+print("Added IDs:", ids)
+
+# Run multiple searches concurrently
+results_fast, results_powerful = await asyncio.gather(
+    vector_store.asimilarity_search("fast", k=2),
+    vector_store.asimilarity_search("powerful", k=2),
+)
+print("Fast results:", [d.page_content for d in results_fast])
+print("Powerful results:", [d.page_content for d in results_powerful])
+
+# Delete documents asynchronously
+deleted = await vector_store.adelete(ids=ids[:1])
+print("Deleted first doc:", deleted)
+```
+
+**Expected results:**
+```
+Added IDs: ['9c440918-715f-49ba-9b97-0d991d29e997', 'ad59c645-d3ba-4a4a-a016-49ed39514123']
+Fast results: ['ZeusDB is fast', 'LangChain is powerful']
+Powerful results: ['LangChain is powerful', 'ZeusDB is fast']
+Deleted first doc: True
+```
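+
+Putting this to work in a web server is straightforward. Below is a minimal sketch of a FastAPI endpoint; FastAPI is an assumption here, not a dependency of this package, and the store is assumed to be built at startup as above:
+
+```python
+from fastapi import FastAPI
+
+app = FastAPI()
+
+@app.get("/search")
+async def search(q: str, k: int = 3):
+    # Non-blocking search keeps the event loop free for other requests
+    docs = await vector_store.asimilarity_search(q, k=k)
+    return [{"content": d.page_content, "metadata": d.metadata} for d in docs]
+```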
+
+## Monitoring and Observability
+
+### Performance Monitoring
+
+```python
+# Get index statistics
+stats = vector_store.get_zeusdb_stats()
+print(f"Index size: {stats.get('total_vectors', 0)} vectors")
+print(f"Dimension: {stats.get('dimension')} | Space: {stats.get('space')} | Index type: {stats.get('index_type')}")
+
+# Benchmark search performance
+performance = vector_store.benchmark_search_performance(
+    query_count=100,
+    max_threads=4
+)
+print(f"Search QPS: {performance.get('parallel_qps', 0):.0f}")
+
+# Check quantization status
+if vector_store.is_quantized():
+    progress = vector_store.get_training_progress()
+    print(f"Quantization training: {progress:.1f}% complete")
+else:
+    print("Index is not quantized")
+```
+
+**Expected results:**
+```
+Index size: 2 vectors
+Dimension: 1536 | Space: cosine | Index type: HNSW
+Search QPS: 53807
+Index is not quantized
+```
+
+### Enterprise Logging
+
+ZeusDB includes enterprise-grade structured logging that works automatically with smart environment detection:
+
+```python
+# ZeusDB automatically detects your environment and applies appropriate logging:
+# - Development: Human-readable logs, WARNING level
+# - Production: JSON structured logs, ERROR level
+# - Testing: Minimal output, CRITICAL level
+# - Jupyter: Clean readable logs, INFO level
+
+# Operations are automatically logged with performance metrics
+vector_store.add_documents(docs)
+# Logs: {"operation":"vector_addition","total_inserted":2,"duration_ms":45}
+
+# Control logging with environment variables if needed
+# ZEUSDB_LOG_LEVEL=debug ZEUSDB_LOG_FORMAT=json python your_app.py
+```
+
+To learn more about ZeusDB's enterprise logging capabilities, see the [logging documentation](https://docs.zeusdb.com/en/latest/vector_database/logging.html).
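+
+To route these logs through your own handlers, standard `logging` configuration applies. The sketch below assumes the library emits under a `zeusdb`-prefixed logger name (an assumption; check the logging documentation linked above for the actual logger names):
+
+```python
+import logging
+
+# Assumption: ZeusDB loggers are namespaced under "zeusdb"
+handler = logging.StreamHandler()
+handler.setFormatter(
+    logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
+)
+
+zeus_logger = logging.getLogger("zeusdb")
+zeus_logger.addHandler(handler)
+zeus_logger.setLevel(logging.INFO)
+```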
+
+## Configuration Options
+
+### Index Parameters
+
+```python
+vdb = VectorDatabase()
+index = vdb.create(
+    index_type="hnsw",        # Index algorithm
+    dim=1536,                 # Vector dimension
+    space="cosine",           # Distance metric: cosine, l2, l1
+    m=16,                     # HNSW connectivity
+    ef_construction=200,      # Build-time search width
+    expected_size=100000,     # Expected number of vectors
+    quantization_config=None  # Optional quantization
+)
+```
+
+### Search Parameters
+
+```python
+results = vector_store.similarity_search(
+    query="search query",
+    k=5,                     # Number of results
+    ef_search=None,          # Runtime search width (auto if None)
+    filter={"key": "value"}  # Metadata filter
+)
+```
+
+## Error Handling
+
+The integration includes comprehensive error handling:
+
+```python
+try:
+    results = vector_store.similarity_search("query")
+    print(results)
+except Exception as e:
+    # Graceful degradation with logging
+    print(f"Search failed: {e}")
+    # Fallback logic here
+```
+
+## Requirements
+
+- **Python**: 3.10 or higher
+- **ZeusDB**: 0.0.8 or higher
+- **LangChain Core**: 0.3.74 or higher
+
+## Installation from Source
+
+```bash
+git clone https://github.com/zeusdb/langchain-zeusdb.git
+cd langchain-zeusdb/libs/zeusdb
+pip install -e .
+```
+
+## Use Cases
+
+- **RAG Applications**: High-performance retrieval for question answering
+- **Semantic Search**: Fast similarity search across large document collections
+- **Recommendation Systems**: Vector-based content and collaborative filtering
+- **Embeddings Analytics**: Analysis of high-dimensional embedding spaces
+- **Real-time Applications**: Low-latency vector search for production systems
+
+## Compatibility
+
+### LangChain Versions
+- **LangChain Core**: 0.3.74+
+
+### Distance Metrics
+- **Cosine**: Default, normalized similarity
+- **Euclidean (L2)**: Geometric distance
+- **Manhattan (L1)**: City-block distance
+
+### Embedding Models
+Compatible with any embedding provider:
+- OpenAI (`text-embedding-3-small`, `text-embedding-3-large`)
+- Hugging Face Transformers
+- Cohere Embeddings
+- Custom embedding functions
+
+## Support
+
+- **Documentation**: [docs.zeusdb.com](https://docs.zeusdb.com)
+- **Issues**: [GitHub Issues](https://github.com/zeusdb/langchain-zeusdb/issues)
+- **Email**: contact@zeusdb.com
+
+---
+
+*Making vector search fast, scalable, and developer-friendly.*
diff --git a/docs/docs/integrations/vectorstores/zeusdb.ipynb b/docs/docs/integrations/vectorstores/zeusdb.ipynb
new file mode 100644
index 00000000000..19c12bb8e23
--- /dev/null
+++ b/docs/docs/integrations/vectorstores/zeusdb.ipynb
@@ -0,0 +1,617 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "ef1f0986",
+   "metadata": {},
+   "source": [
+    "# ⚡ ZeusDB Vector Store\n",
+    "\n",
+    "ZeusDB is a high-performance, Rust-powered vector database with enterprise features like quantization, persistence, and logging.\n",
+    "\n",
+    "This notebook covers how to get started with the ZeusDB Vector Store to efficiently use ZeusDB with LangChain."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "107c485d-13a3-4309-9fda-5a0440862d3c",
+   "metadata": {},
+   "source": [
+    "---"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "36fdc060",
+   "metadata": {},
+   "source": [
+    "## Setup"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d978e3fd-d130-436f-841d-d133c0fae8fb",
+   "metadata": {},
+   "source": [
+    "Install the ZeusDB LangChain integration package from PyPI:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "42ca8320-b866-4f37-944e-96eda54231d2",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "pip install -qU langchain-zeusdb"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2a0e518a-ae8a-464b-8b47-9deb9d4ab063",
+   "metadata": {},
+   "source": [
+    "*Setup in Jupyter Notebooks*"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1d092ea6-8553-4686-9563-b8318225a04a",
+   "metadata": {},
+   "source": [
+    "> 💡 Tip: If you're working inside Jupyter or Google Colab, use the %pip magic command so the package is installed into the active kernel:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "64e28aa6",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%pip install -qU langchain-zeusdb"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c12fe175-a299-47d3-869f-9367b6aa572d",
+   "metadata": {},
+   "source": [
+    "---"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "31554e69-40b2-4201-9f92-57e73ac66d33",
+   "metadata": {},
+   "source": [
+    "## Getting Started"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b696b3dd-0fed-4ed2-a79a-5b32598508c0",
+   "metadata": {},
+   "source": [
+    "This example uses OpenAIEmbeddings, which requires an OpenAI API key – [Get your OpenAI API key here](https://platform.openai.com/api-keys)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2b79766e-7725-4be0-a183-4947b56892c5",
+   "metadata": {},
+   "source": [
+    "If you prefer, you can also use this package with any other embedding provider (Hugging Face, Cohere, custom functions, etc.)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b5266cc7-28da-459e-a28d-128382ed5a20",
+   "metadata": {},
+   "source": [
+    "Install the LangChain OpenAI integration package from PyPI:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1ed941cd-5e06-4c61-9235-90bd0b0b0452",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "pip install -qU langchain-openai\n",
+    "\n",
+    "# Use this command if inside Jupyter Notebooks\n",
+    "#%pip install -qU langchain-openai"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0f49b2ec-d047-455d-8c05-da041112dd8a",
+   "metadata": {},
+   "source": [
+    "#### Choose an option below for providing your OpenAI API key"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ed2d9bf6-be53-4fc1-9611-158f03fd71b7",
+   "metadata": {},
+   "source": [
+    "*Option 1: 🔑 Enter your API key each time*"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "eff5b6a5-4c57-4531-896e-54bcb2b1dec2",
+   "metadata": {},
+   "source": [
+    "Use getpass in Jupyter to securely input your key for the current session:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "08a50da9-5ed1-40dc-a390-07b031369761",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "import getpass\n",
+    "\n",
+    "os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7321917e-8586-42e4-9822-b68cfd74f233",
+   "metadata": {},
+   "source": [
+    "*Option 2: 🗂️ Use a .env file*"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b9297b6b-bd7e-457f-95af-5b41c7ab9b41",
+   "metadata": {},
+   "source": [
+    "Keep your key in a local .env file and load it automatically with python-dotenv:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "85a139dc-f439-4e4e-bc46-76d9478c304d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from dotenv import load_dotenv\n",
+    "\n",
+    "load_dotenv()  # reads .env and sets OPENAI_API_KEY"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1af364e3-df59-4963-aaaa-0e83f6ec5e32",
+   "metadata": {},
+   "source": [
+    "🎉🎉 That's it! You are good to go."
+ ] + }, + { + "cell_type": "markdown", + "id": "3146180e-026e-4421-a490-ffd14ceabac3", + "metadata": {}, + "source": [ + "---" + ] + }, + { + "cell_type": "markdown", + "id": "93df377e", + "metadata": {}, + "source": [ + "## Initialization" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fb55dfe8-2c98-45b6-ba90-7a3667ceee0c", + "metadata": {}, + "outputs": [], + "source": [ + "# Import required Packages and Classes\n", + "from langchain_zeusdb import ZeusDBVectorStore\n", + "from langchain_openai import OpenAIEmbeddings\n", + "from zeusdb import VectorDatabase" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dc37144c-208d-4ab3-9f3a-0407a69fe052", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Initialize embeddings\n", + "embeddings = OpenAIEmbeddings(model=\"text-embedding-3-small\")\n", + "\n", + "# Create ZeusDB index\n", + "vdb = VectorDatabase()\n", + "index = vdb.create(index_type=\"hnsw\", dim=1536, space=\"cosine\")\n", + "\n", + "# Create vector store\n", + "vector_store = ZeusDBVectorStore(zeusdb_index=index, embedding=embeddings)" + ] + }, + { + "cell_type": "markdown", + "id": "f45fa43c-8b54-4a75-b7b0-92ac0ac506c6", + "metadata": {}, + "source": [ + "---" + ] + }, + { + "cell_type": "markdown", + "id": "ac6071d4", + "metadata": {}, + "source": [ + "## Manage vector store" + ] + }, + { + "cell_type": "markdown", + "id": "edf53787-ebda-4306-afc3-f7d440dcb1ff", + "metadata": {}, + "source": [ + "### 2.1 Add items to vector store" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "17f5efc0", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_core.documents import Document\n", + "\n", + "document_1 = Document(\n", + " page_content=\"ZeusDB is a high-performance vector database\",\n", + " metadata={\"source\": \"https://docs.zeusdb.com\"},\n", + ")\n", + "\n", + "document_2 = Document(\n", + " page_content=\"Product Quantization reduces memory usage significantly\",\n", + " metadata={\"source\": \"https://docs.zeusdb.com\"},\n", + ")\n", + "\n", + "document_3 = Document(\n", + " page_content=\"ZeusDB integrates seamlessly with LangChain\",\n", + " metadata={\"source\": \"https://docs.zeusdb.com\"},\n", + ")\n", + "\n", + "documents = [document_1, document_2, document_3]\n", + "\n", + "vector_store.add_documents(documents=documents, ids=[\"1\", \"2\", \"3\"])" + ] + }, + { + "cell_type": "markdown", + "id": "c738c3e0", + "metadata": {}, + "source": [ + "### 2.2 Update items in vector store" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f0aa8b71", + "metadata": {}, + "outputs": [], + "source": [ + "updated_document = Document(\n", + " page_content=\"ZeusDB now supports advanced Product Quantization with 4x-256x compression\",\n", + " metadata={\"source\": \"https://docs.zeusdb.com\", \"updated\": True},\n", + ")\n", + "\n", + "vector_store.add_documents([updated_document], ids=[\"1\"])" + ] + }, + { + "cell_type": "markdown", + "id": "dcf1b905", + "metadata": {}, + "source": [ + "### 2.3 Delete items from vector store" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ef61e188", + "metadata": {}, + "outputs": [], + "source": [ + "vector_store.delete(ids=[\"3\"])" + ] + }, + { + "cell_type": "markdown", + "id": "1a0091af-777d-4651-888a-3b346d7990f5", + "metadata": {}, + "source": [ + "---" + ] + }, + { + "cell_type": "markdown", + "id": "c3620501", + "metadata": {}, + "source": [ + "## Query vector store" + ] + }, + { + "cell_type": 
"markdown", + "id": "4ba3fdb2-b7d6-4f0f-b8c9-91f63596018b", + "metadata": {}, + "source": [ + "### 3.1 Query directly" + ] + }, + { + "cell_type": "markdown", + "id": "400a9b25-9587-4116-ab59-6888602ec2b1", + "metadata": {}, + "source": [ + "Performing a simple similarity search:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "aa0a16fa", + "metadata": {}, + "outputs": [], + "source": [ + "results = vector_store.similarity_search(query=\"high performance database\", k=2)\n", + "\n", + "for doc in results:\n", + " print(f\"* {doc.page_content} [{doc.metadata}]\")" + ] + }, + { + "cell_type": "markdown", + "id": "3ed9d733", + "metadata": {}, + "source": [ + "If you want to execute a similarity search and receive the corresponding scores:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5efd2eaa", + "metadata": {}, + "outputs": [], + "source": [ + "results = vector_store.similarity_search_with_score(query=\"memory optimization\", k=2)\n", + "\n", + "for doc, score in results:\n", + " print(f\"* [SIM={score:.3f}] {doc.page_content} [{doc.metadata}]\")" + ] + }, + { + "cell_type": "markdown", + "id": "0c235cdc", + "metadata": {}, + "source": [ + "### 3.2 Query by turning into retriever" + ] + }, + { + "cell_type": "markdown", + "id": "59292cb5-5dc8-4158-9137-89d0f6ca711d", + "metadata": {}, + "source": [ + "You can also transform the vector store into a retriever for easier usage in your chains:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f3460093", + "metadata": {}, + "outputs": [], + "source": [ + "retriever = vector_store.as_retriever(search_type=\"mmr\", search_kwargs={\"k\": 2})\n", + "\n", + "retriever.invoke(\"vector database features\")" + ] + }, + { + "cell_type": "markdown", + "id": "cc2d2b63-99d8-45c4-85e6-6a9409551ada", + "metadata": {}, + "source": [ + "---" + ] + }, + { + "cell_type": "markdown", + "id": "persistence_section", + "metadata": {}, + "source": [ + "## ZeusDB-Specific Features" + ] + }, + { + "cell_type": "markdown", + "id": "memory_section", + "metadata": {}, + "source": [ + "### 4.1 Memory-Efficient Setup with Product Quantization" + ] + }, + { + "cell_type": "markdown", + "id": "12832d02-d9ea-4c35-a20f-05c85d1d7723", + "metadata": {}, + "source": [ + "For large datasets, use Product Quantization to reduce memory usage:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "quantization_example", + "metadata": {}, + "outputs": [], + "source": [ + "# Create memory-optimized vector store\n", + "quantization_config = {\"type\": \"pq\", \"subvectors\": 8, \"bits\": 8, \"training_size\": 10000}\n", + "\n", + "vdb_quantized = VectorDatabase()\n", + "quantized_index = vdb_quantized.create(\n", + " index_type=\"hnsw\", dim=1536, quantization_config=quantization_config\n", + ")\n", + "\n", + "quantized_vector_store = ZeusDBVectorStore(\n", + " zeusdb_index=quantized_index, embedding=embeddings\n", + ")\n", + "\n", + "print(f\"Created quantized store: {quantized_index.info()}\")" + ] + }, + { + "cell_type": "markdown", + "id": "6ffe0613-b2a7-484e-9219-1166b65c49c5", + "metadata": {}, + "source": [ + "### 4.2 Persistence" + ] + }, + { + "cell_type": "markdown", + "id": "fbc323ee-4c6c-43fc-beba-675d820ca078", + "metadata": {}, + "source": [ + "Save and load your vector store to disk:" + ] + }, + { + "cell_type": "markdown", + "id": "834354d1-55ad-48fe-84e1-a5eacff3f6bb", + "metadata": {}, + "source": [ + "How to Save your vector store" + ] + }, + { + "cell_type": "code", + 
"execution_count": null, + "id": "f9d1332b-a7ac-4a4b-a060-f2061599d3f1", + "metadata": {}, + "outputs": [], + "source": [ + "# Save the vector store\n", + "vector_store.save_index(\"my_zeusdb_index.zdb\")" + ] + }, + { + "cell_type": "markdown", + "id": "23370621-5b51-4313-800f-3a2fb9de52d2", + "metadata": {}, + "source": [ + "How to Load your vector store" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f9ed5778-58e4-4724-b69d-3c7b48cda429", + "metadata": {}, + "outputs": [], + "source": [ + "# Load the vector store\n", + "loaded_store = ZeusDBVectorStore.load_index(\n", + " path=\"my_zeusdb_index.zdb\", embedding=embeddings\n", + ")\n", + "\n", + "print(f\"Loaded store with {loaded_store.get_vector_count()} vectors\")" + ] + }, + { + "cell_type": "markdown", + "id": "610cfe63-d4a8-4ef0-88a8-cf9cc3cbbfce", + "metadata": {}, + "source": [ + "---" + ] + }, + { + "cell_type": "markdown", + "id": "901c75dc", + "metadata": {}, + "source": [ + "## Usage for retrieval-augmented generation\n", + "\n", + "For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:\n", + "\n", + "- [How-to: Question and answer with RAG](https://python.langchain.com/docs/how_to/#qa-with-rag)\n", + "- [Retrieval conceptual docs](https://python.langchain.com/docs/concepts/retrieval/)" + ] + }, + { + "cell_type": "markdown", + "id": "1d9d9d51-3798-410f-b1b3-f9736ea8c238", + "metadata": {}, + "source": [ + "---" + ] + }, + { + "cell_type": "markdown", + "id": "25b08eb0-99ab-4919-a201-5243fdfa39e9", + "metadata": {}, + "source": [ + "## API reference" + ] + }, + { + "cell_type": "markdown", + "id": "77fdca8b-f75e-4100-9f1d-7a017567dc59", + "metadata": {}, + "source": [ + "For detailed documentation of all ZeusDBVectorStore features and configurations head to the Doc reference: https://docs.zeusdb.com/en/latest/vector_database/integrations/langchain.html" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.13.3" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/libs/packages.yml b/libs/packages.yml index 043a8767d18..1faf4cbf268 100644 --- a/libs/packages.yml +++ b/libs/packages.yml @@ -735,6 +735,10 @@ packages: - name: langchain-google-bigtable name_title: Bigtable repo: googleapis/langchain-google-bigtable-python +- name: langchain-oci + name_title: Oracle Cloud Infrastructure (OCI) + repo: oracle/langchain-oracle + path: . - name: langchain-timbr provider_page: timbr path: . @@ -742,7 +746,6 @@ packages: - name: langchain-zenrows path: . repo: ZenRows-Hub/langchain-zenrows -- name: langchain-oci - name_title: Oracle Cloud Infrastructure (OCI) - repo: oracle/langchain-oracle - path: . +- name: langchain-zeusdb + repo: zeusdb/langchain-zeusdb + path: libs/zeusdb