mirror of
https://github.com/hwchase17/langchain.git
synced 2026-02-21 06:33:41 +00:00
## Description

This PR adds documentation for the new ZeusDB vector store integration with LangChain.

## Motivation

ZeusDB is a high-performance vector database (Python/Rust backend) designed for AI applications that need fast similarity search and real-time vector operations. This integration brings ZeusDB's capabilities to the LangChain ecosystem, giving developers another production-oriented option for vector storage and retrieval.

**Key Features:**

- **User-Friendly Python API**: Intuitive interface that integrates seamlessly with Python ML workflows
- **High Performance**: Powered by a robust Rust backend for lightning-fast vector operations
- **Enterprise Logging**: Comprehensive logging capabilities for monitoring and debugging production systems
- **Advanced Features**: Includes product quantization and persistence capabilities
- **AI-Optimized**: Purpose-built for modern AI applications and RAG pipelines

## Changes

- Added provider documentation: `docs/docs/integrations/providers/zeusdb.mdx` (installation, setup).
- Added vector store documentation: `docs/docs/integrations/vectorstores/zeusdb.ipynb` (quickstart for creating/querying a ZeusDBVectorStore).
- Registered langchain-zeusdb in `libs/packages.yml` for discovery.

## Target users

- AI/ML engineers building RAG pipelines
- Data scientists working with large document collections
- Developers needing high-throughput vector search
- Teams requiring near real-time vector operations

## Testing

- Followed LangChain's "How to add standard tests to an integration" guidance.
- Code passes format, lint, and test checks locally.
- Tested with LangChain Core 0.3.74
- Works with Python 3.10 to 3.13

## Package Information

**PyPI:** https://pypi.org/project/langchain-zeusdb
**GitHub:** https://github.com/ZeusDB/langchain-zeusdb
618 lines
14 KiB
Plaintext
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "ef1f0986",
   "metadata": {},
   "source": [
    "# ⚡ ZeusDB Vector Store\n",
    "\n",
    "ZeusDB is a high-performance, Rust-powered vector database with enterprise features such as quantization, persistence, and logging.\n",
    "\n",
    "This notebook covers how to get started with the ZeusDB vector store and use ZeusDB efficiently with LangChain."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "107c485d-13a3-4309-9fda-5a0440862d3c",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "36fdc060",
   "metadata": {},
   "source": [
    "## Setup"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d978e3fd-d130-436f-841d-d133c0fae8fb",
   "metadata": {},
   "source": [
    "Install the ZeusDB LangChain integration package from PyPI:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "42ca8320-b866-4f37-944e-96eda54231d2",
   "metadata": {},
   "outputs": [],
   "source": [
    "pip install -qU langchain-zeusdb"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2a0e518a-ae8a-464b-8b47-9deb9d4ab063",
   "metadata": {},
   "source": [
    "*Setup in Jupyter Notebooks*"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1d092ea6-8553-4686-9563-b8318225a04a",
   "metadata": {},
   "source": [
    "> 💡 Tip: If you’re working inside Jupyter or Google Colab, use the `%pip` magic command so the package is installed into the active kernel:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "64e28aa6",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install -qU langchain-zeusdb"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c12fe175-a299-47d3-869f-9367b6aa572d",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "31554e69-40b2-4201-9f92-57e73ac66d33",
   "metadata": {},
   "source": [
    "## Getting Started"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b696b3dd-0fed-4ed2-a79a-5b32598508c0",
   "metadata": {},
   "source": [
    "This example uses OpenAIEmbeddings, which requires an OpenAI API key – [Get your OpenAI API key here](https://platform.openai.com/api-keys)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2b79766e-7725-4be0-a183-4947b56892c5",
   "metadata": {},
   "source": [
    "If you prefer, you can also use this package with any other embedding provider (Hugging Face, Cohere, custom functions, etc.)."
   ]
  },
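  {
   "cell_type": "markdown",
   "id": "alt-embeddings-note",
   "metadata": {},
   "source": [
    "For example, a local Hugging Face model could be swapped in via the langchain-huggingface package (a sketch, not required for this notebook; note that the index `dim` must match the model's embedding size):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "alt-embeddings-example",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical alternative embedding provider\n",
    "# (requires: pip install langchain-huggingface)\n",
    "# from langchain_huggingface import HuggingFaceEmbeddings\n",
    "#\n",
    "# embeddings = HuggingFaceEmbeddings(model_name=\"sentence-transformers/all-MiniLM-L6-v2\")\n",
    "# # This model produces 384-dimensional vectors, so create the ZeusDB\n",
    "# # index with dim=384 instead of 1536."
   ]
  },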
  {
   "cell_type": "markdown",
   "id": "b5266cc7-28da-459e-a28d-128382ed5a20",
   "metadata": {},
   "source": [
    "Install the LangChain OpenAI integration package from PyPI:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1ed941cd-5e06-4c61-9235-90bd0b0b0452",
   "metadata": {},
   "outputs": [],
   "source": [
    "pip install -qU langchain-openai\n",
    "\n",
    "# Use this command if inside Jupyter Notebooks\n",
    "# %pip install -qU langchain-openai"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0f49b2ec-d047-455d-8c05-da041112dd8a",
   "metadata": {},
   "source": [
    "#### Please choose an option below for providing your OpenAI API key"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ed2d9bf6-be53-4fc1-9611-158f03fd71b7",
   "metadata": {},
   "source": [
    "*Option 1: 🔑 Enter your API key each time*"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eff5b6a5-4c57-4531-896e-54bcb2b1dec2",
   "metadata": {},
   "source": [
    "Use getpass in Jupyter to securely input your key for the current session:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "08a50da9-5ed1-40dc-a390-07b031369761",
   "metadata": {},
   "outputs": [],
   "source": [
    "import getpass\n",
    "import os\n",
    "\n",
    "os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7321917e-8586-42e4-9822-b68cfd74f233",
   "metadata": {},
   "source": [
    "*Option 2: 🗂️ Use a .env file*"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b9297b6b-bd7e-457f-95af-5b41c7ab9b41",
   "metadata": {},
   "source": [
    "Keep your key in a local .env file and load it automatically with python-dotenv:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "85a139dc-f439-4e4e-bc46-76d9478c304d",
   "metadata": {},
   "outputs": [],
   "source": [
    "from dotenv import load_dotenv\n",
    "\n",
    "load_dotenv()  # reads .env and sets OPENAI_API_KEY"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1af364e3-df59-4963-aaaa-0e83f6ec5e32",
   "metadata": {},
   "source": [
    "🎉🎉 That's it! You are good to go."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3146180e-026e-4421-a490-ffd14ceabac3",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "93df377e",
   "metadata": {},
   "source": [
    "## Initialization"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fb55dfe8-2c98-45b6-ba90-7a3667ceee0c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import required packages and classes\n",
    "from langchain_openai import OpenAIEmbeddings\n",
    "from langchain_zeusdb import ZeusDBVectorStore\n",
    "from zeusdb import VectorDatabase"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dc37144c-208d-4ab3-9f3a-0407a69fe052",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Initialize embeddings\n",
    "embeddings = OpenAIEmbeddings(model=\"text-embedding-3-small\")\n",
    "\n",
    "# Create ZeusDB index\n",
    "vdb = VectorDatabase()\n",
    "index = vdb.create(index_type=\"hnsw\", dim=1536, space=\"cosine\")\n",
    "\n",
    "# Create vector store\n",
    "vector_store = ZeusDBVectorStore(zeusdb_index=index, embedding=embeddings)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f45fa43c-8b54-4a75-b7b0-92ac0ac506c6",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ac6071d4",
   "metadata": {},
   "source": [
    "## Manage vector store"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "edf53787-ebda-4306-afc3-f7d440dcb1ff",
   "metadata": {},
   "source": [
    "### 2.1 Add items to vector store"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "17f5efc0",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_core.documents import Document\n",
    "\n",
    "document_1 = Document(\n",
    "    page_content=\"ZeusDB is a high-performance vector database\",\n",
    "    metadata={\"source\": \"https://docs.zeusdb.com\"},\n",
    ")\n",
    "\n",
    "document_2 = Document(\n",
    "    page_content=\"Product Quantization reduces memory usage significantly\",\n",
    "    metadata={\"source\": \"https://docs.zeusdb.com\"},\n",
    ")\n",
    "\n",
    "document_3 = Document(\n",
    "    page_content=\"ZeusDB integrates seamlessly with LangChain\",\n",
    "    metadata={\"source\": \"https://docs.zeusdb.com\"},\n",
    ")\n",
    "\n",
    "documents = [document_1, document_2, document_3]\n",
    "\n",
    "vector_store.add_documents(documents=documents, ids=[\"1\", \"2\", \"3\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c738c3e0",
   "metadata": {},
   "source": [
    "### 2.2 Update items in vector store"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f0aa8b71",
   "metadata": {},
   "outputs": [],
   "source": [
    "updated_document = Document(\n",
    "    page_content=\"ZeusDB now supports advanced Product Quantization with 4x-256x compression\",\n",
    "    metadata={\"source\": \"https://docs.zeusdb.com\", \"updated\": True},\n",
    ")\n",
    "\n",
    "vector_store.add_documents([updated_document], ids=[\"1\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dcf1b905",
   "metadata": {},
   "source": [
    "### 2.3 Delete items from vector store"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ef61e188",
   "metadata": {},
   "outputs": [],
   "source": [
    "vector_store.delete(ids=[\"3\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1a0091af-777d-4651-888a-3b346d7990f5",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c3620501",
   "metadata": {},
   "source": [
    "## Query vector store"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4ba3fdb2-b7d6-4f0f-b8c9-91f63596018b",
   "metadata": {},
   "source": [
    "### 3.1 Query directly"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "400a9b25-9587-4116-ab59-6888602ec2b1",
   "metadata": {},
   "source": [
    "Performing a simple similarity search:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "aa0a16fa",
   "metadata": {},
   "outputs": [],
   "source": [
    "results = vector_store.similarity_search(query=\"high performance database\", k=2)\n",
    "\n",
    "for doc in results:\n",
    "    print(f\"* {doc.page_content} [{doc.metadata}]\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3ed9d733",
   "metadata": {},
   "source": [
    "If you want to execute a similarity search and receive the corresponding scores:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5efd2eaa",
   "metadata": {},
   "outputs": [],
   "source": [
    "results = vector_store.similarity_search_with_score(query=\"memory optimization\", k=2)\n",
    "\n",
    "for doc, score in results:\n",
    "    print(f\"* [SIM={score:.3f}] {doc.page_content} [{doc.metadata}]\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0c235cdc",
   "metadata": {},
   "source": [
    "### 3.2 Query by turning into retriever"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "59292cb5-5dc8-4158-9137-89d0f6ca711d",
   "metadata": {},
   "source": [
    "You can also transform the vector store into a retriever for easier usage in your chains:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f3460093",
   "metadata": {},
   "outputs": [],
   "source": [
    "retriever = vector_store.as_retriever(search_type=\"mmr\", search_kwargs={\"k\": 2})\n",
    "\n",
    "retriever.invoke(\"vector database features\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cc2d2b63-99d8-45c4-85e6-6a9409551ada",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "persistence_section",
   "metadata": {},
   "source": [
    "## ZeusDB-Specific Features"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "memory_section",
   "metadata": {},
   "source": [
    "### 4.1 Memory-Efficient Setup with Product Quantization"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "12832d02-d9ea-4c35-a20f-05c85d1d7723",
   "metadata": {},
   "source": [
    "For large datasets, use Product Quantization to reduce memory usage:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "quantization_example",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create memory-optimized vector store\n",
    "quantization_config = {\n",
    "    \"type\": \"pq\",\n",
    "    \"subvectors\": 8,\n",
    "    \"bits\": 8,\n",
    "    \"training_size\": 10000,\n",
    "}\n",
    "\n",
    "vdb_quantized = VectorDatabase()\n",
    "quantized_index = vdb_quantized.create(\n",
    "    index_type=\"hnsw\", dim=1536, quantization_config=quantization_config\n",
    ")\n",
    "\n",
    "quantized_vector_store = ZeusDBVectorStore(\n",
    "    zeusdb_index=quantized_index, embedding=embeddings\n",
    ")\n",
    "\n",
    "print(f\"Created quantized store: {quantized_index.info()}\")"
   ]
  },
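  {
   "cell_type": "markdown",
   "id": "quantized-usage-note",
   "metadata": {},
   "source": [
    "The quantized store exposes the same LangChain interface as the regular store, so the add/query cells above work unchanged against it. A quick sanity check (a sketch reusing the `documents` list defined earlier):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "quantized-usage-example",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Same API as the regular store (reuses `documents` from section 2.1)\n",
    "quantized_vector_store.add_documents(documents=documents)\n",
    "quantized_vector_store.similarity_search(\"memory usage\", k=1)"
   ]
  },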
  {
   "cell_type": "markdown",
   "id": "6ffe0613-b2a7-484e-9219-1166b65c49c5",
   "metadata": {},
   "source": [
    "### 4.2 Persistence"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fbc323ee-4c6c-43fc-beba-675d820ca078",
   "metadata": {},
   "source": [
    "Save and load your vector store to disk:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "834354d1-55ad-48fe-84e1-a5eacff3f6bb",
   "metadata": {},
   "source": [
    "How to save your vector store:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f9d1332b-a7ac-4a4b-a060-f2061599d3f1",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Save the vector store\n",
    "vector_store.save_index(\"my_zeusdb_index.zdb\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "23370621-5b51-4313-800f-3a2fb9de52d2",
   "metadata": {},
   "source": [
    "How to load your vector store:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f9ed5778-58e4-4724-b69d-3c7b48cda429",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load the vector store\n",
    "loaded_store = ZeusDBVectorStore.load_index(\n",
    "    path=\"my_zeusdb_index.zdb\", embedding=embeddings\n",
    ")\n",
    "\n",
    "print(f\"Loaded store with {loaded_store.get_vector_count()} vectors\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "610cfe63-d4a8-4ef0-88a8-cf9cc3cbbfce",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "901c75dc",
   "metadata": {},
   "source": [
    "## Usage for retrieval-augmented generation\n",
    "\n",
    "For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:\n",
    "\n",
    "- [How-to: Question and answer with RAG](https://python.langchain.com/docs/how_to/#qa-with-rag)\n",
    "- [Retrieval conceptual docs](https://python.langchain.com/docs/concepts/retrieval/)"
   ]
  },
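  {
   "cell_type": "markdown",
   "id": "rag-sketch-note",
   "metadata": {},
   "source": [
    "As a quick illustration, a minimal RAG chain over this store might look like the sketch below (assumes `langchain-openai` is installed, `OPENAI_API_KEY` is set, and the `retriever` from section 3.2 exists; the model choice is illustrative):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "rag-sketch-example",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Minimal RAG sketch (illustrative, not part of the official ZeusDB docs)\n",
    "from langchain_core.output_parsers import StrOutputParser\n",
    "from langchain_core.prompts import ChatPromptTemplate\n",
    "from langchain_openai import ChatOpenAI\n",
    "\n",
    "prompt = ChatPromptTemplate.from_template(\n",
    "    \"Answer using only this context:\\n{context}\\n\\nQuestion: {question}\"\n",
    ")\n",
    "\n",
    "\n",
    "def format_docs(docs):\n",
    "    return \"\\n\\n\".join(doc.page_content for doc in docs)\n",
    "\n",
    "\n",
    "chain = (\n",
    "    {\"context\": retriever | format_docs, \"question\": lambda q: q}\n",
    "    | prompt\n",
    "    | ChatOpenAI(model=\"gpt-4o-mini\")\n",
    "    | StrOutputParser()\n",
    ")\n",
    "\n",
    "chain.invoke(\"What is ZeusDB?\")"
   ]
  },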
  {
   "cell_type": "markdown",
   "id": "1d9d9d51-3798-410f-b1b3-f9736ea8c238",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "25b08eb0-99ab-4919-a201-5243fdfa39e9",
   "metadata": {},
   "source": [
    "## API reference"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "77fdca8b-f75e-4100-9f1d-7a017567dc59",
   "metadata": {},
   "source": [
    "For detailed documentation of all ZeusDBVectorStore features and configurations, see the API reference: https://docs.zeusdb.com/en/latest/vector_database/integrations/langchain.html"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}