langchain/docs/docs/integrations/stores/bigtable.ipynb

{
 "cells": [
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "---\n",
    "sidebar_label: Google Bigtable\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# BigtableByteStore\n",
    "\n",
    "This guide covers how to use Google Cloud Bigtable as a key-value store.\n",
    "\n",
    "[Bigtable](https://cloud.google.com/bigtable) is a key-value and wide-column store, ideal for fast access to structured, semi-structured, or unstructured data. \n",
    "\n",
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/langchain-google-bigtable-python/blob/main/docs/key_value_store.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Overview\n",
    "\n",
    "The `BigtableByteStore` uses Google Cloud Bigtable as a backend for a key-value store. It supports synchronous and asynchronous operations for setting, getting, and deleting key-value pairs.\n",
    "\n",
    "### Integration details\n",
    "| Class | Package | Local | JS support | Package downloads | Package latest |\n",
    "| :--- | :--- | :---: | :---: | :---: | :---: |\n",
    "| [BigtableByteStore](https://github.com/googleapis/langchain-google-bigtable-python/blob/main/src/langchain_google_bigtable/key_value_store.py) | [langchain-google-bigtable](https://pypi.org/project/langchain-google-bigtable/) | ❌ | ❌ | ![PyPI - Downloads](https://img.shields.io/pypi/dm/langchain-google-bigtable?style=flat-square&label=%20) | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-google-bigtable) |"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup\n",
    "\n",
    "### Prerequisites\n",
    "\n",
    "To get started, you will need a Google Cloud project with an active Bigtable instance and table. \n",
    "* [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n",
    "* [Enable the Bigtable API](https://console.cloud.google.com/flows/enableapi?apiid=bigtable.googleapis.com)\n",
    "* [Create a Bigtable instance and table](https://cloud.google.com/bigtable/docs/creating-instance)\n",
    "\n",
    "### Installation\n",
    "\n",
    "The integration is in the `langchain-google-bigtable` package. The command below also installs `langchain-google-vertexai` for the embedding cache example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install -qU langchain-google-bigtable langchain-google-vertexai"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### ☁ Set Your Google Cloud Project\n",
    "Set your Google Cloud project to use its resources within this notebook.\n",
    "\n",
    "If you don't know your project ID, you can run `gcloud config list` or see the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# @markdown Please fill in your project, instance, and table details.\n",
    "PROJECT_ID = \"your-gcp-project-id\"  # @param {type:\"string\"}\n",
    "INSTANCE_ID = \"your-instance-id\"  # @param {type:\"string\"}\n",
    "TABLE_ID = \"your-table-id\"  # @param {type:\"string\"}\n",
    "\n",
    "!gcloud config set project {PROJECT_ID}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 🔐 Authentication\n",
    "Authenticate to Google Cloud to access your project resources.\n",
    "- For **Colab**, use the cell below.\n",
    "- For **Vertex AI Workbench**, see the [setup instructions](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from google.colab import auth\n",
    "\n",
    "auth.authenticate_user()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Instantiation\n",
    "\n",
    "To use `BigtableByteStore`, we first ensure a table exists and then initialize a `BigtableEngine` to manage connections."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_google_bigtable import (\n",
    "    BigtableByteStore,\n",
    "    BigtableEngine,\n",
    "    init_key_value_store_table,\n",
    ")\n",
    "\n",
    "# Ensure the table and column family exist.\n",
    "init_key_value_store_table(\n",
    "    project_id=PROJECT_ID,\n",
    "    instance_id=INSTANCE_ID,\n",
    "    table_id=TABLE_ID,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### BigtableEngine\n",
    "A `BigtableEngine` object handles the execution context for the store, especially for async operations. It's recommended to initialize a single engine and reuse it across multiple stores for better performance."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initialize the engine to manage async operations.\n",
    "engine = await BigtableEngine.async_initialize(\n",
    "    project_id=PROJECT_ID, instance_id=INSTANCE_ID\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### BigtableByteStore\n",
    "\n",
    "This is the main class for interacting with the key-value store. It provides the methods for setting, getting, and deleting data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initialize the store.\n",
    "store = await BigtableByteStore.create(engine=engine, table_id=TABLE_ID)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Usage\n",
    "\n",
    "The store supports both sync (`mset`, `mget`) and async (`amset`, `amget`) methods. This guide uses the async versions."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Set\n",
    "Use `amset` to save key-value pairs to the store."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "kv_pairs = [\n",
    "    (\"key1\", b\"value1\"),\n",
    "    (\"key2\", b\"value2\"),\n",
    "    (\"key3\", b\"value3\"),\n",
    "]\n",
    "\n",
    "await store.amset(kv_pairs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Get\n",
    "Use `amget` to retrieve values. If a key is not found, `None` is returned for that key."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "retrieved_vals = await store.amget([\"key1\", \"key2\", \"nonexistent_key\"])\n",
    "print(retrieved_vals)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Delete\n",
    "Use `amdelete` to remove keys from the store."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "await store.amdelete([\"key3\"])\n",
    "\n",
    "# Verifying the key was deleted\n",
    "await store.amget([\"key1\", \"key3\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Iterate over keys\n",
    "Use `ayield_keys` to iterate over all keys or keys with a specific prefix."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "all_keys = [key async for key in store.ayield_keys()]\n",
    "print(f\"All keys: {all_keys}\")\n",
    "\n",
    "prefixed_keys = [key async for key in store.ayield_keys(prefix=\"key1\")]\n",
    "print(f\"Prefixed keys: {prefixed_keys}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Advanced Usage: Embedding Caching\n",
    "\n",
    "A common use case for a key-value store is to cache expensive operations like computing text embeddings, which saves time and cost."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.embeddings import CacheBackedEmbeddings\n",
    "from langchain_google_vertexai.embeddings import VertexAIEmbeddings\n",
    "\n",
    "underlying_embeddings = VertexAIEmbeddings(\n",
    "    project=PROJECT_ID, model_name=\"textembedding-gecko@003\"\n",
    ")\n",
    "\n",
    "# Use a namespace to avoid key collisions with other data.\n",
    "cached_embedder = CacheBackedEmbeddings.from_bytes_store(\n",
    "    underlying_embeddings, store, namespace=\"text-embeddings\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"First call (computes and caches embedding):\")\n",
    "%time embedding_result_1 = await cached_embedder.aembed_query(\"Hello, world!\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"\\nSecond call (retrieves from cache):\")\n",
    "%time embedding_result_2 = await cached_embedder.aembed_query(\"Hello, world!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### As a Simple Document Retriever\n",
    "\n",
    "This section shows how to create a simple retriever using the Bigtable store. It acts as a document persistence layer, fetching documents that match a query prefix."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_core.retrievers import BaseRetriever\n",
    "from langchain_core.documents import Document\n",
    "from langchain_core.callbacks import CallbackManagerForRetrieverRun\n",
    "from typing import List, Optional, Any, Union\n",
    "import json\n",
    "\n",
    "\n",
    "class SimpleKVStoreRetriever(BaseRetriever):\n",
    "    \"\"\"A simple retriever that retrieves documents based on a prefix match in the key-value store.\"\"\"\n",
    "\n",
    "    store: BigtableByteStore\n",
    "    documents: List[Union[Document, str]]\n",
    "    k: int\n",
    "\n",
    "    def set_up_store(self):\n",
    "        kv_pairs_to_set = []\n",
    "        for i, doc in enumerate(self.documents):\n",
    "            if isinstance(doc, str):\n",
    "                doc = Document(page_content=doc)\n",
    "            if not doc.id:\n",
    "                doc.id = str(i)\n",
    "            value = (\n",
    "                \"Page Content\\n\"\n",
    "                + doc.page_content\n",
    "                + \"\\nMetadata\"\n",
    "                + json.dumps(doc.metadata)\n",
    "            )\n",
    "            kv_pairs_to_set.append((doc.id, value.encode(\"utf-8\")))\n",
    "        self.store.mset(kv_pairs_to_set)\n",
    "\n",
    "    async def _aget_relevant_documents(\n",
    "        self,\n",
    "        query: str,\n",
    "        *,\n",
    "        run_manager: Optional[CallbackManagerForRetrieverRun] = None,\n",
    "    ) -> List[Document]:\n",
    "        keys = [key async for key in self.store.ayield_keys(prefix=query)][: self.k]\n",
    "        documents_retrieved = []\n",
    "        async for document in await self.store.amget(keys):\n",
    "            if document:\n",
    "                document_str = document.decode(\"utf-8\")\n",
    "                page_content = document_str.split(\"Content\\n\")[1].split(\"\\nMetadata\")[0]\n",
    "                metadata = json.loads(document_str.split(\"\\nMetadata\")[1])\n",
    "                documents_retrieved.append(\n",
    "                    Document(page_content=page_content, metadata=metadata)\n",
    "                )\n",
    "        return documents_retrieved\n",
    "\n",
    "    def _get_relevant_documents(\n",
    "        self,\n",
    "        query: str,\n",
    "        *,\n",
    "        run_manager: Optional[CallbackManagerForRetrieverRun] = None,\n",
    "    ) -> list[Document]:\n",
    "        keys = [key for key in self.store.yield_keys(prefix=query)][: self.k]\n",
    "        documents_retrieved = []\n",
    "        for document in self.store.mget(keys):\n",
    "            if document:\n",
    "                document_str = document.decode(\"utf-8\")\n",
    "                page_content = document_str.split(\"Content\\n\")[1].split(\"\\nMetadata\")[0]\n",
    "                metadata = json.loads(document_str.split(\"\\nMetadata\")[1])\n",
    "                documents_retrieved.append(\n",
    "                    Document(page_content=page_content, metadata=metadata)\n",
    "                )\n",
    "        return documents_retrieved"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "documents = [\n",
    "    Document(\n",
    "        page_content=\"Goldfish are popular pets for beginners, requiring relatively simple care.\",\n",
    "        metadata={\"type\": \"fish\", \"trait\": \"low maintenance\"},\n",
    "        id=\"fish#Goldfish\",\n",
    "    ),\n",
    "    Document(\n",
    "        page_content=\"Cats are independent pets that often enjoy their own space.\",\n",
    "        metadata={\"type\": \"cat\", \"trait\": \"independence\"},\n",
    "        id=\"mammals#Cats\",\n",
    "    ),\n",
    "    Document(\n",
    "        page_content=\"Rabbits are social animals that need plenty of space to hop around.\",\n",
    "        metadata={\"type\": \"rabbit\", \"trait\": \"social\"},\n",
    "        id=\"mammals#Rabbits\",\n",
    "    ),\n",
    "]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "retriever_store = BigtableByteStore.create_sync(\n",
    "    engine=engine, instance_id=INSTANCE_ID, table_id=TABLE_ID\n",
    ")\n",
    "\n",
    "KVDocumentRetriever = SimpleKVStoreRetriever(\n",
    "    store=retriever_store, documents=documents, k=2\n",
    ")\n",
    "\n",
    "KVDocumentRetriever.set_up_store()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "KVDocumentRetriever.invoke(\"fish\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "KVDocumentRetriever.invoke(\"mammals\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## API reference\n",
    "\n",
    "For full details on the `BigtableByteStore` class, see the source code on [GitHub](https://github.com/googleapis/langchain-google-bigtable-python/blob/main/src/langchain_google_bigtable/key_value_store.py)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}