langchain/docs/docs/integrations/stores/bigtable.ipynb
Michael Yilma 03f0ebd93e docs: add Bigtable Key-value Store and Vector Store Docs (#32598)
- [x] **feat(docs)**: add Bigtable Key-value store doc
- [x] **feat(docs)**: add Bigtable Vector store doc

This PR adds a doc for Bigtable and LangChain Key-value store
integration. It contains guides on how to add, delete, get, and yield
key-value pairs from Bigtable Key-value Store for LangChain.


---------

Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-09-12 16:53:59 -04:00

{
"cells": [
{
"cell_type": "raw",
"metadata": {},
"source": [
"---\n",
"sidebar_label: Google Bigtable\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# BigtableByteStore\n",
"\n",
"This guide covers how to use Google Cloud Bigtable as a key-value store.\n",
"\n",
"[Bigtable](https://cloud.google.com/bigtable) is a key-value and wide-column store, ideal for fast access to structured, semi-structured, or unstructured data.\n",
"\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/langchain-google-bigtable-python/blob/main/docs/key_value_store.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Overview\n",
"\n",
"The `BigtableByteStore` uses Google Cloud Bigtable as a backend for a key-value store. It supports synchronous and asynchronous operations for setting, getting, and deleting key-value pairs, and for iterating over keys by prefix.\n",
"\n",
"### Integration details\n",
"| Class | Package | Local | JS support | Package downloads | Package latest |\n",
"| :--- | :--- | :---: | :---: | :---: | :---: |\n",
"| [BigtableByteStore](https://github.com/googleapis/langchain-google-bigtable-python/blob/main/src/langchain_google_bigtable/key_value_store.py) | [langchain-google-bigtable](https://pypi.org/project/langchain-google-bigtable/) | ❌ | ❌ | ![PyPI - Downloads](https://img.shields.io/pypi/dm/langchain-google-bigtable?style=flat-square&label=%20) | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-google-bigtable) |"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"### Prerequisites\n",
"\n",
"To get started, you will need a Google Cloud project with an active Bigtable instance and table.\n",
"* [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n",
"* [Enable the Bigtable API](https://console.cloud.google.com/flows/enableapi?apiid=bigtable.googleapis.com)\n",
"* [Create a Bigtable instance and table](https://cloud.google.com/bigtable/docs/creating-instance)\n",
"\n",
"### Installation\n",
"\n",
"The integration is in the `langchain-google-bigtable` package. The command below also installs `langchain`, which provides `CacheBackedEmbeddings`, and `langchain-google-vertexai` for the embedding cache example."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install -qU langchain-google-bigtable langchain langchain-google-vertexai"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ☁ Set Your Google Cloud Project\n",
"Set your Google Cloud project to use its resources within this notebook.\n",
"\n",
"If you don't know your project ID, you can run `gcloud config list` or see the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# @markdown Please fill in your project, instance, and table details.\n",
"PROJECT_ID = \"your-gcp-project-id\" # @param {type:\"string\"}\n",
"INSTANCE_ID = \"your-instance-id\" # @param {type:\"string\"}\n",
"TABLE_ID = \"your-table-id\" # @param {type:\"string\"}\n",
"\n",
"!gcloud config set project {PROJECT_ID}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 🔐 Authentication\n",
"Authenticate to Google Cloud to access your project resources.\n",
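"- For **Colab**, run the cell below.\n",
"- For **Vertex AI Workbench**, see the [setup instructions](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env).\n",
"- For a **local environment**, run `gcloud auth application-default login` to set up Application Default Credentials."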
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
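"# This cell is for Colab only; elsewhere, authenticate with\n",
"# `gcloud auth application-default login` before running the notebook.\n",
"try:\n",
"    from google.colab import auth\n",
"\n",
"    auth.authenticate_user()\n",
"except ImportError:\n",
"    print(\"Not running in Colab; skipping Colab authentication.\")"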
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Instantiation\n",
"\n",
"To use `BigtableByteStore`, we first ensure a table exists and then initialize a `BigtableEngine` to manage connections."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_google_bigtable import (\n",
"    BigtableByteStore,\n",
"    BigtableEngine,\n",
"    init_key_value_store_table,\n",
")\n",
"\n",
"# Ensure the table and column family exist.\n",
"init_key_value_store_table(\n",
"    project_id=PROJECT_ID,\n",
"    instance_id=INSTANCE_ID,\n",
"    table_id=TABLE_ID,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### BigtableEngine\n",
"A `BigtableEngine` object handles the execution context for the store, especially for async operations. It's recommended to initialize a single engine and reuse it across multiple stores for better performance."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Initialize the engine to manage async operations.\n",
"engine = await BigtableEngine.async_initialize(\n",
"    project_id=PROJECT_ID, instance_id=INSTANCE_ID\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### BigtableByteStore\n",
"\n",
"This is the main class for interacting with the key-value store. It provides the methods for setting, getting, and deleting data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Initialize the store.\n",
"store = await BigtableByteStore.create(engine=engine, table_id=TABLE_ID)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Usage\n",
"\n",
"The store supports both sync (`mset`, `mget`, `mdelete`, `yield_keys`) and async (`amset`, `amget`, `amdelete`, `ayield_keys`) methods. This guide mainly uses the async versions."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set\n",
"Use `amset` to save key-value pairs to the store."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"kv_pairs = [\n",
"    (\"key1\", b\"value1\"),\n",
"    (\"key2\", b\"value2\"),\n",
"    (\"key3\", b\"value3\"),\n",
"]\n",
"\n",
"await store.amset(kv_pairs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Get\n",
"Use `amget` to retrieve values. The result contains one entry per requested key, in the same order; `None` marks a key that was not found."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"retrieved_vals = await store.amget([\"key1\", \"key2\", \"nonexistent_key\"])\n",
"print(retrieved_vals)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Delete\n",
"Use `amdelete` to remove keys from the store."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"await store.amdelete([\"key3\"])\n",
"\n",
"# Verify the deletion: the value returned for \"key3\" should now be None.\n",
"await store.amget([\"key1\", \"key3\"])"
]
},
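{
"cell_type": "markdown",
"metadata": {},
"source": [
"The calls above follow LangChain's `BaseStore` contract: `mget`/`amget` return one result per requested key, in the same order, with `None` for a key that is missing or was deleted. To check that expectation offline, the sketch below models the sync methods with a hypothetical dict-backed `DictByteStore`. It is an illustration only, not part of `langchain-google-bigtable`; for a maintained in-memory store, see `langchain_core.stores.InMemoryByteStore`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from typing import Dict, Iterator, List, Optional, Sequence, Tuple\n",
"\n",
"\n",
"class DictByteStore:\n",
"    \"\"\"Hypothetical dict-backed stand-in that mirrors the sync store methods.\"\"\"\n",
"\n",
"    def __init__(self) -> None:\n",
"        self._data: Dict[str, bytes] = {}\n",
"\n",
"    def mset(self, key_value_pairs: Sequence[Tuple[str, bytes]]) -> None:\n",
"        for key, value in key_value_pairs:\n",
"            self._data[key] = value\n",
"\n",
"    def mget(self, keys: Sequence[str]) -> List[Optional[bytes]]:\n",
"        # One result per requested key, in order; None marks a missing key.\n",
"        return [self._data.get(key) for key in keys]\n",
"\n",
"    def mdelete(self, keys: Sequence[str]) -> None:\n",
"        for key in keys:\n",
"            self._data.pop(key, None)\n",
"\n",
"    def yield_keys(self, prefix: Optional[str] = None) -> Iterator[str]:\n",
"        for key in self._data:\n",
"            if prefix is None or key.startswith(prefix):\n",
"                yield key\n",
"\n",
"\n",
"demo_store = DictByteStore()\n",
"demo_store.mset([(\"key1\", b\"value1\"), (\"key2\", b\"value2\"), (\"key3\", b\"value3\")])\n",
"demo_store.mdelete([\"key3\"])\n",
"print(demo_store.mget([\"key1\", \"key3\"]))  # [b'value1', None]"
]
},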
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Iterate over keys\n",
"Use `ayield_keys` to iterate over all keys or keys with a specific prefix."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"all_keys = [key async for key in store.ayield_keys()]\n",
"print(f\"All keys: {all_keys}\")\n",
"\n",
"prefixed_keys = [key async for key in store.ayield_keys(prefix=\"key1\")]\n",
"print(f\"Prefixed keys: {prefixed_keys}\")"
]
},
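{
"cell_type": "markdown",
"metadata": {},
"source": [
"Bigtable keeps rows sorted lexicographically by row key, so a prefix lookup can be served as one contiguous range scan: from the prefix itself up to, but not including, the smallest key that sorts after every key with that prefix. The sketch below models that idea over a sorted Python list with `bisect`. It assumes each store key is used directly as a row key and is not the library's implementation. It also shows that prefix matching is purely textual: `prefix=\"key1\"` matches `key10` as well as `key1`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import bisect\n",
"from typing import List, Optional\n",
"\n",
"\n",
"def prefix_end_key(prefix: bytes) -> Optional[bytes]:\n",
"    \"\"\"Smallest key that sorts after every key starting with `prefix`.\n",
"\n",
"    Returns None if no such key exists (an empty or all-0xFF prefix),\n",
"    meaning the scan runs to the end of the table.\n",
"    \"\"\"\n",
"    trimmed = prefix.rstrip(b\"\\xff\")\n",
"    if not trimmed:\n",
"        return None\n",
"    return trimmed[:-1] + bytes([trimmed[-1] + 1])\n",
"\n",
"\n",
"def scan_prefix(sorted_keys: List[bytes], prefix: bytes) -> List[bytes]:\n",
"    \"\"\"Model a prefix lookup as a range scan over lexicographically sorted keys.\"\"\"\n",
"    start = bisect.bisect_left(sorted_keys, prefix)\n",
"    end_key = prefix_end_key(prefix)\n",
"    if end_key is None:\n",
"        return sorted_keys[start:]\n",
"    return sorted_keys[start : bisect.bisect_left(sorted_keys, end_key)]\n",
"\n",
"\n",
"row_keys = sorted([b\"key1\", b\"key2\", b\"key10\", b\"fish#Goldfish\", b\"mammals#Cats\"])\n",
"print(scan_prefix(row_keys, b\"key1\"))  # [b'key1', b'key10']"
]
},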
{
"cell_type": "markdown",
"metadata": {},
"source": [
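"## Advanced Usage\n",
"\n",
"### Embedding Caching\n",
"\n",
"A common use case for a key-value store is caching the results of expensive operations, such as computing text embeddings. With the cache in place, repeated inputs are read back from Bigtable instead of being sent to the embedding model again, which saves time and API cost."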
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings import CacheBackedEmbeddings\n",
"from langchain_google_vertexai.embeddings import VertexAIEmbeddings\n",
"\n",
"underlying_embeddings = VertexAIEmbeddings(\n",
"    project=PROJECT_ID, model_name=\"textembedding-gecko@003\"\n",
")\n",
"\n",
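"# Use a namespace to avoid key collisions with other data in the table.\n",
"# Query embeddings are cached only when `query_embedding_cache` is set.\n",
"cached_embedder = CacheBackedEmbeddings.from_bytes_store(\n",
"    underlying_embeddings,\n",
"    store,\n",
"    namespace=\"text-embeddings\",\n",
"    query_embedding_cache=True,\n",
")"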
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"First call (computes and caches embedding):\")\n",
"%time embedding_result_1 = await cached_embedder.aembed_query(\"Hello, world!\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"\\nSecond call (retrieves from cache):\")\n",
"%time embedding_result_2 = await cached_embedder.aembed_query(\"Hello, world!\")"
]
},
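{
"cell_type": "markdown",
"metadata": {},
"source": [
"Conceptually, the cache follows the cache-aside pattern: derive a key from the namespace and a hash of the text, return the stored bytes on a hit, and otherwise compute, store, and return the embedding. The stdlib-only sketch below illustrates the pattern with a plain dict and a hypothetical `fake_embed` function in place of Bigtable and Vertex AI. The SHA-256 key scheme and JSON encoding are illustrative assumptions, not the exact format `CacheBackedEmbeddings` uses."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import hashlib\n",
"import json\n",
"from typing import Callable, Dict, List\n",
"\n",
"\n",
"def make_cache_key(namespace: str, text: str) -> str:\n",
"    # Illustrative key scheme (an assumption): namespace + SHA-256 hex digest.\n",
"    return namespace + hashlib.sha256(text.encode(\"utf-8\")).hexdigest()\n",
"\n",
"\n",
"def cached_embed(\n",
"    text: str,\n",
"    embed: Callable[[str], List[float]],\n",
"    cache: Dict[str, bytes],\n",
"    namespace: str = \"text-embeddings\",\n",
") -> List[float]:\n",
"    key = make_cache_key(namespace, text)\n",
"    hit = cache.get(key)\n",
"    if hit is not None:\n",
"        # Cache hit: decode the stored JSON bytes instead of recomputing.\n",
"        return json.loads(hit.decode(\"utf-8\"))\n",
"    # Cache miss: compute the embedding, then store it as JSON bytes.\n",
"    vector = embed(text)\n",
"    cache[key] = json.dumps(vector).encode(\"utf-8\")\n",
"    return vector\n",
"\n",
"\n",
"calls: List[str] = []\n",
"\n",
"\n",
"def fake_embed(text: str) -> List[float]:\n",
"    # Hypothetical stand-in for a real embedding model; records each call.\n",
"    calls.append(text)\n",
"    return [float(len(text)), 0.5]\n",
"\n",
"\n",
"demo_cache: Dict[str, bytes] = {}\n",
"first = cached_embed(\"Hello, world!\", fake_embed, demo_cache)\n",
"second = cached_embed(\"Hello, world!\", fake_embed, demo_cache)\n",
"print(first == second, len(calls))  # True 1"
]
},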
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### As a Simple Document Retriever\n",
"\n",
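"The store can also act as a simple document persistence layer. The retriever below writes each document under its `id` and, for a given query, returns up to `k` documents whose keys start with the query string. This is a plain key-prefix lookup, not a semantic search."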
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"from itertools import islice\n",
"from typing import List, Optional, Union\n",
"\n",
"from langchain_core.callbacks import (\n",
"    AsyncCallbackManagerForRetrieverRun,\n",
"    CallbackManagerForRetrieverRun,\n",
")\n",
"from langchain_core.documents import Document\n",
"from langchain_core.retrievers import BaseRetriever\n",
"\n",
"\n",
"class SimpleKVStoreRetriever(BaseRetriever):\n",
"    \"\"\"Retrieves up to `k` documents whose keys start with the query string.\"\"\"\n",
"\n",
"    store: BigtableByteStore\n",
"    documents: List[Union[Document, str]]\n",
"    k: int\n",
"\n",
"    def set_up_store(self) -> None:\n",
"        \"\"\"Write each document to the store, keyed by its ID (or list index).\"\"\"\n",
"        kv_pairs_to_set = []\n",
"        for i, doc in enumerate(self.documents):\n",
"            if isinstance(doc, str):\n",
"                doc = Document(page_content=doc)\n",
"            key = doc.id or str(i)\n",
"            # Store content and metadata as one JSON object so any text round-trips.\n",
"            payload = {\"page_content\": doc.page_content, \"metadata\": doc.metadata}\n",
"            kv_pairs_to_set.append((key, json.dumps(payload).encode(\"utf-8\")))\n",
"        self.store.mset(kv_pairs_to_set)\n",
"\n",
"    def _to_documents(\n",
"        self, keys: List[str], values: List[Optional[bytes]]\n",
"    ) -> List[Document]:\n",
"        \"\"\"Decode stored values; `mget` returns None for keys that are missing.\"\"\"\n",
"        documents_retrieved = []\n",
"        for key, value in zip(keys, values):\n",
"            if value is not None:\n",
"                data = json.loads(value.decode(\"utf-8\"))\n",
"                documents_retrieved.append(Document(id=key, **data))\n",
"        return documents_retrieved\n",
"\n",
"    def _get_relevant_documents(\n",
"        self, query: str, *, run_manager: CallbackManagerForRetrieverRun\n",
"    ) -> List[Document]:\n",
"        # Stop after the first `k` keys that match the prefix.\n",
"        keys = list(islice(self.store.yield_keys(prefix=query), self.k))\n",
"        return self._to_documents(keys, self.store.mget(keys))\n",
"\n",
"    async def _aget_relevant_documents(\n",
"        self, query: str, *, run_manager: AsyncCallbackManagerForRetrieverRun\n",
"    ) -> List[Document]:\n",
"        keys: List[str] = []\n",
"        async for key in self.store.ayield_keys(prefix=query):\n",
"            if len(keys) >= self.k:\n",
"                break\n",
"            keys.append(key)\n",
"        # `amget` returns a plain list, so it is awaited, not iterated with `async for`.\n",
"        return self._to_documents(keys, await self.store.amget(keys))"
]
},
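{
"cell_type": "markdown",
"metadata": {},
"source": [
"Storing each document as a single JSON object, rather than joining content and metadata with marker strings and splitting them apart on read, lets arbitrary text round-trip, including text that has a line beginning with `Metadata`. The hypothetical helpers `encode_document` and `decode_document` below sketch that encoding with the standard library only; they are illustrations, not part of `langchain-google-bigtable`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"from typing import Any, Dict, Tuple\n",
"\n",
"\n",
"def encode_document(page_content: str, metadata: Dict[str, Any]) -> bytes:\n",
"    # One JSON object per value, so no delimiter can collide with the text.\n",
"    payload = {\"page_content\": page_content, \"metadata\": metadata}\n",
"    return json.dumps(payload).encode(\"utf-8\")\n",
"\n",
"\n",
"def decode_document(value: bytes) -> Tuple[str, Dict[str, Any]]:\n",
"    data = json.loads(value.decode(\"utf-8\"))\n",
"    return data[\"page_content\"], data[\"metadata\"]\n",
"\n",
"\n",
"# Text that contains marker-like strings such as \"Metadata\" and \"Page Content\".\n",
"tricky_text = \"Line one\\nMetadata is mentioned here.\\nPage Content too.\"\n",
"encoded = encode_document(tricky_text, {\"type\": \"note\"})\n",
"print(decode_document(encoded) == (tricky_text, {\"type\": \"note\"}))  # True"
]
},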
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"documents = [\n",
"    Document(\n",
"        page_content=\"Goldfish are popular pets for beginners, requiring relatively simple care.\",\n",
"        metadata={\"type\": \"fish\", \"trait\": \"low maintenance\"},\n",
"        id=\"fish#Goldfish\",\n",
"    ),\n",
"    Document(\n",
"        page_content=\"Cats are independent pets that often enjoy their own space.\",\n",
"        metadata={\"type\": \"cat\", \"trait\": \"independence\"},\n",
"        id=\"mammals#Cats\",\n",
"    ),\n",
"    Document(\n",
"        page_content=\"Rabbits are social animals that need plenty of space to hop around.\",\n",
"        metadata={\"type\": \"rabbit\", \"trait\": \"social\"},\n",
"        id=\"mammals#Rabbits\",\n",
"    ),\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"retriever_store = BigtableByteStore.create_sync(\n",
"    engine=engine, instance_id=INSTANCE_ID, table_id=TABLE_ID\n",
")\n",
"\n",
"kv_document_retriever = SimpleKVStoreRetriever(\n",
"    store=retriever_store, documents=documents, k=2\n",
")\n",
"\n",
"kv_document_retriever.set_up_store()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"kv_document_retriever.invoke(\"fish\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"kv_document_retriever.invoke(\"mammals\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"For full details on the `BigtableByteStore` class, see the source code on [GitHub](https://github.com/googleapis/langchain-google-bigtable-python/blob/main/src/langchain_google_bigtable/key_value_store.py)."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 2
}