docs: add Bigtable Key-value Store and Vector Store Docs (#32598)

Thank you for contributing to LangChain! Follow these steps to mark your
pull request as ready for review. **If any of these steps are not
completed, your PR will not be considered for review.**

- [x] **feat(docs)**: add Bigtable Key-value store doc
- [x] **feat(docs)**: add Bigtable Vector store doc

This PR adds docs for the Bigtable key-value store and vector store
integrations with LangChain. The key-value store guide covers how to
set, get, delete, and yield key-value pairs from Bigtable; the vector
store guide covers adding, updating, deleting, and searching documents
with metadata filtering.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. **We will not consider
a PR unless these three are passing in CI.** See [contribution
guidelines](https://python.langchain.com/docs/contributing/) for more.

Additional guidelines:

- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to `pyproject.toml` files (even
optional ones) unless they are **required** for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.

---------

Co-authored-by: Mason Daugherty <mason@langchain.dev>
Author: Michael Yilma
Date: 2025-09-12 16:53:59 -04:00 (committed by GitHub)
Parent: c9eed530ce
Commit: 03f0ebd93e

4 changed files with 1454 additions and 0 deletions


@@ -0,0 +1,129 @@
# Bigtable

Bigtable is a scalable, fully managed key-value and wide-column store ideal for fast access to structured, semi-structured, or unstructured data. This page provides an overview of Bigtable's LangChain integrations.

**Client Library Documentation:** [cloud.google.com/python/docs/reference/langchain-google-bigtable/latest](https://cloud.google.com/python/docs/reference/langchain-google-bigtable/latest)

**Product Documentation:** [cloud.google.com/bigtable](https://cloud.google.com/bigtable)

## Quick Start

To use this library, you first need to:

1. Select or create a Cloud Platform project.
2. Enable billing for your project.
3. Enable the Google Cloud Bigtable API.
4. Set up Authentication.

## Installation

The main package for this integration is `langchain-google-bigtable`.

```bash
pip install -U langchain-google-bigtable
```

## Integrations

The `langchain-google-bigtable` package provides the following integrations:

### Vector Store

With `BigtableVectorStore`, you can store documents and their vector embeddings to find the most similar or relevant information in your database.

* **Full `VectorStore` Implementation:** Supports all methods from the LangChain `VectorStore` abstract class.
* **Async/Sync Support:** All methods are available in both asynchronous and synchronous versions.
* **Metadata Filtering:** Powerful filtering on metadata fields, including logical AND/OR combinations.
* **Multiple Distance Strategies:** Supports both Cosine and Euclidean distance for similarity search.
* **Customizable Storage:** Full control over how content, embeddings, and metadata are stored in Bigtable columns.
```python
from langchain_google_bigtable import BigtableEngine, BigtableVectorStore

# Your embedding service and other configurations
# embedding_service = ...

engine = await BigtableEngine.async_initialize(project_id="your-project-id")

vector_store = await BigtableVectorStore.create(
    engine=engine,
    instance_id="your-instance-id",
    table_id="your-table-id",
    embedding_service=embedding_service,
    collection="your_collection_name",
)

await vector_store.aadd_documents([your_documents])
results = await vector_store.asimilarity_search("your query")
```
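The metadata filtering and distance strategies listed above are configured through a `QueryParameters` object at search time. A minimal sketch, assuming the filtered field (here `category`, an illustrative name) was declared in the store's `metadata_mappings` when the store was created:

```python
from langchain_google_bigtable.vector_store import QueryParameters

# Illustrative filter: only consider documents whose "category" metadata
# equals "sci-fi" before the similarity search ranks results.
query_params = QueryParameters(
    filters={"ColumnValueFilter": {"category": {"==": "sci-fi"}}}
)

filtered_results = await vector_store.asimilarity_search(
    "your query", k=5, query_parameters=query_params
)
```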
Learn more in the [Vector Store how-to guide](https://colab.research.google.com/github/googleapis/langchain-google-bigtable-python/blob/main/docs/vector_store.ipynb).

### Key-value Store

Use `BigtableByteStore` as a persistent, scalable key-value store for caching, session management, or other storage needs. It supports both synchronous and asynchronous operations.

```python
from langchain_google_bigtable import BigtableByteStore

# Initialize the store
store = await BigtableByteStore.create(
    project_id="your-project-id",
    instance_id="your-instance-id",
    table_id="your-table-id",
)

# Set and get values
await store.amset([("key1", b"value1")])
retrieved = await store.amget(["key1"])
```
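A common caching pattern is to back `CacheBackedEmbeddings` with the byte store so expensive embedding calls are computed once. A minimal sketch, assuming `langchain-google-vertexai` is installed and the embedding model name is available in your project:

```python
from langchain.embeddings import CacheBackedEmbeddings
from langchain_google_vertexai import VertexAIEmbeddings

underlying_embeddings = VertexAIEmbeddings(
    project="your-project-id", model_name="gemini-embedding-001"
)

# Namespace the cache keys to avoid collisions with other data in the store.
cached_embedder = CacheBackedEmbeddings.from_bytes_store(
    underlying_embeddings, store, namespace="text-embeddings"
)

vector = await cached_embedder.aembed_query("Hello, world!")
```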
Learn more in the [Key-value Store how-to guide](https://cloud.google.com/python/docs/reference/langchain-google-bigtable/latest/key-value-store).

### Document Loader

Use the `BigtableLoader` to load data from a Bigtable table and represent it as LangChain `Document` objects.

```python
from langchain_google_bigtable import BigtableLoader

loader = BigtableLoader(
    project_id="your-project-id",
    instance_id="your-instance-id",
    table_id="your-table-name",
)
docs = loader.load()
```

Learn more in the [Document Loader how-to guide](https://cloud.google.com/python/docs/reference/langchain-google-bigtable/latest/document-loader).

### Chat Message History

Use `BigtableChatMessageHistory` to store conversation histories, enabling stateful chains and agents.

```python
from langchain_google_bigtable import BigtableChatMessageHistory

history = BigtableChatMessageHistory(
    project_id="your-project-id",
    instance_id="your-instance-id",
    table_id="your-message-store",
    session_id="user-session-123",
)

history.add_user_message("Hello!")
history.add_ai_message("Hi there!")
```
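Because `BigtableChatMessageHistory` is a LangChain chat message history, it can back a stateful chain through `RunnableWithMessageHistory`. A minimal sketch, assuming `langchain-google-vertexai` for the chat model (the model name is illustrative):

```python
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_google_vertexai import ChatVertexAI

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant."),
        MessagesPlaceholder(variable_name="history"),
        ("human", "{input}"),
    ]
)
chain = prompt | ChatVertexAI(model_name="gemini-2.0-flash")  # illustrative model name

# Look up (or create) the Bigtable-backed history for each session ID.
chain_with_history = RunnableWithMessageHistory(
    chain,
    lambda session_id: BigtableChatMessageHistory(
        project_id="your-project-id",
        instance_id="your-instance-id",
        table_id="your-message-store",
        session_id=session_id,
    ),
    input_messages_key="input",
    history_messages_key="history",
)

chain_with_history.invoke(
    {"input": "Hello!"},
    config={"configurable": {"session_id": "user-session-123"}},
)
```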
Learn more in the [Chat Message History how-to guide](https://cloud.google.com/python/docs/reference/langchain-google-bigtable/latest/chat-message-history).

## Contributions

Contributions to this library are welcome. Please see the CONTRIBUTING guide in the [package repo](https://github.com/googleapis/langchain-google-bigtable-python/) for more details.

## License

This project is licensed under the Apache 2.0 License - see the LICENSE file in the [package repo](https://github.com/googleapis/langchain-google-bigtable-python/blob/main/LICENSE) for details.

## Disclaimer

This is not an officially supported Google product.


@@ -0,0 +1,483 @@
{
"cells": [
{
"cell_type": "raw",
"metadata": {},
"source": [
"---\n",
"sidebar_label: Google Bigtable\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# BigtableByteStore\n",
"\n",
"This guide covers how to use Google Cloud Bigtable as a key-value store.\n",
"\n",
"[Bigtable](https://cloud.google.com/bigtable) is a key-value and wide-column store, ideal for fast access to structured, semi-structured, or unstructured data. \n",
"\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/langchain-google-bigtable-python/blob/main/docs/key_value_store.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Overview\n",
"\n",
"The `BigtableByteStore` uses Google Cloud Bigtable as a backend for a key-value store. It supports synchronous and asynchronous operations for setting, getting, and deleting key-value pairs.\n",
"\n",
"### Integration details\n",
"| Class | Package | Local | JS support | Package downloads | Package latest |\n",
"| :--- | :--- | :---: | :---: | :---: | :---: |\n",
"| [BigtableByteStore](https://github.com/googleapis/langchain-google-bigtable-python/blob/main/src/langchain_google_bigtable/key_value_store.py) | [langchain-google-bigtable](https://pypi.org/project/langchain-google-bigtable/) | ❌ | ❌ | ![PyPI - Downloads](https://img.shields.io/pypi/dm/langchain-google-bigtable?style=flat-square&label=%20) | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-google-bigtable) |"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"### Prerequisites\n",
"\n",
"To get started, you will need a Google Cloud project with an active Bigtable instance and table. \n",
"* [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n",
"* [Enable the Bigtable API](https://console.cloud.google.com/flows/enableapi?apiid=bigtable.googleapis.com)\n",
"* [Create a Bigtable instance and table](https://cloud.google.com/bigtable/docs/creating-instance)\n",
"\n",
"### Installation\n",
"\n",
"The integration is in the `langchain-google-bigtable` package. The command below also installs `langchain-google-vertexai` for the embedding cache example."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install -qU langchain-google-bigtable langchain-google-vertexai"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ☁ Set Your Google Cloud Project\n",
"Set your Google Cloud project to use its resources within this notebook.\n",
"\n",
"If you don't know your project ID, you can run `gcloud config list` or see the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# @markdown Please fill in your project, instance, and table details.\n",
"PROJECT_ID = \"your-gcp-project-id\" # @param {type:\"string\"}\n",
"INSTANCE_ID = \"your-instance-id\" # @param {type:\"string\"}\n",
"TABLE_ID = \"your-table-id\" # @param {type:\"string\"}\n",
"\n",
"!gcloud config set project {PROJECT_ID}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 🔐 Authentication\n",
"Authenticate to Google Cloud to access your project resources.\n",
"- For **Colab**, use the cell below.\n",
"- For **Vertex AI Workbench**, see the [setup instructions](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from google.colab import auth\n",
"\n",
"auth.authenticate_user()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Instantiation\n",
"\n",
"To use `BigtableByteStore`, we first ensure a table exists and then initialize a `BigtableEngine` to manage connections."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_google_bigtable import (\n",
" BigtableByteStore,\n",
" BigtableEngine,\n",
" init_key_value_store_table,\n",
")\n",
"\n",
"# Ensure the table and column family exist.\n",
"init_key_value_store_table(\n",
" project_id=PROJECT_ID,\n",
" instance_id=INSTANCE_ID,\n",
" table_id=TABLE_ID,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### BigtableEngine\n",
"A `BigtableEngine` object handles the execution context for the store, especially for async operations. It's recommended to initialize a single engine and reuse it across multiple stores for better performance."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Initialize the engine to manage async operations.\n",
"engine = await BigtableEngine.async_initialize(\n",
" project_id=PROJECT_ID, instance_id=INSTANCE_ID\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### BigtableByteStore\n",
"\n",
"This is the main class for interacting with the key-value store. It provides the methods for setting, getting, and deleting data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Initialize the store.\n",
"store = await BigtableByteStore.create(engine=engine, table_id=TABLE_ID)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Usage\n",
"\n",
"The store supports both sync (`mset`, `mget`) and async (`amset`, `amget`) methods. This guide uses the async versions."
]
},
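{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal synchronous sketch (assumption: `create_sync`, as used in the retriever example later in this notebook, is the appropriate way to get a store for blocking calls):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: the sync methods mirror the async API used below.\n",
"sync_store = BigtableByteStore.create_sync(\n",
"    engine=engine, instance_id=INSTANCE_ID, table_id=TABLE_ID\n",
")\n",
"\n",
"sync_store.mset([(\"sync_key\", b\"sync_value\")])\n",
"print(sync_store.mget([\"sync_key\"]))\n",
"sync_store.mdelete([\"sync_key\"])"
]
},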
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set\n",
"Use `amset` to save key-value pairs to the store."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"kv_pairs = [\n",
" (\"key1\", b\"value1\"),\n",
" (\"key2\", b\"value2\"),\n",
" (\"key3\", b\"value3\"),\n",
"]\n",
"\n",
"await store.amset(kv_pairs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Get\n",
"Use `amget` to retrieve values. If a key is not found, `None` is returned for that key."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"retrieved_vals = await store.amget([\"key1\", \"key2\", \"nonexistent_key\"])\n",
"print(retrieved_vals)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Delete\n",
"Use `amdelete` to remove keys from the store."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"await store.amdelete([\"key3\"])\n",
"\n",
"# Verifying the key was deleted\n",
"await store.amget([\"key1\", \"key3\"])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Iterate over keys\n",
"Use `ayield_keys` to iterate over all keys or keys with a specific prefix."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"all_keys = [key async for key in store.ayield_keys()]\n",
"print(f\"All keys: {all_keys}\")\n",
"\n",
"prefixed_keys = [key async for key in store.ayield_keys(prefix=\"key1\")]\n",
"print(f\"Prefixed keys: {prefixed_keys}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Advanced Usage: Embedding Caching\n",
"\n",
"A common use case for a key-value store is to cache expensive operations like computing text embeddings, which saves time and cost."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings import CacheBackedEmbeddings\n",
"from langchain_google_vertexai.embeddings import VertexAIEmbeddings\n",
"\n",
"underlying_embeddings = VertexAIEmbeddings(\n",
" project=PROJECT_ID, model_name=\"textembedding-gecko@003\"\n",
")\n",
"\n",
"# Use a namespace to avoid key collisions with other data.\n",
"cached_embedder = CacheBackedEmbeddings.from_bytes_store(\n",
" underlying_embeddings, store, namespace=\"text-embeddings\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"First call (computes and caches embedding):\")\n",
"%time embedding_result_1 = await cached_embedder.aembed_query(\"Hello, world!\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"\\nSecond call (retrieves from cache):\")\n",
"%time embedding_result_2 = await cached_embedder.aembed_query(\"Hello, world!\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### As a Simple Document Retriever\n",
"\n",
"This section shows how to create a simple retriever using the Bigtable store. It acts as a document persistence layer, fetching documents that match a query prefix."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.retrievers import BaseRetriever\n",
"from langchain_core.documents import Document\n",
"from langchain_core.callbacks import CallbackManagerForRetrieverRun\n",
"from typing import List, Optional, Any, Union\n",
"import json\n",
"\n",
"\n",
"class SimpleKVStoreRetriever(BaseRetriever):\n",
" \"\"\"A simple retriever that retrieves documents based on a prefix match in the key-value store.\"\"\"\n",
"\n",
" store: BigtableByteStore\n",
" documents: List[Union[Document, str]]\n",
" k: int\n",
"\n",
" def set_up_store(self):\n",
" kv_pairs_to_set = []\n",
" for i, doc in enumerate(self.documents):\n",
" if isinstance(doc, str):\n",
" doc = Document(page_content=doc)\n",
" if not doc.id:\n",
" doc.id = str(i)\n",
" value = (\n",
" \"Page Content\\n\"\n",
" + doc.page_content\n",
" + \"\\nMetadata\"\n",
" + json.dumps(doc.metadata)\n",
" )\n",
" kv_pairs_to_set.append((doc.id, value.encode(\"utf-8\")))\n",
" self.store.mset(kv_pairs_to_set)\n",
"\n",
" async def _aget_relevant_documents(\n",
" self,\n",
" query: str,\n",
" *,\n",
" run_manager: Optional[CallbackManagerForRetrieverRun] = None,\n",
" ) -> List[Document]:\n",
" keys = [key async for key in self.store.ayield_keys(prefix=query)][: self.k]\n",
" documents_retrieved = []\n",
" async for document in await self.store.amget(keys):\n",
" if document:\n",
" document_str = document.decode(\"utf-8\")\n",
" page_content = document_str.split(\"Content\\n\")[1].split(\"\\nMetadata\")[0]\n",
" metadata = json.loads(document_str.split(\"\\nMetadata\")[1])\n",
" documents_retrieved.append(\n",
" Document(page_content=page_content, metadata=metadata)\n",
" )\n",
" return documents_retrieved\n",
"\n",
" def _get_relevant_documents(\n",
" self,\n",
" query: str,\n",
" *,\n",
" run_manager: Optional[CallbackManagerForRetrieverRun] = None,\n",
" ) -> list[Document]:\n",
" keys = [key for key in self.store.yield_keys(prefix=query)][: self.k]\n",
" documents_retrieved = []\n",
" for document in self.store.mget(keys):\n",
" if document:\n",
" document_str = document.decode(\"utf-8\")\n",
" page_content = document_str.split(\"Content\\n\")[1].split(\"\\nMetadata\")[0]\n",
" metadata = json.loads(document_str.split(\"\\nMetadata\")[1])\n",
" documents_retrieved.append(\n",
" Document(page_content=page_content, metadata=metadata)\n",
" )\n",
" return documents_retrieved"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"documents = [\n",
" Document(\n",
" page_content=\"Goldfish are popular pets for beginners, requiring relatively simple care.\",\n",
" metadata={\"type\": \"fish\", \"trait\": \"low maintenance\"},\n",
" id=\"fish#Goldfish\",\n",
" ),\n",
" Document(\n",
" page_content=\"Cats are independent pets that often enjoy their own space.\",\n",
" metadata={\"type\": \"cat\", \"trait\": \"independence\"},\n",
" id=\"mammals#Cats\",\n",
" ),\n",
" Document(\n",
" page_content=\"Rabbits are social animals that need plenty of space to hop around.\",\n",
" metadata={\"type\": \"rabbit\", \"trait\": \"social\"},\n",
" id=\"mammals#Rabbits\",\n",
" ),\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"retriever_store = BigtableByteStore.create_sync(\n",
" engine=engine, instance_id=INSTANCE_ID, table_id=TABLE_ID\n",
")\n",
"\n",
"KVDocumentRetriever = SimpleKVStoreRetriever(\n",
" store=retriever_store, documents=documents, k=2\n",
")\n",
"\n",
"KVDocumentRetriever.set_up_store()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"KVDocumentRetriever.invoke(\"fish\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"KVDocumentRetriever.invoke(\"mammals\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"For full details on the `BigtableByteStore` class, see the source code on [GitHub](https://github.com/googleapis/langchain-google-bigtable-python/blob/main/src/langchain_google_bigtable/key_value_store.py)."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 2
}


@@ -0,0 +1,839 @@
{
"cells": [
{
"cell_type": "raw",
"id": "7fb27b941602401d91542211134fc71a",
"metadata": {
"id": "7fb27b941602401d91542211134fc71a"
},
"source": [
"---\n",
"sidebar_label: Google Bigtable\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "acae54e37e7d407bbb7b55eff062a284",
"metadata": {
"id": "acae54e37e7d407bbb7b55eff062a284"
},
"source": [
"# BigtableVectorStore\n",
"\n",
"This guide covers the `BigtableVectorStore` integration for using Google Cloud Bigtable as a vector store.\n",
"\n",
"[Bigtable](https://cloud.google.com/bigtable) is a key-value and wide-column store, ideal for fast access to structured, semi-structured, or unstructured data.\n"
]
},
{
"cell_type": "markdown",
"id": "9a63283cbaf04dbcab1f6479b197f3a8",
"metadata": {
"id": "9a63283cbaf04dbcab1f6479b197f3a8"
},
"source": [
"## Overview\n",
"\n",
"The `BigtableVectorStore` uses Google Cloud Bigtable to store documents and their vector embeddings for similarity search and retrieval. It supports powerful metadata filtering to refine search results.\n",
"\n",
"### Integration details\n",
"| Class | Package | Local | JS support | Package downloads | Package latest |\n",
"| :--- | :--- | :---: | :---: | :---: | :---: |\n",
"| [BigtableVectorStore](https://github.com/googleapis/langchain-google-bigtable-python/blob/main/src/langchain_google_bigtable/vector_store.py) | [langchain-google-bigtable](https://pypi.org/project/langchain-google-bigtable/) | ❌ | ❌ | ![PyPI - Downloads](https://img.shields.io/pypi/dm/langchain-google-bigtable?style=flat-square&label=%20) | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-google-bigtable) |"
]
},
{
"cell_type": "markdown",
"id": "8dd0d8092fe74a7c96281538738b07e2",
"metadata": {
"id": "8dd0d8092fe74a7c96281538738b07e2"
},
"source": [
"## Setup"
]
},
{
"cell_type": "markdown",
"id": "72eea5119410473aa328ad9291626812",
"metadata": {
"id": "72eea5119410473aa328ad9291626812"
},
"source": [
"### Prerequisites\n",
"\n",
"To get started, you will need a Google Cloud project with an active Bigtable instance.\n",
"* [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n",
"* [Enable the Bigtable API](https://console.cloud.google.com/flows/enableapi?apiid=bigtable.googleapis.com)\n",
"* [Create a Bigtable instance](https://cloud.google.com/bigtable/docs/creating-instance)\n",
"\n",
"### Installation\n",
"\n",
"The integration is in the `langchain-google-bigtable` package. The command below also installs `langchain-google-vertexai` to use for an embedding service."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8edb47106e1a46a883d545849b8ab81b",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "8edb47106e1a46a883d545849b8ab81b",
"outputId": "b6c95f84-f271-4bd0-f024-81ea38ce7f80"
},
"outputs": [],
"source": [
"%pip install -qU langchain-google-bigtable langchain-google-vertexai"
]
},
{
"cell_type": "markdown",
"id": "WEparXIIO41L",
"metadata": {
"id": "WEparXIIO41L"
},
"source": [
"**Colab only**: Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "OB8Mg8HxO9HV",
"metadata": {
"id": "OB8Mg8HxO9HV"
},
"outputs": [],
"source": [
"# Automatically restart kernel after installs so that your environment can access the new packages\n",
"# import IPython\n",
"\n",
"# app = IPython.Application.instance()\n",
"# app.kernel.do_shutdown(True)"
]
},
{
"cell_type": "markdown",
"id": "10185d26023b46108eb7d9f57d49d2b3",
"metadata": {
"id": "10185d26023b46108eb7d9f57d49d2b3"
},
"source": [
"### Set Your Google Cloud Project\n",
"Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n",
"\n",
"If you don't know your project ID, try the following:\n",
"\n",
"* Run `gcloud config list`.\n",
"* Run `gcloud projects list`.\n",
"* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8763a12b2bbd4a93a75aff182afb95dc",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "8763a12b2bbd4a93a75aff182afb95dc",
"outputId": "865ca13d-47e1-4458-dfe3-96b0e7a57810"
},
"outputs": [],
"source": [
"# @markdown Please fill in your project, instance, and a new table name.\n",
"PROJECT_ID = \"google.com:cloud-bigtable-dev\" # @param {type:\"string\"}\n",
"INSTANCE_ID = \"anweshadas-test\" # @param {type:\"string\"}\n",
"TABLE_ID = \"your-vector-store-table-3\" # @param {type:\"string\"}\n",
"\n",
"!gcloud config set project {PROJECT_ID}"
]
},
{
"cell_type": "markdown",
"id": "xx0JMrbNOfnV",
"metadata": {
"id": "xx0JMrbNOfnV"
},
"source": [
"### 🔐 Authentication\n",
"\n",
"Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n",
"\n",
"- If you are using Colab to run this notebook, use the cell below and continue.\n",
"- If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "T1pPsDCzOURd",
"metadata": {
"id": "T1pPsDCzOURd"
},
"outputs": [],
"source": [
"from google.colab import auth\n",
"\n",
"auth.authenticate_user(project_id=PROJECT_ID)"
]
},
{
"cell_type": "markdown",
"id": "7623eae2785240b9bd12b16a66d81610",
"metadata": {
"id": "7623eae2785240b9bd12b16a66d81610"
},
"source": [
"## Initialization\n",
"\n",
"Initializing the `BigtableVectorStore` involves three steps: setting up the embedding service, ensuring the Bigtable table is created, and configuring the store's parameters."
]
},
{
"cell_type": "markdown",
"id": "7cdc8c89c7104fffa095e18ddfef8986",
"metadata": {
"id": "7cdc8c89c7104fffa095e18ddfef8986"
},
"source": [
"### 1. Set up Embedding Service\n",
"First, we need a model to create the vector embeddings for our documents. We'll use a Vertex AI model for this example."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b118ea5561624da68c537baed56e602f",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "b118ea5561624da68c537baed56e602f",
"outputId": "99b55b9a-61c7-4dbe-bf1f-dd84ddc434da"
},
"outputs": [],
"source": [
"from langchain_google_vertexai import VertexAIEmbeddings\n",
"\n",
"embeddings = VertexAIEmbeddings(project=PROJECT_ID, model_name=\"gemini-embedding-001\")"
]
},
{
"cell_type": "markdown",
"id": "938c804e27f84196a10c8828c723f798",
"metadata": {
"id": "938c804e27f84196a10c8828c723f798"
},
"source": [
"### 2. Initialize a Table\n",
"Before creating a `BigtableVectorStore`, a table with the correct column families must exist. The `init_vector_store_table` helper function is the recommended way to create and configure a table. If the table already exists, it will do nothing."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "504fb2a444614c0babb325280ed9130a",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "504fb2a444614c0babb325280ed9130a",
"outputId": "2e6453bc-5eed-4a3e-a8d4-59e945485be6"
},
"outputs": [],
"source": [
"from langchain_google_bigtable.vector_store import init_vector_store_table\n",
"\n",
"DATA_COLUMN_FAMILY = \"doc_data\"\n",
"\n",
"try:\n",
" init_vector_store_table(\n",
" project_id=PROJECT_ID,\n",
" instance_id=INSTANCE_ID,\n",
" table_id=TABLE_ID,\n",
" content_column_family=DATA_COLUMN_FAMILY,\n",
" embedding_column_family=DATA_COLUMN_FAMILY,\n",
" )\n",
" print(f\"Table '{TABLE_ID}' is ready.\")\n",
"except ValueError as e:\n",
" print(e)"
]
},
{
"cell_type": "markdown",
"id": "59bbdb311c014d738909a11f9e486628",
"metadata": {
"id": "59bbdb311c014d738909a11f9e486628"
},
"source": [
"### 3. Configure the Vector Store\n",
"Now we define the parameters that control how the vector store connects to Bigtable and how it handles data."
]
},
{
"cell_type": "markdown",
"id": "b43b363d81ae4b689946ece5c682cd59",
"metadata": {
"id": "b43b363d81ae4b689946ece5c682cd59"
},
"source": [
"#### The BigtableEngine\n",
"A `BigtableEngine` object manages clients and async operations. It is highly recommended to initialize a single engine and reuse it across multiple stores for better performance and resource management."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8a65eabff63a45729fe45fb5ade58bdc",
"metadata": {
"id": "8a65eabff63a45729fe45fb5ade58bdc"
},
"outputs": [],
"source": [
"from langchain_google_bigtable import BigtableEngine\n",
"\n",
"engine = await BigtableEngine.async_initialize(project_id=PROJECT_ID)"
]
},
{
"cell_type": "markdown",
"id": "c3933fab20d04ec698c2621248eb3be0",
"metadata": {
"id": "c3933fab20d04ec698c2621248eb3be0"
},
"source": [
"#### Collections\n",
"A `collection` provides a logical namespace for your documents within a single Bigtable table. It is used as a prefix for the row keys, allowing multiple vector stores to coexist in the same table without interfering with each other."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4dd4641cc4064e0191573fe9c69df29b",
"metadata": {
"id": "4dd4641cc4064e0191573fe9c69df29b"
},
"outputs": [],
"source": [
"collection_name = \"my_docs\""
]
},
{
"cell_type": "markdown",
"id": "8309879909854d7188b41380fd92a7c3",
"metadata": {
"id": "8309879909854d7188b41380fd92a7c3"
},
"source": [
"#### Metadata Configuration\n",
"When creating a `BigtableVectorStore`, you have two optional parameters for handling metadata:\n",
"\n",
"* `metadata_mappings`: This is a list of `VectorMetadataMapping` objects. You **must** define a mapping for any metadata key you wish to use for filtering in your search queries. Each mapping specifies the data type (`encoding`) for the metadata field, which is crucial for correct filtering.\n",
"* `metadata_as_json_column`: This is an optional `ColumnConfig` that tells the store to save the *entire* metadata dictionary as a single JSON string in a specific column. This is useful for efficiently retrieving all of a document's metadata at once, including fields not defined in `metadata_mappings`. **Note:** Fields stored only in this JSON column cannot be used for filtering."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3ed186c9a28b402fb0bc4494df01f08d",
"metadata": {
"id": "3ed186c9a28b402fb0bc4494df01f08d"
},
"outputs": [],
"source": [
"from langchain_google_bigtable import ColumnConfig, VectorMetadataMapping, Encoding\n",
"\n",
"# Define mappings for metadata fields you want to filter on.\n",
"metadata_mappings = [\n",
" VectorMetadataMapping(metadata_key=\"author\", encoding=Encoding.UTF8),\n",
" VectorMetadataMapping(metadata_key=\"year\", encoding=Encoding.INT_BIG_ENDIAN),\n",
" VectorMetadataMapping(metadata_key=\"category\", encoding=Encoding.UTF8),\n",
" VectorMetadataMapping(metadata_key=\"rating\", encoding=Encoding.FLOAT),\n",
"]\n",
"\n",
"# Define the optional column for storing all metadata as a single JSON string.\n",
"metadata_as_json_column = ColumnConfig(\n",
" column_family=DATA_COLUMN_FAMILY, column_qualifier=\"metadata_json\"\n",
")"
]
},
{
"cell_type": "markdown",
"id": "cb1e1581032b452c9409d6c6813c49d1",
"metadata": {
"id": "cb1e1581032b452c9409d6c6813c49d1"
},
"source": [
"### 4. Create the BigtableVectorStore Instance"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "iKM4BktZR56p",
"metadata": {
"id": "iKM4BktZR56p"
},
"outputs": [],
"source": [
"# Configure the columns for your store.\n",
"content_column = ColumnConfig(\n",
" column_family=DATA_COLUMN_FAMILY, column_qualifier=\"content\"\n",
")\n",
"embedding_column = ColumnConfig(\n",
" column_family=DATA_COLUMN_FAMILY, column_qualifier=\"embedding\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "379cbbc1e968416e875cc15c1202d7eb",
"metadata": {
"id": "379cbbc1e968416e875cc15c1202d7eb"
},
"outputs": [],
"source": [
"from langchain_google_bigtable import BigtableVectorStore\n",
"\n",
"vector_store = await BigtableVectorStore.create(\n",
" project_id=PROJECT_ID,\n",
" instance_id=INSTANCE_ID,\n",
" table_id=TABLE_ID,\n",
" engine=engine,\n",
" embedding_service=embeddings,\n",
" collection=collection_name,\n",
" metadata_mappings=metadata_mappings,\n",
" metadata_as_json_column=metadata_as_json_column,\n",
" content_column=content_column,\n",
" embedding_column=embedding_column,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "277c27b1587741f2af2001be3712ef0d",
"metadata": {
"id": "277c27b1587741f2af2001be3712ef0d"
},
"source": [
"## Manage vector store"
]
},
{
"cell_type": "markdown",
"id": "db7b79bc585a40fcaf58bf750017e135",
"metadata": {
"id": "db7b79bc585a40fcaf58bf750017e135"
},
"source": [
"### Add Documents\n",
"You can add documents with pre-defined IDs. If a `Document` is added without an `id` attribute, the vector store will automatically generate a **`uuid4` string** for it."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "916684f9a58a4a2aa5f864670399430d",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "916684f9a58a4a2aa5f864670399430d",
"outputId": "eb343088-624a-41a1-94cd-53e0c3cfa207"
},
"outputs": [],
"source": [
"from langchain_core.documents import Document\n",
"\n",
"docs_to_add = [\n",
" Document(\n",
" page_content=\"A young farm boy, Luke Skywalker, is thrust into a galactic conflict.\",\n",
" id=\"doc_1\",\n",
" metadata={\n",
" \"author\": \"George Lucas\",\n",
" \"year\": 1977,\n",
" \"category\": \"sci-fi\",\n",
" \"rating\": 4.8,\n",
" },\n",
" ),\n",
" Document(\n",
" page_content=\"A hobbit named Frodo Baggins must destroy a powerful ring.\",\n",
" id=\"doc_2\",\n",
" metadata={\n",
" \"author\": \"J.R.R. Tolkien\",\n",
" \"year\": 1954,\n",
" \"category\": \"fantasy\",\n",
" \"rating\": 4.9,\n",
" },\n",
" ),\n",
" # Document without a pre-defined ID, one will be generated.\n",
" Document(\n",
" page_content=\"A group of children confront an evil entity emerging from the sewers.\",\n",
" metadata={\"author\": \"Stephen King\", \"year\": 1986, \"category\": \"horror\"},\n",
" ),\n",
" Document(\n",
" page_content=\"In a distant future, the noble House Atreides rules the desert planet Arrakis.\",\n",
" id=\"doc_3\",\n",
" metadata={\n",
" \"author\": \"Frank Herbert\",\n",
" \"year\": 1965,\n",
" \"category\": \"sci-fi\",\n",
" \"rating\": 4.9,\n",
" },\n",
" ),\n",
"]\n",
"\n",
"added_ids = await vector_store.aadd_documents(docs_to_add)\n",
"print(f\"Added documents with IDs: {added_ids}\")"
]
},
{
"cell_type": "markdown",
"id": "1671c31a24314836a5b85d7ef7fbf015",
"metadata": {
"id": "1671c31a24314836a5b85d7ef7fbf015"
},
"source": [
"### Update Documents\n",
"`BigtableVectorStore` handles updates by overwriting. To update a document, simply add it again with the same ID but with new content or metadata."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "33b0902fd34d4ace834912fa1002cf8e",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "33b0902fd34d4ace834912fa1002cf8e",
"outputId": "d80f2b01-44df-45d7-9ff5-f77527f04733"
},
"outputs": [],
"source": [
"doc_to_update = [\n",
" Document(\n",
" page_content=\"An old hobbit, Frodo Baggins, must take a powerful ring to be destroyed.\", # Updated content\n",
" id=\"doc_2\", # Same ID\n",
" metadata={\n",
" \"author\": \"J.R.R. Tolkien\",\n",
" \"year\": 1954,\n",
" \"category\": \"epic-fantasy\",\n",
" \"rating\": 4.9,\n",
" }, # Updated metadata\n",
" )\n",
"]\n",
"\n",
"await vector_store.aadd_documents(doc_to_update)\n",
"print(\"Document 'doc_2' has been updated.\")"
]
},
{
"cell_type": "markdown",
"id": "f6fa52606d8c4a75a9b52967216f8f3f",
"metadata": {
"id": "f6fa52606d8c4a75a9b52967216f8f3f"
},
"source": [
"### Delete Documents"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f5a1fa73e5044315a093ec459c9be902",
"metadata": {
"id": "f5a1fa73e5044315a093ec459c9be902"
},
"outputs": [],
"source": [
"is_deleted = await vector_store.adelete(ids=[\"doc_2\"])"
]
},
{
"cell_type": "markdown",
"id": "cdf66aed5cc84ca1b48e60bad68798a8",
"metadata": {
"id": "cdf66aed5cc84ca1b48e60bad68798a8"
},
"source": [
"## Query vector store"
]
},
{
"cell_type": "markdown",
"id": "28d3efd5258a48a79c179ea5c6759f01",
"metadata": {
"id": "28d3efd5258a48a79c179ea5c6759f01"
},
"source": [
"### Search"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3f9bc0b9dd2c44919cc8dcca39b469f8",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "3f9bc0b9dd2c44919cc8dcca39b469f8",
"outputId": "dbd5426c-139a-451b-d456-c241cf794aec"
},
"outputs": [],
"source": [
"results = await vector_store.asimilarity_search(\"a story about a powerful ring\", k=1)\n",
"print(results[0].page_content)"
]
},
{
"cell_type": "markdown",
"id": "0e382214b5f147d187d36a2058b9c724",
"metadata": {
"id": "0e382214b5f147d187d36a2058b9c724"
},
"source": [
"### Search with Filters\n",
"\n",
"Apply filters before the vector search runs."
]
},
{
"cell_type": "markdown",
"id": "e7f8g9h0-query-header-restored",
"metadata": {
"id": "e7f8g9h0-query-header-restored"
},
"source": [
"#### The kNN Search Algorithm and Filtering\n",
"\n",
"By default, `BigtableVectorStore` uses a **k-Nearest Neighbors (kNN)** search algorithm to find the `k` vectors in the database that are most similar to your query vector. The vector store offers filtering to reduce the search space *before* the kNN search is performed, which can make queries faster and more relevant.\n",
"\n",
"#### Configuring Queries with `QueryParameters`\n",
"\n",
"All search settings are controlled via the `QueryParameters` object. This object allows you to specify not only filters but also other important search aspects:\n",
"* `algorithm`: The search algorithm to use. Defaults to `\"kNN\"`.\n",
"* `distance_strategy`: The metric used for comparison, such as `COSINE` (default) or `EUCLIDEAN`.\n",
"* `vector_data_type`: The data type of the stored vectors, like `FLOAT32` or `DOUBLE64`. This should match the precision of your embeddings.\n",
"* `filters`: A dictionary defining the filtering logic to apply.\n",
"\n",
"#### Understanding Encodings\n",
"\n",
"To filter on metadata fields, you must define them in `metadata_mappings` with the correct `encoding` so Bigtable can properly interpret the data. Supported encodings include:\n",
"* **String**: `UTF8`, `UTF16`, `ASCII` for text-based metadata.\n",
"* **Numeric**: `INT_BIG_ENDIAN` or `INT_LITTLE_ENDIAN` for integers, and `FLOAT` or `DOUBLE` for decimal numbers.\n",
"* **Boolean**: `BOOL` for true/false values."
]
},
{
"cell_type": "markdown",
"id": "5b09d5ef5b5e4bb6ab9b829b10b6a29f",
"metadata": {
"id": "5b09d5ef5b5e4bb6ab9b829b10b6a29f"
},
"source": [
"#### Filtering Support Table\n",
"\n",
"| Filter Category | Key / Operator | Meaning |\n",
"|---|---|---|\n",
"| **Row Key** | `RowKeyFilter` | Narrows search to document IDs with a specific prefix. |\n",
"| **Metadata Key** | `ColumnQualifiers` | Checks for the presence of one or more exact metadata keys. |\n",
"| | `ColumnQualifierPrefix` | Checks if a metadata key starts with a given prefix. |\n",
"| | `ColumnQualifierRegex` | Checks if a metadata key matches a regular expression. |\n",
"| **Metadata Value** | `ColumnValueFilter` | Container for all value-based conditions. |\n",
"| | `==` | Equality |\n",
"| | `!=` | Inequality |\n",
"| | `>` | Greater than |\n",
"| | `<` | Less than |\n",
"| | `>=` | Greater than or equal |\n",
"| | `<=` | Less than or equal |\n",
"| | `in` | Value is in a list. |\n",
"| | `nin` | Value is not in a list. |\n",
"| | `contains` | Checks for substring presence. |\n",
"| | `like` | Performs a regex match on a string. |\n",
"| **Logical**| `ColumnValueChainFilter` | Logical AND for combining value conditions. |\n",
"| | `ColumnValueUnionFilter` | Logical OR for combining value conditions. |"
]
},
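{
"cell_type": "markdown",
"id": "simple-filter-sketch-md",
"metadata": {
"id": "simple-filter-sketch-md"
},
"source": [
"#### Simple Filter Example\n",
"\n",
"A minimal sketch of the filter syntax from the table above: keep only documents whose `category` metadata equals `sci-fi` before the kNN search runs. `QueryParameters` also accepts `algorithm`, `distance_strategy`, and `vector_data_type`; check the package reference for the exact constants before relying on them."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "simple-filter-sketch-code",
"metadata": {
"id": "simple-filter-sketch-code"
},
"outputs": [],
"source": [
"from langchain_google_bigtable.vector_store import QueryParameters\n",
"\n",
"# Keep only documents whose \"category\" metadata equals \"sci-fi\".\n",
"simple_filter = {\"ColumnValueFilter\": {\"category\": {\"==\": \"sci-fi\"}}}\n",
"query_params_simple = QueryParameters(filters=simple_filter)\n",
"\n",
"simple_results = await vector_store.asimilarity_search(\n",
"    \"a galactic conflict\", k=2, query_parameters=query_params_simple\n",
")\n",
"for doc in simple_results:\n",
"    print(f\"- ID: {doc.id}, Metadata: {doc.metadata}\")"
]
},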
{
"cell_type": "markdown",
"id": "a50416e276a0479cbe66534ed1713a40",
"metadata": {
"id": "a50416e276a0479cbe66534ed1713a40"
},
"source": [
"#### Complex Filter Example\n",
"\n",
"This example uses multiple nested logical filters. It searches for documents that are either (`category` is 'sci-fi' AND `year` between 1970-2000) OR (`author` is 'J.R.R. Tolkien') OR (`rating` > 4.5)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "46a27a456b804aa2a380d5edf15a5daf",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "46a27a456b804aa2a380d5edf15a5daf",
"outputId": "7679570a-80f6-4342-8380-daecb62d7cf8"
},
"outputs": [],
"source": [
"from langchain_google_bigtable.vector_store import QueryParameters\n",
"\n",
"complex_filter = {\n",
" \"ColumnValueFilter\": {\n",
" \"ColumnValueUnionFilter\": { # OR\n",
" \"ColumnValueChainFilter\": { # First AND condition\n",
" \"category\": {\"==\": \"sci-fi\"},\n",
" \"year\": {\">\": 1970, \"<\": 2000},\n",
" },\n",
" \"author\": {\"==\": \"J.R.R. Tolkien\"},\n",
" }\n",
" }\n",
"}\n",
"\n",
"query_params_complex = QueryParameters(filters=complex_filter)\n",
"\n",
"complex_results = await vector_store.asimilarity_search(\n",
" \"a story about a hero's journey\", k=5, query_parameters=query_params_complex\n",
")\n",
"\n",
"print(f\"Found {len(complex_results)} documents matching the complex filter:\")\n",
"for doc in complex_results:\n",
" print(f\"- ID: {doc.id}, Metadata: {doc.metadata}\")"
]
},
{
"cell_type": "markdown",
"id": "1944c39560714e6e80c856f20744a8e5",
"metadata": {
"id": "1944c39560714e6e80c856f20744a8e5"
},
"source": [
"### Search with score\n",
"You can also retrieve the distance score along with the documents."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d6ca27006b894b04b6fc8b79396e2797",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "d6ca27006b894b04b6fc8b79396e2797",
"outputId": "32360bd3-7ccb-4ed6-b68a-52788c902049"
},
"outputs": [],
"source": [
"results_with_scores = await vector_store.asimilarity_search_with_score(\n",
" query=\"an evil entity\", k=1\n",
")\n",
"for doc, score in results_with_scores:\n",
" print(f\"* [SCORE={score:.4f}] {doc.page_content} [{doc.metadata}]\")"
]
},
{
"cell_type": "markdown",
"id": "f61877af4e7f4313ad8234302950b331",
"metadata": {
"id": "f61877af4e7f4313ad8234302950b331"
},
"source": [
"### Use as Retriever\n",
"The vector store can be easily used as a retriever in RAG applications. You can specify the search type (e.g., `similarity` or `mmr`) and pass search-time arguments like `k` and `query_parameters`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "84d5ab97d17b4c38ab41a2b065bbd0c0",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "84d5ab97d17b4c38ab41a2b065bbd0c0",
"outputId": "b33dc07f-08d4-4108-c50d-96dd4e8d719b"
},
"outputs": [],
"source": [
"# Define a filter to use with the retriever\n",
"retriever_filter = {\"ColumnValueFilter\": {\"category\": {\"==\": \"horror\"}}}\n",
"retriever_query_params = QueryParameters(filters=retriever_filter)\n",
"\n",
"retriever = vector_store.as_retriever(\n",
" search_type=\"mmr\", # Specify MMR for retrieval\n",
" search_kwargs={\n",
" \"k\": 1,\n",
" \"lambda_mult\": 0.8,\n",
" \"query_parameters\": retriever_query_params, # Pass filter parameters\n",
" },\n",
")\n",
"retrieved_docs = await retriever.ainvoke(\"a story about a hobbit\")\n",
"print(retrieved_docs[0].page_content)"
]
},
{
"cell_type": "markdown",
"id": "35ffc1ce1c7b4df9ace1bc936b8b1dc2",
"metadata": {
"id": "35ffc1ce1c7b4df9ace1bc936b8b1dc2"
},
"source": [
"## Usage for retrieval-augmented generation\n",
"\n",
"For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:\n",
"\n",
"- [Tutorials](https://python.langchain.com/docs/tutorials/rag/)\n",
"- [How-to: Question and answer with RAG](https://python.langchain.com/docs/how_to/#qa-with-rag)\n",
"- [Retrieval conceptual docs](https://python.langchain.com/docs/concepts/retrieval/)"
]
},
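{
"cell_type": "markdown",
"id": "rag-chain-sketch-md",
"metadata": {
"id": "rag-chain-sketch-md"
},
"source": [
"As a minimal, illustrative sketch of wiring the retriever into a RAG chain (the chat model and its name are assumptions; substitute any chat model available in your project):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "rag-chain-sketch-code",
"metadata": {
"id": "rag-chain-sketch-code"
},
"outputs": [],
"source": [
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_core.runnables import RunnablePassthrough\n",
"from langchain_google_vertexai import ChatVertexAI\n",
"\n",
"prompt = ChatPromptTemplate.from_template(\n",
"    \"Answer using only this context:\\n{context}\\n\\nQuestion: {question}\"\n",
")\n",
"llm = ChatVertexAI(project=PROJECT_ID, model_name=\"gemini-2.0-flash\")  # illustrative model name\n",
"\n",
"rag_retriever = vector_store.as_retriever(search_kwargs={\"k\": 2})\n",
"\n",
"\n",
"def format_docs(docs):\n",
"    return \"\\n\\n\".join(doc.page_content for doc in docs)\n",
"\n",
"\n",
"rag_chain = (\n",
"    {\"context\": rag_retriever | format_docs, \"question\": RunnablePassthrough()}\n",
"    | prompt\n",
"    | llm\n",
"    | StrOutputParser()\n",
")\n",
"\n",
"print(await rag_chain.ainvoke(\"Which story is set on a desert planet?\"))"
]
},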
{
"cell_type": "markdown",
"id": "76127f4a2f6a44fba749ea7800e59d51",
"metadata": {
"id": "76127f4a2f6a44fba749ea7800e59d51"
},
"source": [
"## API reference\n",
"\n",
"For full details on the `BigtableVectorStore` class, see the source code on [GitHub](https://github.com/googleapis/langchain-google-bigtable-python/blob/main/src/langchain_google_bigtable/vector_store.py)."
]
}
],
"metadata": {
"colab": {
"provenance": [],
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}