mirror of
https://github.com/hwchase17/langchain.git
synced 2025-09-24 12:01:54 +00:00
docs: add Bigtable Key-value Store and Vector Store Docs (#32598)
Thank you for contributing to LangChain! Follow these steps to mark your pull request as ready for review. **If any of these steps are not completed, your PR will not be considered for review.**

- [x] **feat(docs)**: add Bigtable Key-value store doc
- [x] **feat(docs)**: add Bigtable Vector store doc

This PR adds a doc for the Bigtable and LangChain Key-value store integration. It contains guides on how to add, delete, get, and yield key-value pairs from the Bigtable Key-value Store for LangChain.

- [x] **Lint and test**: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. **We will not consider a PR unless these three are passing in CI.** See [contribution guidelines](https://python.langchain.com/docs/contributing/) for more.

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to `pyproject.toml` files (even optional ones) unless they are **required** for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.

---------

Co-authored-by: Mason Daugherty <mason@langchain.dev>
This commit is contained in:
129
docs/docs/integrations/providers/google-bigtable.mdx
Normal file
@@ -0,0 +1,129 @@
# Bigtable

Bigtable is a scalable, fully managed key-value and wide-column store ideal for fast access to structured, semi-structured, or unstructured data. This page provides an overview of Bigtable's LangChain integrations.

**Client Library Documentation:** [cloud.google.com/python/docs/reference/langchain-google-bigtable/latest](https://cloud.google.com/python/docs/reference/langchain-google-bigtable/latest)

**Product Documentation:** [cloud.google.com/bigtable](https://cloud.google.com/bigtable)

## Quick Start

To use this library, you first need to:

1. Select or create a Cloud Platform project.
2. Enable billing for your project.
3. Enable the Google Cloud Bigtable API.
4. Set up authentication.

## Installation

The main package for this integration is `langchain-google-bigtable`.

```bash
pip install -U langchain-google-bigtable
```

## Integrations

The `langchain-google-bigtable` package provides the following integrations:

### Vector Store

With `BigtableVectorStore`, you can store documents and their vector embeddings to find the most similar or relevant information in your database.

* **Full `VectorStore` Implementation:** Supports all methods from the LangChain `VectorStore` abstract class.
* **Async/Sync Support:** All methods are available in both asynchronous and synchronous versions.
* **Metadata Filtering:** Powerful filtering on metadata fields, including logical AND/OR combinations.
* **Multiple Distance Strategies:** Supports both cosine and Euclidean distance for similarity search.
* **Customizable Storage:** Full control over how content, embeddings, and metadata are stored in Bigtable columns.

```python
from langchain_google_bigtable import BigtableEngine, BigtableVectorStore

# Your embedding service and other configurations
# embedding_service = ...

engine = await BigtableEngine.async_initialize(project_id="your-project-id")
vector_store = await BigtableVectorStore.create(
    engine=engine,
    instance_id="your-instance-id",
    table_id="your-table-id",
    embedding_service=embedding_service,
    collection="your_collection_name",
)

await vector_store.aadd_documents([your_documents])
results = await vector_store.asimilarity_search("your query")
```
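The two distance strategies listed above can differ in how they rank results when embeddings are not normalized. A plain-Python sketch (independent of Bigtable) makes the difference concrete:

```python
import math


def cosine_distance(a, b):
    # 1 - cosine similarity; depends only on vector direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def euclidean_distance(a, b):
    # Straight-line distance; depends on magnitude as well as direction.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = [1.0, 0.0]
doc_a = [10.0, 0.0]  # same direction as the query, large magnitude
doc_b = [0.6, 0.8]   # different direction, unit magnitude

# Cosine ranks doc_a first (identical direction); Euclidean ranks doc_b first.
assert cosine_distance(query, doc_a) < cosine_distance(query, doc_b)
assert euclidean_distance(query, doc_b) < euclidean_distance(query, doc_a)
```

For embeddings normalized to unit length the two orderings agree, so the choice of strategy mainly matters when your embedding model does not normalize its output.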
Learn more in the [Vector Store how-to guide](https://colab.research.google.com/github/googleapis/langchain-google-bigtable-python/blob/main/docs/vector_store.ipynb).

### Key-value Store

Use `BigtableByteStore` as a persistent, scalable key-value store for caching, session management, or other storage needs. It supports both synchronous and asynchronous operations.

```python
from langchain_google_bigtable import BigtableByteStore

# Initialize the store
store = await BigtableByteStore.create(
    project_id="your-project-id",
    instance_id="your-instance-id",
    table_id="your-table-id",
)

# Set and get values
await store.amset([("key1", b"value1")])
retrieved = await store.amget(["key1"])
```
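To see the store's contract without touching Bigtable, here is a minimal in-memory sketch of the same `mset`/`mget`/`mdelete`/`yield_keys` semantics (the synchronous counterparts of the async methods above). `InMemoryByteStore` is illustrative only, not part of the package:

```python
from typing import Dict, Iterator, List, Optional, Sequence, Tuple


class InMemoryByteStore:
    """Dict-backed stand-in with the same byte-store contract."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def mset(self, pairs: Sequence[Tuple[str, bytes]]) -> None:
        self._data.update(pairs)

    def mget(self, keys: Sequence[str]) -> List[Optional[bytes]]:
        # Missing keys come back as None, preserving input order.
        return [self._data.get(k) for k in keys]

    def mdelete(self, keys: Sequence[str]) -> None:
        for k in keys:
            self._data.pop(k, None)

    def yield_keys(self, prefix: str = "") -> Iterator[str]:
        return (k for k in self._data if k.startswith(prefix))


store = InMemoryByteStore()
store.mset([("key1", b"value1"), ("key2", b"value2")])
assert store.mget(["key1", "missing"]) == [b"value1", None]
store.mdelete(["key2"])
assert sorted(store.yield_keys(prefix="key")) == ["key1"]
```

The Bigtable-backed store follows the same semantics, with keys mapped to row keys so that prefix iteration translates to an efficient row-range scan.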
Learn more in the [Key-value Store how-to guide](https://cloud.google.com/python/docs/reference/langchain-google-bigtable/latest/key-value-store).

### Document Loader

Use the `BigtableLoader` to load data from a Bigtable table and represent it as LangChain `Document` objects.

```python
from langchain_google_bigtable import BigtableLoader

loader = BigtableLoader(
    project_id="your-project-id",
    instance_id="your-instance-id",
    table_id="your-table-name",
)
docs = loader.load()
```
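Conceptually, the loader turns each wide-column row into one `Document`. The sketch below shows one plausible mapping (the column names and the `Document` stand-in class are hypothetical, not the loader's actual schema):

```python
from typing import Any, Dict


class Document:
    # Minimal stand-in for langchain_core.documents.Document.
    def __init__(self, page_content: str, metadata: Dict[str, Any]):
        self.page_content = page_content
        self.metadata = metadata


def row_to_document(row_key: str, cells: Dict[str, bytes]) -> Document:
    # Treat one designated column as the content and the rest as metadata.
    content = cells.pop("content").decode("utf-8")
    metadata = {name: value.decode("utf-8") for name, value in cells.items()}
    metadata["row_key"] = row_key
    return Document(page_content=content, metadata=metadata)


doc = row_to_document(
    "pets#goldfish",
    {"content": b"Goldfish are easy to care for.", "type": b"fish"},
)
assert doc.page_content == "Goldfish are easy to care for."
assert doc.metadata == {"type": "fish", "row_key": "pets#goldfish"}
```

The real `BigtableLoader` lets you configure which columns become content and metadata; see the how-to guide linked below for the supported options.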
Learn more in the [Document Loader how-to guide](https://cloud.google.com/python/docs/reference/langchain-google-bigtable/latest/document-loader).

### Chat Message History

Use `BigtableChatMessageHistory` to store conversation histories, enabling stateful chains and agents.

```python
from langchain_google_bigtable import BigtableChatMessageHistory

history = BigtableChatMessageHistory(
    project_id="your-project-id",
    instance_id="your-instance-id",
    table_id="your-message-store",
    session_id="user-session-123",
)

history.add_user_message("Hello!")
history.add_ai_message("Hi there!")
```
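The pattern behind chat message history is an append-only log of (role, message) pairs keyed by `session_id`, so separate sessions never mix. A minimal in-memory sketch of that pattern (illustrative only, not the Bigtable implementation):

```python
from typing import Dict, List, Tuple


class InMemoryChatMessageHistory:
    """Session-keyed append-only message log."""

    # Class-level dict: all instances share the same backing "table",
    # mirroring how multiple history objects share one Bigtable table.
    _sessions: Dict[str, List[Tuple[str, str]]] = {}

    def __init__(self, session_id: str) -> None:
        self.session_id = session_id
        self._sessions.setdefault(session_id, [])

    def add_user_message(self, text: str) -> None:
        self._sessions[self.session_id].append(("human", text))

    def add_ai_message(self, text: str) -> None:
        self._sessions[self.session_id].append(("ai", text))

    @property
    def messages(self) -> List[Tuple[str, str]]:
        return list(self._sessions[self.session_id])


history = InMemoryChatMessageHistory("user-session-123")
history.add_user_message("Hello!")
history.add_ai_message("Hi there!")
assert history.messages == [("human", "Hello!"), ("ai", "Hi there!")]
```

With Bigtable as the backend, the same log is durable: a new `BigtableChatMessageHistory` created with the same `session_id` picks up the existing conversation.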
Learn more in the [Chat Message History how-to guide](https://cloud.google.com/python/docs/reference/langchain-google-bigtable/latest/chat-message-history).

## Contributions

Contributions to this library are welcome. Please see the CONTRIBUTING guide in the [package repo](https://github.com/googleapis/langchain-google-bigtable-python/) for more details.

## License

This project is licensed under the Apache 2.0 License - see the LICENSE file in the [package repo](https://github.com/googleapis/langchain-google-bigtable-python/blob/main/LICENSE) for details.

## Disclaimer

This is not an officially supported Google product.
483
docs/docs/integrations/stores/bigtable.ipynb
Normal file
@@ -0,0 +1,483 @@
{
 "cells": [
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "---\n",
    "sidebar_label: Google Bigtable\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# BigtableByteStore\n",
    "\n",
    "This guide covers how to use Google Cloud Bigtable as a key-value store.\n",
    "\n",
    "[Bigtable](https://cloud.google.com/bigtable) is a key-value and wide-column store, ideal for fast access to structured, semi-structured, or unstructured data.\n",
    "\n",
    "[Open in Colab](https://colab.research.google.com/github/googleapis/langchain-google-bigtable-python/blob/main/docs/key_value_store.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Overview\n",
    "\n",
    "The `BigtableByteStore` uses Google Cloud Bigtable as a backend for a key-value store. It supports synchronous and asynchronous operations for setting, getting, and deleting key-value pairs.\n",
    "\n",
    "### Integration details\n",
    "| Class | Package | Local | JS support | Package downloads | Package latest |\n",
    "| :--- | :--- | :---: | :---: | :---: | :---: |\n",
    "| [BigtableByteStore](https://github.com/googleapis/langchain-google-bigtable-python/blob/main/src/langchain_google_bigtable/key_value_store.py) | [langchain-google-bigtable](https://pypi.org/project/langchain-google-bigtable/) | ❌ | ❌ |  |  |"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup\n",
    "\n",
    "### Prerequisites\n",
    "\n",
    "To get started, you will need a Google Cloud project with an active Bigtable instance and table.\n",
    "* [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n",
    "* [Enable the Bigtable API](https://console.cloud.google.com/flows/enableapi?apiid=bigtable.googleapis.com)\n",
    "* [Create a Bigtable instance and table](https://cloud.google.com/bigtable/docs/creating-instance)\n",
    "\n",
    "### Installation\n",
    "\n",
    "The integration is in the `langchain-google-bigtable` package. The command below also installs `langchain-google-vertexai` for the embedding cache example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install -qU langchain-google-bigtable langchain-google-vertexai"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### ☁ Set Your Google Cloud Project\n",
    "Set your Google Cloud project to use its resources within this notebook.\n",
    "\n",
    "If you don't know your project ID, you can run `gcloud config list` or see the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# @markdown Please fill in your project, instance, and table details.\n",
    "PROJECT_ID = \"your-gcp-project-id\"  # @param {type:\"string\"}\n",
    "INSTANCE_ID = \"your-instance-id\"  # @param {type:\"string\"}\n",
    "TABLE_ID = \"your-table-id\"  # @param {type:\"string\"}\n",
    "\n",
    "!gcloud config set project {PROJECT_ID}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 🔐 Authentication\n",
    "Authenticate to Google Cloud to access your project resources.\n",
    "- For **Colab**, use the cell below.\n",
    "- For **Vertex AI Workbench**, see the [setup instructions](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from google.colab import auth\n",
    "\n",
    "auth.authenticate_user()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Instantiation\n",
    "\n",
    "To use `BigtableByteStore`, we first ensure a table exists and then initialize a `BigtableEngine` to manage connections."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_google_bigtable import (\n",
    "    BigtableByteStore,\n",
    "    BigtableEngine,\n",
    "    init_key_value_store_table,\n",
    ")\n",
    "\n",
    "# Ensure the table and column family exist.\n",
    "init_key_value_store_table(\n",
    "    project_id=PROJECT_ID,\n",
    "    instance_id=INSTANCE_ID,\n",
    "    table_id=TABLE_ID,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### BigtableEngine\n",
    "A `BigtableEngine` object handles the execution context for the store, especially for async operations. It's recommended to initialize a single engine and reuse it across multiple stores for better performance."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initialize the engine to manage async operations.\n",
    "engine = await BigtableEngine.async_initialize(\n",
    "    project_id=PROJECT_ID, instance_id=INSTANCE_ID\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### BigtableByteStore\n",
    "\n",
    "This is the main class for interacting with the key-value store. It provides the methods for setting, getting, and deleting data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initialize the store.\n",
    "store = await BigtableByteStore.create(engine=engine, table_id=TABLE_ID)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Usage\n",
    "\n",
    "The store supports both sync (`mset`, `mget`) and async (`amset`, `amget`) methods. This guide uses the async versions."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Set\n",
    "Use `amset` to save key-value pairs to the store."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "kv_pairs = [\n",
    "    (\"key1\", b\"value1\"),\n",
    "    (\"key2\", b\"value2\"),\n",
    "    (\"key3\", b\"value3\"),\n",
    "]\n",
    "\n",
    "await store.amset(kv_pairs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Get\n",
    "Use `amget` to retrieve values. If a key is not found, `None` is returned for that key."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "retrieved_vals = await store.amget([\"key1\", \"key2\", \"nonexistent_key\"])\n",
    "print(retrieved_vals)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Delete\n",
    "Use `amdelete` to remove keys from the store."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "await store.amdelete([\"key3\"])\n",
    "\n",
    "# Verify the key was deleted.\n",
    "await store.amget([\"key1\", \"key3\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Iterate over keys\n",
    "Use `ayield_keys` to iterate over all keys or keys with a specific prefix."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "all_keys = [key async for key in store.ayield_keys()]\n",
    "print(f\"All keys: {all_keys}\")\n",
    "\n",
    "prefixed_keys = [key async for key in store.ayield_keys(prefix=\"key1\")]\n",
    "print(f\"Prefixed keys: {prefixed_keys}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Advanced Usage: Embedding Caching\n",
    "\n",
    "A common use case for a key-value store is to cache expensive operations like computing text embeddings, which saves time and cost."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.embeddings import CacheBackedEmbeddings\n",
    "from langchain_google_vertexai.embeddings import VertexAIEmbeddings\n",
    "\n",
    "underlying_embeddings = VertexAIEmbeddings(\n",
    "    project=PROJECT_ID, model_name=\"textembedding-gecko@003\"\n",
    ")\n",
    "\n",
    "# Use a namespace to avoid key collisions with other data.\n",
    "cached_embedder = CacheBackedEmbeddings.from_bytes_store(\n",
    "    underlying_embeddings, store, namespace=\"text-embeddings\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"First call (computes and caches embedding):\")\n",
    "%time embedding_result_1 = await cached_embedder.aembed_query(\"Hello, world!\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"\\nSecond call (retrieves from cache):\")\n",
    "%time embedding_result_2 = await cached_embedder.aembed_query(\"Hello, world!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### As a Simple Document Retriever\n",
    "\n",
    "This section shows how to create a simple retriever using the Bigtable store. It acts as a document persistence layer, fetching documents that match a query prefix."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_core.retrievers import BaseRetriever\n",
    "from langchain_core.documents import Document\n",
    "from langchain_core.callbacks import CallbackManagerForRetrieverRun\n",
    "from typing import List, Optional, Union\n",
    "import json\n",
    "\n",
    "\n",
    "class SimpleKVStoreRetriever(BaseRetriever):\n",
    "    \"\"\"A simple retriever that retrieves documents based on a prefix match in the key-value store.\"\"\"\n",
    "\n",
    "    store: BigtableByteStore\n",
    "    documents: List[Union[Document, str]]\n",
    "    k: int\n",
    "\n",
    "    def set_up_store(self):\n",
    "        kv_pairs_to_set = []\n",
    "        for i, doc in enumerate(self.documents):\n",
    "            if isinstance(doc, str):\n",
    "                doc = Document(page_content=doc)\n",
    "            if not doc.id:\n",
    "                doc.id = str(i)\n",
    "            value = (\n",
    "                \"Page Content\\n\"\n",
    "                + doc.page_content\n",
    "                + \"\\nMetadata\"\n",
    "                + json.dumps(doc.metadata)\n",
    "            )\n",
    "            kv_pairs_to_set.append((doc.id, value.encode(\"utf-8\")))\n",
    "        self.store.mset(kv_pairs_to_set)\n",
    "\n",
    "    async def _aget_relevant_documents(\n",
    "        self,\n",
    "        query: str,\n",
    "        *,\n",
    "        run_manager: Optional[CallbackManagerForRetrieverRun] = None,\n",
    "    ) -> List[Document]:\n",
    "        keys = [key async for key in self.store.ayield_keys(prefix=query)][: self.k]\n",
    "        documents_retrieved = []\n",
    "        # amget returns a list, so iterate over the awaited result.\n",
    "        for document in await self.store.amget(keys):\n",
    "            if document:\n",
    "                document_str = document.decode(\"utf-8\")\n",
    "                page_content = document_str.split(\"Content\\n\")[1].split(\"\\nMetadata\")[0]\n",
    "                metadata = json.loads(document_str.split(\"\\nMetadata\")[1])\n",
    "                documents_retrieved.append(\n",
    "                    Document(page_content=page_content, metadata=metadata)\n",
    "                )\n",
    "        return documents_retrieved\n",
    "\n",
    "    def _get_relevant_documents(\n",
    "        self,\n",
    "        query: str,\n",
    "        *,\n",
    "        run_manager: Optional[CallbackManagerForRetrieverRun] = None,\n",
    "    ) -> List[Document]:\n",
    "        keys = [key for key in self.store.yield_keys(prefix=query)][: self.k]\n",
    "        documents_retrieved = []\n",
    "        for document in self.store.mget(keys):\n",
    "            if document:\n",
    "                document_str = document.decode(\"utf-8\")\n",
    "                page_content = document_str.split(\"Content\\n\")[1].split(\"\\nMetadata\")[0]\n",
    "                metadata = json.loads(document_str.split(\"\\nMetadata\")[1])\n",
    "                documents_retrieved.append(\n",
    "                    Document(page_content=page_content, metadata=metadata)\n",
    "                )\n",
    "        return documents_retrieved"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "documents = [\n",
    "    Document(\n",
    "        page_content=\"Goldfish are popular pets for beginners, requiring relatively simple care.\",\n",
    "        metadata={\"type\": \"fish\", \"trait\": \"low maintenance\"},\n",
    "        id=\"fish#Goldfish\",\n",
    "    ),\n",
    "    Document(\n",
    "        page_content=\"Cats are independent pets that often enjoy their own space.\",\n",
    "        metadata={\"type\": \"cat\", \"trait\": \"independence\"},\n",
    "        id=\"mammals#Cats\",\n",
    "    ),\n",
    "    Document(\n",
    "        page_content=\"Rabbits are social animals that need plenty of space to hop around.\",\n",
    "        metadata={\"type\": \"rabbit\", \"trait\": \"social\"},\n",
    "        id=\"mammals#Rabbits\",\n",
    "    ),\n",
    "]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "retriever_store = BigtableByteStore.create_sync(\n",
    "    engine=engine, instance_id=INSTANCE_ID, table_id=TABLE_ID\n",
    ")\n",
    "\n",
    "KVDocumentRetriever = SimpleKVStoreRetriever(\n",
    "    store=retriever_store, documents=documents, k=2\n",
    ")\n",
    "\n",
    "KVDocumentRetriever.set_up_store()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "KVDocumentRetriever.invoke(\"fish\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "KVDocumentRetriever.invoke(\"mammals\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## API reference\n",
    "\n",
    "For full details on the `BigtableByteStore` class, see the source code on [GitHub](https://github.com/googleapis/langchain-google-bigtable-python/blob/main/src/langchain_google_bigtable/key_value_store.py)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
839
docs/docs/integrations/vectorstores/bigtable.ipynb
Normal file
@@ -0,0 +1,839 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "raw",
|
||||
"id": "7fb27b941602401d91542211134fc71a",
|
||||
"metadata": {
|
||||
"id": "7fb27b941602401d91542211134fc71a"
|
||||
},
|
||||
"source": [
|
||||
"---\n",
|
||||
"sidebar_label: Google Bigtable\n",
|
||||
"---"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "acae54e37e7d407bbb7b55eff062a284",
|
||||
"metadata": {
|
||||
"id": "acae54e37e7d407bbb7b55eff062a284"
|
||||
},
|
||||
"source": [
|
||||
"# BigtableVectorStore\n",
|
||||
"\n",
|
||||
"This guide covers the `BigtableVectorStore` integration for using Google Cloud Bigtable as a vector store.\n",
|
||||
"\n",
|
||||
"[Bigtable](https://cloud.google.com/bigtable) is a key-value and wide-column store, ideal for fast access to structured, semi-structured, or unstructured data.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "9a63283cbaf04dbcab1f6479b197f3a8",
|
||||
"metadata": {
|
||||
"id": "9a63283cbaf04dbcab1f6479b197f3a8"
|
||||
},
|
||||
"source": [
|
||||
"## Overview\n",
|
||||
"\n",
|
||||
"The `BigtableVectorStore` uses Google Cloud Bigtable to store documents and their vector embeddings for similarity search and retrieval. It supports powerful metadata filtering to refine search results.\n",
|
||||
"\n",
|
||||
"### Integration details\n",
|
||||
"| Class | Package | Local | JS support | Package downloads | Package latest |\n",
|
||||
"| :--- | :--- | :---: | :---: | :---: | :---: |\n",
|
||||
"| [BigtableVectorStore](https://github.com/googleapis/langchain-google-bigtable-python/blob/main/src/langchain_google_bigtable/vector_store.py) | [langchain-google-bigtable](https://pypi.org/project/langchain-google-bigtable/) | ❌ | ❌ |  |  |"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "8dd0d8092fe74a7c96281538738b07e2",
|
||||
"metadata": {
|
||||
"id": "8dd0d8092fe74a7c96281538738b07e2"
|
||||
},
|
||||
"source": [
|
||||
"## Setup"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "72eea5119410473aa328ad9291626812",
|
||||
"metadata": {
|
||||
"id": "72eea5119410473aa328ad9291626812"
|
||||
},
|
||||
"source": [
|
||||
"### Prerequisites\n",
|
||||
"\n",
|
||||
"To get started, you will need a Google Cloud project with an active Bigtable instance.\n",
|
||||
"* [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n",
|
||||
"* [Enable the Bigtable API](https://console.cloud.google.com/flows/enableapi?apiid=bigtable.googleapis.com)\n",
|
||||
"* [Create a Bigtable instance](https://cloud.google.com/bigtable/docs/creating-instance)\n",
|
||||
"\n",
|
||||
"### Installation\n",
|
||||
"\n",
|
||||
"The integration is in the `langchain-google-bigtable` package. The command below also installs `langchain-google-vertexai` to use for an embedding service."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "8edb47106e1a46a883d545849b8ab81b",
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"base_uri": "https://localhost:8080/"
|
||||
},
|
||||
"id": "8edb47106e1a46a883d545849b8ab81b",
|
||||
"outputId": "b6c95f84-f271-4bd0-f024-81ea38ce7f80"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%pip install -qU langchain-google-bigtable langchain-google-vertexai"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "WEparXIIO41L",
|
||||
"metadata": {
|
||||
"id": "WEparXIIO41L"
|
||||
},
|
||||
"source": [
|
||||
"**Colab only**: Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "OB8Mg8HxO9HV",
|
||||
"metadata": {
|
||||
"id": "OB8Mg8HxO9HV"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Automatically restart kernel after installs so that your environment can access the new packages\n",
|
||||
"# import IPython\n",
|
||||
"\n",
|
||||
"# app = IPython.Application.instance()\n",
|
||||
"# app.kernel.do_shutdown(True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "10185d26023b46108eb7d9f57d49d2b3",
|
||||
"metadata": {
|
||||
"id": "10185d26023b46108eb7d9f57d49d2b3"
|
||||
},
|
||||
"source": [
|
||||
"### Set Your Google Cloud Project\n",
|
||||
"Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n",
|
||||
"\n",
|
||||
"If you don't know your project ID, try the following:\n",
|
||||
"\n",
|
||||
"* Run `gcloud config list`.\n",
|
||||
"* Run `gcloud projects list`.\n",
|
||||
"* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "8763a12b2bbd4a93a75aff182afb95dc",
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"base_uri": "https://localhost:8080/"
|
||||
},
|
||||
"id": "8763a12b2bbd4a93a75aff182afb95dc",
|
||||
"outputId": "865ca13d-47e1-4458-dfe3-96b0e7a57810"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# @markdown Please fill in your project, instance, and a new table name.\n",
|
||||
"PROJECT_ID = \"google.com:cloud-bigtable-dev\" # @param {type:\"string\"}\n",
|
||||
"INSTANCE_ID = \"anweshadas-test\" # @param {type:\"string\"}\n",
|
||||
"TABLE_ID = \"your-vector-store-table-3\" # @param {type:\"string\"}\n",
|
||||
"\n",
|
||||
"!gcloud config set project {PROJECT_ID}"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "xx0JMrbNOfnV",
|
||||
"metadata": {
|
||||
"id": "xx0JMrbNOfnV"
|
||||
},
|
||||
"source": [
|
||||
"### 🔐 Authentication\n",
|
||||
"\n",
|
||||
"Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n",
|
||||
"\n",
|
||||
"- If you are using Colab to run this notebook, use the cell below and continue.\n",
|
||||
"- If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "T1pPsDCzOURd",
|
||||
"metadata": {
|
||||
"id": "T1pPsDCzOURd"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from google.colab import auth\n",
|
||||
"\n",
|
||||
"auth.authenticate_user(project_id=PROJECT_ID)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "7623eae2785240b9bd12b16a66d81610",
|
"metadata": {
"id": "7623eae2785240b9bd12b16a66d81610"
},
"source": [
"## Initialization\n",
"\n",
"Initializing the `BigtableVectorStore` involves three steps: setting up the embedding service, ensuring the Bigtable table is created, and configuring the store's parameters."
]
},
{
"cell_type": "markdown",
"id": "7cdc8c89c7104fffa095e18ddfef8986",
"metadata": {
"id": "7cdc8c89c7104fffa095e18ddfef8986"
},
"source": [
"### 1. Set up Embedding Service\n",
"First, we need a model to create the vector embeddings for our documents. We'll use a Vertex AI model for this example."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b118ea5561624da68c537baed56e602f",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "b118ea5561624da68c537baed56e602f",
"outputId": "99b55b9a-61c7-4dbe-bf1f-dd84ddc434da"
},
"outputs": [],
"source": [
"from langchain_google_vertexai import VertexAIEmbeddings\n",
"\n",
"embeddings = VertexAIEmbeddings(project=PROJECT_ID, model_name=\"gemini-embedding-001\")"
]
},
{
"cell_type": "markdown",
"id": "938c804e27f84196a10c8828c723f798",
"metadata": {
"id": "938c804e27f84196a10c8828c723f798"
},
"source": [
"### 2. Initialize a Table\n",
"Before creating a `BigtableVectorStore`, a table with the correct column families must exist. The `init_vector_store_table` helper function is the recommended way to create and configure a table. If the table already exists, the function does nothing."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "504fb2a444614c0babb325280ed9130a",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "504fb2a444614c0babb325280ed9130a",
"outputId": "2e6453bc-5eed-4a3e-a8d4-59e945485be6"
},
"outputs": [],
"source": [
"from langchain_google_bigtable.vector_store import init_vector_store_table\n",
"\n",
"DATA_COLUMN_FAMILY = \"doc_data\"\n",
"\n",
"try:\n",
"    init_vector_store_table(\n",
"        project_id=PROJECT_ID,\n",
"        instance_id=INSTANCE_ID,\n",
"        table_id=TABLE_ID,\n",
"        content_column_family=DATA_COLUMN_FAMILY,\n",
"        embedding_column_family=DATA_COLUMN_FAMILY,\n",
"    )\n",
"    print(f\"Table '{TABLE_ID}' is ready.\")\n",
"except ValueError as e:\n",
"    print(e)"
]
},
{
"cell_type": "markdown",
"id": "59bbdb311c014d738909a11f9e486628",
"metadata": {
"id": "59bbdb311c014d738909a11f9e486628"
},
"source": [
"### 3. Configure the Vector Store\n",
"Now we define the parameters that control how the vector store connects to Bigtable and how it handles data."
]
},
{
"cell_type": "markdown",
"id": "b43b363d81ae4b689946ece5c682cd59",
"metadata": {
"id": "b43b363d81ae4b689946ece5c682cd59"
},
"source": [
"#### The BigtableEngine\n",
"A `BigtableEngine` object manages clients and async operations. It is highly recommended to initialize a single engine and reuse it across multiple stores for better performance and resource management."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8a65eabff63a45729fe45fb5ade58bdc",
"metadata": {
"id": "8a65eabff63a45729fe45fb5ade58bdc"
},
"outputs": [],
"source": [
"from langchain_google_bigtable import BigtableEngine\n",
"\n",
"engine = await BigtableEngine.async_initialize(project_id=PROJECT_ID)"
]
},
{
"cell_type": "markdown",
"id": "c3933fab20d04ec698c2621248eb3be0",
"metadata": {
"id": "c3933fab20d04ec698c2621248eb3be0"
},
"source": [
"#### Collections\n",
"A `collection` provides a logical namespace for your documents within a single Bigtable table. It is used as a prefix for the row keys, allowing multiple vector stores to coexist in the same table without interfering with each other."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4dd4641cc4064e0191573fe9c69df29b",
"metadata": {
"id": "4dd4641cc4064e0191573fe9c69df29b"
},
"outputs": [],
"source": [
"collection_name = \"my_docs\""
]
},
{
"cell_type": "markdown",
"id": "8309879909854d7188b41380fd92a7c3",
"metadata": {
"id": "8309879909854d7188b41380fd92a7c3"
},
"source": [
"#### Metadata Configuration\n",
"When creating a `BigtableVectorStore`, you have two optional parameters for handling metadata:\n",
"\n",
"* `metadata_mappings`: This is a list of `VectorMetadataMapping` objects. You **must** define a mapping for any metadata key you wish to use for filtering in your search queries. Each mapping specifies the data type (`encoding`) for the metadata field, which is crucial for correct filtering.\n",
"* `metadata_as_json_column`: This is an optional `ColumnConfig` that tells the store to save the *entire* metadata dictionary as a single JSON string in a specific column. This is useful for efficiently retrieving all of a document's metadata at once, including fields not defined in `metadata_mappings`. **Note:** Fields stored only in this JSON column cannot be used for filtering."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3ed186c9a28b402fb0bc4494df01f08d",
"metadata": {
"id": "3ed186c9a28b402fb0bc4494df01f08d"
},
"outputs": [],
"source": [
"from langchain_google_bigtable import ColumnConfig, VectorMetadataMapping, Encoding\n",
"\n",
"# Define mappings for metadata fields you want to filter on.\n",
"metadata_mappings = [\n",
"    VectorMetadataMapping(metadata_key=\"author\", encoding=Encoding.UTF8),\n",
"    VectorMetadataMapping(metadata_key=\"year\", encoding=Encoding.INT_BIG_ENDIAN),\n",
"    VectorMetadataMapping(metadata_key=\"category\", encoding=Encoding.UTF8),\n",
"    VectorMetadataMapping(metadata_key=\"rating\", encoding=Encoding.FLOAT),\n",
"]\n",
"\n",
"# Define the optional column for storing all metadata as a single JSON string.\n",
"metadata_as_json_column = ColumnConfig(\n",
"    column_family=DATA_COLUMN_FAMILY, column_qualifier=\"metadata_json\"\n",
")"
]
},
{
"cell_type": "markdown",
"id": "cb1e1581032b452c9409d6c6813c49d1",
"metadata": {
"id": "cb1e1581032b452c9409d6c6813c49d1"
},
"source": [
"### 4. Create the BigtableVectorStore Instance"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "iKM4BktZR56p",
"metadata": {
"id": "iKM4BktZR56p"
},
"outputs": [],
"source": [
"# Configure the columns for your store.\n",
"content_column = ColumnConfig(\n",
"    column_family=DATA_COLUMN_FAMILY, column_qualifier=\"content\"\n",
")\n",
"embedding_column = ColumnConfig(\n",
"    column_family=DATA_COLUMN_FAMILY, column_qualifier=\"embedding\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "379cbbc1e968416e875cc15c1202d7eb",
"metadata": {
"id": "379cbbc1e968416e875cc15c1202d7eb"
},
"outputs": [],
"source": [
"from langchain_google_bigtable import BigtableVectorStore\n",
"\n",
"vector_store = await BigtableVectorStore.create(\n",
"    project_id=PROJECT_ID,\n",
"    instance_id=INSTANCE_ID,\n",
"    table_id=TABLE_ID,\n",
"    engine=engine,\n",
"    embedding_service=embeddings,\n",
"    collection=collection_name,\n",
"    metadata_mappings=metadata_mappings,\n",
"    metadata_as_json_column=metadata_as_json_column,\n",
"    content_column=content_column,\n",
"    embedding_column=embedding_column,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "277c27b1587741f2af2001be3712ef0d",
"metadata": {
"id": "277c27b1587741f2af2001be3712ef0d"
},
"source": [
"## Manage vector store"
]
},
{
"cell_type": "markdown",
"id": "db7b79bc585a40fcaf58bf750017e135",
"metadata": {
"id": "db7b79bc585a40fcaf58bf750017e135"
},
"source": [
"### Add Documents\n",
"You can add documents with pre-defined IDs. If a `Document` is added without an `id` attribute, the vector store will automatically generate a **`uuid4` string** for it."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "916684f9a58a4a2aa5f864670399430d",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "916684f9a58a4a2aa5f864670399430d",
"outputId": "eb343088-624a-41a1-94cd-53e0c3cfa207"
},
"outputs": [],
"source": [
"from langchain_core.documents import Document\n",
"\n",
"docs_to_add = [\n",
"    Document(\n",
"        page_content=\"A young farm boy, Luke Skywalker, is thrust into a galactic conflict.\",\n",
"        id=\"doc_1\",\n",
"        metadata={\n",
"            \"author\": \"George Lucas\",\n",
"            \"year\": 1977,\n",
"            \"category\": \"sci-fi\",\n",
"            \"rating\": 4.8,\n",
"        },\n",
"    ),\n",
"    Document(\n",
"        page_content=\"A hobbit named Frodo Baggins must destroy a powerful ring.\",\n",
"        id=\"doc_2\",\n",
"        metadata={\n",
"            \"author\": \"J.R.R. Tolkien\",\n",
"            \"year\": 1954,\n",
"            \"category\": \"fantasy\",\n",
"            \"rating\": 4.9,\n",
"        },\n",
"    ),\n",
"    # Document without a pre-defined ID, one will be generated.\n",
"    Document(\n",
"        page_content=\"A group of children confront an evil entity emerging from the sewers.\",\n",
"        metadata={\"author\": \"Stephen King\", \"year\": 1986, \"category\": \"horror\"},\n",
"    ),\n",
"    Document(\n",
"        page_content=\"In a distant future, the noble House Atreides rules the desert planet Arrakis.\",\n",
"        id=\"doc_3\",\n",
"        metadata={\n",
"            \"author\": \"Frank Herbert\",\n",
"            \"year\": 1965,\n",
"            \"category\": \"sci-fi\",\n",
"            \"rating\": 4.9,\n",
"        },\n",
"    ),\n",
"]\n",
"\n",
"added_ids = await vector_store.aadd_documents(docs_to_add)\n",
"print(f\"Added documents with IDs: {added_ids}\")"
]
},
{
"cell_type": "markdown",
"id": "1671c31a24314836a5b85d7ef7fbf015",
"metadata": {
"id": "1671c31a24314836a5b85d7ef7fbf015"
},
"source": [
"### Update Documents\n",
"`BigtableVectorStore` handles updates by overwriting. To update a document, simply add it again with the same ID but with new content or metadata."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "33b0902fd34d4ace834912fa1002cf8e",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "33b0902fd34d4ace834912fa1002cf8e",
"outputId": "d80f2b01-44df-45d7-9ff5-f77527f04733"
},
"outputs": [],
"source": [
"doc_to_update = [\n",
"    Document(\n",
"        page_content=\"An old hobbit, Frodo Baggins, must take a powerful ring to be destroyed.\",  # Updated content\n",
"        id=\"doc_2\",  # Same ID\n",
"        metadata={\n",
"            \"author\": \"J.R.R. Tolkien\",\n",
"            \"year\": 1954,\n",
"            \"category\": \"epic-fantasy\",\n",
"            \"rating\": 4.9,\n",
"        },  # Updated metadata\n",
"    )\n",
"]\n",
"\n",
"await vector_store.aadd_documents(doc_to_update)\n",
"print(\"Document 'doc_2' has been updated.\")"
]
},
{
"cell_type": "markdown",
"id": "f6fa52606d8c4a75a9b52967216f8f3f",
"metadata": {
"id": "f6fa52606d8c4a75a9b52967216f8f3f"
},
"source": [
"### Delete Documents"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f5a1fa73e5044315a093ec459c9be902",
"metadata": {
"id": "f5a1fa73e5044315a093ec459c9be902"
},
"outputs": [],
"source": [
"is_deleted = await vector_store.adelete(ids=[\"doc_2\"])"
]
},
{
"cell_type": "markdown",
"id": "cdf66aed5cc84ca1b48e60bad68798a8",
"metadata": {
"id": "cdf66aed5cc84ca1b48e60bad68798a8"
},
"source": [
"## Query vector store"
]
},
{
"cell_type": "markdown",
"id": "28d3efd5258a48a79c179ea5c6759f01",
"metadata": {
"id": "28d3efd5258a48a79c179ea5c6759f01"
},
"source": [
"### Search"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3f9bc0b9dd2c44919cc8dcca39b469f8",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "3f9bc0b9dd2c44919cc8dcca39b469f8",
"outputId": "dbd5426c-139a-451b-d456-c241cf794aec"
},
"outputs": [],
"source": [
"results = await vector_store.asimilarity_search(\"a story about a powerful ring\", k=1)\n",
"print(results[0].page_content)"
]
},
{
"cell_type": "markdown",
"id": "0e382214b5f147d187d36a2058b9c724",
"metadata": {
"id": "0e382214b5f147d187d36a2058b9c724"
},
"source": [
"### Search with Filters\n",
"\n",
"Apply filters before the vector search runs."
]
},
{
"cell_type": "markdown",
"id": "e7f8g9h0-query-header-restored",
"metadata": {
"id": "e7f8g9h0-query-header-restored"
},
"source": [
"#### The kNN Search Algorithm and Filtering\n",
"\n",
"By default, `BigtableVectorStore` uses a **k-Nearest Neighbors (kNN)** search algorithm to find the `k` vectors in the database that are most similar to your query vector. The vector store offers filtering to reduce the search space *before* the kNN search is performed, which can make queries faster and more relevant.\n",
"\n",
"#### Configuring Queries with `QueryParameters`\n",
"\n",
"All search settings are controlled via the `QueryParameters` object. This object allows you to specify not only filters but also other important search aspects:\n",
"* `algorithm`: The search algorithm to use. Defaults to `\"kNN\"`.\n",
"* `distance_strategy`: The metric used for comparison, such as `COSINE` (default) or `EUCLIDEAN`.\n",
"* `vector_data_type`: The data type of the stored vectors, like `FLOAT32` or `DOUBLE64`. This should match the precision of your embeddings.\n",
"* `filters`: A dictionary defining the filtering logic to apply.\n",
"\n",
"#### Understanding Encodings\n",
"\n",
"To filter on metadata fields, you must define them in `metadata_mappings` with the correct `encoding` so Bigtable can properly interpret the data. Supported encodings include:\n",
"* **String**: `UTF8`, `UTF16`, `ASCII` for text-based metadata.\n",
"* **Numeric**: `INT_BIG_ENDIAN` or `INT_LITTLE_ENDIAN` for integers, and `FLOAT` or `DOUBLE` for decimal numbers.\n",
"* **Boolean**: `BOOL` for true/false values."
]
},
{
"cell_type": "markdown",
"id": "5b09d5ef5b5e4bb6ab9b829b10b6a29f",
"metadata": {
"id": "5b09d5ef5b5e4bb6ab9b829b10b6a29f"
},
"source": [
"#### Filtering Support Table\n",
"\n",
"| Filter Category | Key / Operator | Meaning |\n",
"|---|---|---|\n",
"| **Row Key** | `RowKeyFilter` | Narrows search to document IDs with a specific prefix. |\n",
"| **Metadata Key** | `ColumnQualifiers` | Checks for the presence of one or more exact metadata keys. |\n",
"| | `ColumnQualifierPrefix` | Checks if a metadata key starts with a given prefix. |\n",
"| | `ColumnQualifierRegex` | Checks if a metadata key matches a regular expression. |\n",
"| **Metadata Value** | `ColumnValueFilter` | Container for all value-based conditions. |\n",
"| | `==` | Equality |\n",
"| | `!=` | Inequality |\n",
"| | `>` | Greater than |\n",
"| | `<` | Less than |\n",
"| | `>=` | Greater than or equal |\n",
"| | `<=` | Less than or equal |\n",
"| | `in` | Value is in a list. |\n",
"| | `nin` | Value is not in a list. |\n",
"| | `contains` | Checks for substring presence. |\n",
"| | `like` | Performs a regex match on a string. |\n",
"| **Logical** | `ColumnValueChainFilter` | Logical AND for combining value conditions. |\n",
"| | `ColumnValueUnionFilter` | Logical OR for combining value conditions. |"
]
},
{
"cell_type": "markdown",
"id": "a50416e276a0479cbe66534ed1713a40",
"metadata": {
"id": "a50416e276a0479cbe66534ed1713a40"
},
"source": [
"#### Complex Filter Example\n",
"\n",
"This example uses nested logical filters. It searches for documents where either (`category` is 'sci-fi' AND `year` is between 1970 and 2000) OR (`author` is 'J.R.R. Tolkien')."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "46a27a456b804aa2a380d5edf15a5daf",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "46a27a456b804aa2a380d5edf15a5daf",
"outputId": "7679570a-80f6-4342-8380-daecb62d7cf8"
},
"outputs": [],
"source": [
"from langchain_google_bigtable.vector_store import QueryParameters\n",
"\n",
"complex_filter = {\n",
"    \"ColumnValueFilter\": {\n",
"        \"ColumnValueUnionFilter\": {  # OR\n",
"            \"ColumnValueChainFilter\": {  # First AND condition\n",
"                \"category\": {\"==\": \"sci-fi\"},\n",
"                \"year\": {\">\": 1970, \"<\": 2000},\n",
"            },\n",
"            \"author\": {\"==\": \"J.R.R. Tolkien\"},\n",
"        }\n",
"    }\n",
"}\n",
"\n",
"query_params_complex = QueryParameters(filters=complex_filter)\n",
"\n",
"complex_results = await vector_store.asimilarity_search(\n",
"    \"a story about a hero's journey\", k=5, query_parameters=query_params_complex\n",
")\n",
"\n",
"print(f\"Found {len(complex_results)} documents matching the complex filter:\")\n",
"for doc in complex_results:\n",
"    print(f\"- ID: {doc.id}, Metadata: {doc.metadata}\")"
]
},
{
"cell_type": "markdown",
"id": "1944c39560714e6e80c856f20744a8e5",
"metadata": {
"id": "1944c39560714e6e80c856f20744a8e5"
},
"source": [
"### Search with Score\n",
"You can also retrieve the distance score along with the documents."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d6ca27006b894b04b6fc8b79396e2797",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "d6ca27006b894b04b6fc8b79396e2797",
"outputId": "32360bd3-7ccb-4ed6-b68a-52788c902049"
},
"outputs": [],
"source": [
"results_with_scores = await vector_store.asimilarity_search_with_score(\n",
"    query=\"an evil entity\", k=1\n",
")\n",
"for doc, score in results_with_scores:\n",
"    print(f\"* [SCORE={score:.4f}] {doc.page_content} [{doc.metadata}]\")"
]
},
{
"cell_type": "markdown",
"id": "f61877af4e7f4313ad8234302950b331",
"metadata": {
"id": "f61877af4e7f4313ad8234302950b331"
},
"source": [
"### Use as Retriever\n",
"The vector store can be easily used as a retriever in RAG applications. You can specify the search type (e.g., `similarity` or `mmr`) and pass search-time arguments like `k` and `query_parameters`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "84d5ab97d17b4c38ab41a2b065bbd0c0",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "84d5ab97d17b4c38ab41a2b065bbd0c0",
"outputId": "b33dc07f-08d4-4108-c50d-96dd4e8d719b"
},
"outputs": [],
"source": [
"# Define a filter to use with the retriever\n",
"retriever_filter = {\"ColumnValueFilter\": {\"category\": {\"==\": \"horror\"}}}\n",
"retriever_query_params = QueryParameters(filters=retriever_filter)\n",
"\n",
"retriever = vector_store.as_retriever(\n",
"    search_type=\"mmr\",  # Specify MMR for retrieval\n",
"    search_kwargs={\n",
"        \"k\": 1,\n",
"        \"lambda_mult\": 0.8,\n",
"        \"query_parameters\": retriever_query_params,  # Pass filter parameters\n",
"    },\n",
")\n",
"retrieved_docs = await retriever.ainvoke(\"a story about a hobbit\")\n",
"print(retrieved_docs[0].page_content)"
]
},
{
"cell_type": "markdown",
"id": "35ffc1ce1c7b4df9ace1bc936b8b1dc2",
"metadata": {
"id": "35ffc1ce1c7b4df9ace1bc936b8b1dc2"
},
"source": [
"## Usage for retrieval-augmented generation\n",
"\n",
"For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:\n",
"\n",
"- [Tutorials](https://python.langchain.com/docs/tutorials/rag/)\n",
"- [How-to: Question and answer with RAG](https://python.langchain.com/docs/how_to/#qa-with-rag)\n",
"- [Retrieval conceptual docs](https://python.langchain.com/docs/concepts/retrieval/)"
]
},
{
"cell_type": "markdown",
"id": "76127f4a2f6a44fba749ea7800e59d51",
"metadata": {
"id": "76127f4a2f6a44fba749ea7800e59d51"
},
"source": [
"## API reference\n",
"\n",
"For full details on the `BigtableVectorStore` class, see the source code on [GitHub](https://github.com/googleapis/langchain-google-bigtable-python/blob/main/src/langchain_google_bigtable/vector_store.py)."
]
}
],
"metadata": {
"colab": {
"provenance": [],
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}