Stefano Lottini 2025-04-27 15:38:17 -04:00 committed by GitHub
commit 47f92df299
8 changed files with 371 additions and 161 deletions

View File

@ -14,7 +14,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"DataStax [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) is a serverless vector-capable database built on Cassandra and made conveniently available through an easy-to-use JSON API."
"> [DataStax Astra DB](https://docs.datastax.com/en/astra-db-serverless/index.html) is a serverless \n",
"> AI-ready database built on `Apache Cassandra®` and made conveniently available \n",
"> through an easy-to-use JSON API."
]
},
{

View File

@ -1214,9 +1214,7 @@
"source": [
"### Connecting to the DB\n",
"\n",
"The Cassandra caches shown in this page can be used with Cassandra as well as other derived databases, such as Astra DB, which use the CQL (Cassandra Query Language) protocol.\n",
"\n",
"> DataStax [Astra DB](https://docs.datastax.com/en/astra-serverless/docs/vector-search/quickstart.html) is a managed serverless database built on Cassandra, offering the same interface and strengths.\n",
"The Cassandra caches shown in this page can be used with Cassandra as well as other derived databases that can use the CQL (Cassandra Query Language) protocol, such as DataStax Astra DB.\n",
"\n",
"Depending on whether you connect to a Cassandra cluster or to Astra DB through CQL, you will provide different parameters when instantiating the cache (through initialization of a CassIO connection)."
]
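For reference, a minimal sketch of the two CassIO connection setups referred to above; the credentials, contact point, and keyspace are placeholders, and the `cassio.init` parameters are assumed from recent CassIO releases.

```python
import cassio

# Astra DB through CQL: pass the database ID and an application token.
cassio.init(
    database_id="01234567-89ab-cdef-0123-456789abcdef",  # placeholder
    token="AstraCS:...",                                  # placeholder
    keyspace="my_keyspace",                               # optional
)

# A Cassandra cluster: pass contact points (plus credentials if required).
cassio.init(
    contact_points=["127.0.0.1"],
    keyspace="my_keyspace",
)
```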
@ -1517,6 +1515,12 @@
"source": [
"You can easily use [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) as an LLM cache, with either the \"exact\" or the \"semantic-based\" cache.\n",
"\n",
"> [DataStax Astra DB](https://docs.datastax.com/en/astra-db-serverless/index.html) is a serverless \n",
"> AI-ready database built on `Apache Cassandra®` and made conveniently available \n",
"> through an easy-to-use JSON API.\n",
"\n",
"_This approach differs from the `Cassandra` caches mentioned above in that it natively uses the HTTP Data API. The Data API is specific to Astra DB. Keep in mind that the storage format will also differ._\n",
"\n",
"Make sure you have a running database (it must be a Vector-enabled database to use the Semantic cache) and get the required credentials on your Astra dashboard:\n",
"\n",
"- the API Endpoint looks like `https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com`\n",
@ -3160,7 +3164,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
"version": "3.12.0"
}
},
"nbformat": 4,

View File

@ -7,7 +7,9 @@
"source": [
"# Astra DB \n",
"\n",
"> DataStax [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) is a serverless vector-capable database built on Cassandra and made conveniently available through an easy-to-use JSON API.\n",
"> [DataStax Astra DB](https://docs.datastax.com/en/astra-db-serverless/index.html) is a serverless \n",
 AI-ready">
"> AI-ready database built on `Apache Cassandra®` and made conveniently available \n",
"> through an easy-to-use JSON API.\n",
"\n",
"This notebook goes over how to use Astra DB to store chat message history."
]
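As a preview, a minimal sketch of the flow this notebook walks through; the session ID is a hypothetical value and the endpoint/token variables are assumed to hold your credentials.

```python
from langchain_astradb import AstraDBChatMessageHistory

message_history = AstraDBChatMessageHistory(
    session_id="session-001",  # hypothetical conversation identifier
    api_endpoint=ASTRA_DB_API_ENDPOINT,
    token=ASTRA_DB_APPLICATION_TOKEN,
)

message_history.add_user_message("Hi, what can Astra DB store?")
message_history.add_ai_message("Chat message histories, among other things.")
print(message_history.messages)
```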

View File

@ -1,8 +1,6 @@
# Astra DB
> [DataStax Astra DB](https://docs.datastax.com/en/astra/home/astra.html) is a serverless
> vector-capable database built on `Apache Cassandra®`and made conveniently available
> through an easy-to-use JSON API.
> [DataStax Astra DB](https://docs.datastax.com/en/astra-db-serverless/index.html) is a serverless AI-ready database built on `Apache Cassandra®` and made conveniently available through an easy-to-use JSON API.
See a [tutorial provided by DataStax](https://docs.datastax.com/en/astra/astra-db-vector/tutorials/chatbot.html).
@ -10,19 +8,21 @@ See a [tutorial provided by DataStax](https://docs.datastax.com/en/astra/astra-d
Install the following Python package:
```bash
pip install "langchain-astradb>=0.1.0"
pip install "langchain-astradb>=0.6,<0.7"
```
Get the [connection secrets](https://docs.datastax.com/en/astra/astra-db-vector/get-started/quickstart.html).
Set up the following environment variables:
Create a database (if needed) and get the [connection secrets](https://docs.datastax.com/en/astra-db-serverless/get-started/quickstart.html#create-a-database-and-store-your-credentials).
Set the following variables:
```python
ASTRA_DB_APPLICATION_TOKEN="TOKEN"
ASTRA_DB_API_ENDPOINT="API_ENDPOINT"
ASTRA_DB_APPLICATION_TOKEN="TOKEN"
```
## Vector Store
A few typical initialization patterns are shown here:
```python
from langchain_astradb import AstraDBVectorStore
@ -32,8 +32,56 @@ vector_store = AstraDBVectorStore(
api_endpoint=ASTRA_DB_API_ENDPOINT,
token=ASTRA_DB_APPLICATION_TOKEN,
)

from astrapy.info import VectorServiceOptions

vector_store_vectorize = AstraDBVectorStore(
collection_name="my_vectorize_store",
api_endpoint=ASTRA_DB_API_ENDPOINT,
token=ASTRA_DB_APPLICATION_TOKEN,
collection_vector_service_options=VectorServiceOptions(
provider="nvidia",
model_name="NV-Embed-QA",
),
)

from astrapy.info import (
    CollectionLexicalOptions,
    CollectionRerankOptions,
    RerankServiceOptions,
    VectorServiceOptions,
)

vector_store_hybrid = AstraDBVectorStore(
collection_name="my_hybrid_store",
api_endpoint=ASTRA_DB_API_ENDPOINT,
token=ASTRA_DB_APPLICATION_TOKEN,
collection_vector_service_options=VectorServiceOptions(
provider="nvidia",
model_name="NV-Embed-QA",
),
collection_lexical=CollectionLexicalOptions(analyzer="standard"),
collection_rerank=CollectionRerankOptions(
service=RerankServiceOptions(
provider="nvidia",
model_name="nvidia/llama-3.2-nv-rerankqa-1b-v2",
),
),
)
```
Notable features of class `AstraDBVectorStore`:
- native async API;
- metadata filtering in search;
- MMR (maximum marginal relevance) search;
- server-side embedding computation (["vectorize"](https://docs.datastax.com/en/astra-db-serverless/databases/embedding-generation.html) in Astra DB parlance);
- auto-detect its settings from an existing, pre-populated Astra DB collection;
- [hybrid search](https://docs.datastax.com/en/astra-db-serverless/databases/hybrid-search.html#the-hybrid-search-process) (vector + BM25 and then a rerank step);
- support for non-Astra Data API databases (e.g. self-hosted [HCD](https://docs.datastax.com/en/hyper-converged-database/1.1/get-started/get-started-hcd.html) deployments).
Learn more in the [example notebook](/docs/integrations/vectorstores/astradb).
See the [example provided by DataStax](https://docs.datastax.com/en/astra/astra-db-vector/integrations/langchain.html).
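As a taste of the first three features above, a short sketch assuming the `vector_store` created earlier and a couple of hypothetical documents:

```python
import asyncio

from langchain_core.documents import Document

docs = [
    Document(page_content="LangChain integrates with Astra DB.", metadata={"topic": "langchain"}),
    Document(page_content="Astra DB is built on Apache Cassandra.", metadata={"topic": "cassandra"}),
]

async def demo() -> None:
    # Native async API: write and search without blocking the event loop.
    await vector_store.aadd_documents(docs)
    # Metadata filtering combined with MMR (maximum marginal relevance) search.
    hits = await vector_store.amax_marginal_relevance_search(
        "How does LangChain talk to Astra DB?",
        k=2,
        filter={"topic": "langchain"},
    )
    for doc in hits:
        print(doc.page_content)

asyncio.run(demo())
```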
@ -82,8 +130,6 @@ set_llm_cache(AstraDBSemanticCache(
Learn more in the [example notebook](/docs/integrations/llm_caching#astra-db-caches) (scroll to the appropriate section).
Learn more in the [example notebook](/docs/integrations/memory/astradb_chat_message_history).
## Document loader
```python

View File

@ -6,7 +6,9 @@
"source": [
"# Astra DB\n",
"\n",
">[DataStax Astra DB](https://docs.datastax.com/en/astra/home/astra.html) is a serverless vector-capable database built on `Cassandra` and made conveniently available through an easy-to-use JSON API.\n",
"> [DataStax Astra DB](https://docs.datastax.com/en/astra-db-serverless/index.html) is a serverless \n",
"> AI-ready database built on `Apache Cassandra®` and made conveniently available \n",
"> through an easy-to-use JSON API.\n",
"\n",
"In the walkthrough, we'll demo the `SelfQueryRetriever` with an `Astra DB` vector store."
]
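For context, a minimal sketch of the setup the walkthrough builds up to; `llm` and `vector_store` are assumed to exist already, and the metadata schema is purely illustrative.

```python
from langchain.chains.query_constructor.base import AttributeInfo
from langchain.retrievers.self_query.base import SelfQueryRetriever

# Illustrative metadata schema for a movie collection.
metadata_field_info = [
    AttributeInfo(name="genre", description="The genre of the movie", type="string"),
    AttributeInfo(name="year", description="The year the movie was released", type="integer"),
]

retriever = SelfQueryRetriever.from_llm(
    llm,                          # an existing chat model / LLM
    vector_store,                 # an existing AstraDBVectorStore
    "Brief summary of a movie",   # description of the document contents
    metadata_field_info,
)

retriever.invoke("I want to watch a movie about dinosaurs released after 1990")
```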

View File

@ -23,7 +23,9 @@
"\n",
"## Overview\n",
"\n",
"DataStax [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) is a serverless vector-capable database built on Cassandra and made conveniently available through an easy-to-use JSON API.\n",
"> [DataStax Astra DB](https://docs.datastax.com/en/astra-db-serverless/index.html) is a serverless \n",
"> AI-ready database built on `Apache Cassandra®` and made conveniently available \n",
"> through an easy-to-use JSON API.\n",
"\n",
"### Integration details\n",
"\n",

View File

@ -7,9 +7,11 @@
"source": [
"# Astra DB Vector Store\n",
"\n",
"This page provides a quickstart for using [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) as a Vector Store.\n",
"This page provides a quickstart for using Astra DB as a Vector Store.\n",
"\n",
"> DataStax [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) is a serverless vector-capable database built on Apache Cassandra® and made conveniently available through an easy-to-use JSON API.\n",
"> [DataStax Astra DB](https://docs.datastax.com/en/astra-db-serverless/index.html) is a serverless \n",
"> AI-ready database built on `Apache Cassandra®` and made conveniently available \n",
"> through an easy-to-use JSON API.\n",
"\n",
"## Setup"
]
@ -19,6 +21,8 @@
"id": "dbe7c156-0413-47e3-9237-4769c4248869",
"metadata": {},
"source": [
"### Dependencies\n",
"\n",
"Use of the integration requires the `langchain-astradb` partner package:"
]
},
@ -26,10 +30,15 @@
"cell_type": "code",
"execution_count": null,
"id": "8d00fcf4-9798-4289-9214-d9734690adfc",
"metadata": {},
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"pip install -qU \"langchain-astradb>=0.3.3\""
"!pip install \\\n",
" \"langchain>=0.3.23,<0.4\" \\\n",
" \"langchain-core>=0.3.52,<0.4\" \\\n",
" \"langchain-astradb>=0.6,<0.7\""
]
},
{
@ -41,30 +50,40 @@
"\n",
"In order to use the AstraDB vector store, you must first head to the [AstraDB website](https://astra.datastax.com), create an account, and then create a new database - the initialization might take a few minutes. \n",
"\n",
"Once the database has been initialized, you should [create an application token](https://docs.datastax.com/en/astra-db-serverless/administration/manage-application-tokens.html#generate-application-token) and save it for later use. \n",
"Once the database has been initialized, retrieve your [connection secrets](https://docs.datastax.com/en/astra-db-serverless/get-started/quickstart.html#create-a-database-and-store-your-credentials), which you'll need momentarily. These are:\n",
"- an **`API Endpoint`**, such as `\"https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com/\"`\n",
"- and a **`Database Token`**, e.g. `\"AstraCS:aBcD123......\"`\n",
"\n",
"You will also want to copy the `API Endpoint` from the `Database Details` and store that in the `ASTRA_DB_API_ENDPOINT` variable.\n",
"\n",
"You may optionally provide a namespace, which you can manage from the `Data Explorer` tab of your database dashboard. If you don't wish to set a namespace, you can leave the `getpass` prompt for `ASTRA_DB_NAMESPACE` empty."
"You may optionally provide a **`keyspace`** (called \"namespace\" in the LangChain components), which you can manage from the `Data Explorer` tab of your database dashboard. If you wish, you can leave it empty in the prompt below and fall back to a default keyspace."
]
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 1,
"id": "b7843c22",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdin",
"output_type": "stream",
"text": [
"ASTRA_DB_API_ENDPOINT = https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com\n",
"ASTRA_DB_APPLICATION_TOKEN = ········\n",
"(optional) ASTRA_DB_KEYSPACE = \n"
]
}
],
"source": [
"import getpass\n",
"\n",
"ASTRA_DB_API_ENDPOINT = getpass.getpass(\"ASTRA_DB_API_ENDPOINT = \")\n",
"ASTRA_DB_APPLICATION_TOKEN = getpass.getpass(\"ASTRA_DB_APPLICATION_TOKEN = \")\n",
"ASTRA_DB_API_ENDPOINT = input(\"ASTRA_DB_API_ENDPOINT = \").strip()\n",
"ASTRA_DB_APPLICATION_TOKEN = getpass.getpass(\"ASTRA_DB_APPLICATION_TOKEN = \").strip()\n",
"\n",
"desired_namespace = getpass.getpass(\"ASTRA_DB_NAMESPACE = \")\n",
"if desired_namespace:\n",
" ASTRA_DB_NAMESPACE = desired_namespace\n",
"desired_keyspace = input(\"(optional) ASTRA_DB_KEYSPACE = \").strip()\n",
"if desired_keyspace:\n",
" ASTRA_DB_KEYSPACE = desired_keyspace\n",
"else:\n",
" ASTRA_DB_NAMESPACE = None"
" ASTRA_DB_KEYSPACE = None"
]
},
{
@ -77,7 +96,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 2,
"id": "3cb739c0",
"metadata": {},
"outputs": [],
@ -93,28 +112,46 @@
"source": [
"## Initialization\n",
"\n",
"There are two ways to create an Astra DB vector store, which differ in how the embeddings are computed.\n",
"There are various ways to create an Astra DB vector store:\n",
"\n",
"#### Method 1: Explicit embeddings\n",
"\n",
"You can separately instantiate a `langchain_core.embeddings.Embeddings` class and pass it to the `AstraDBVectorStore` constructor, just like with most other LangChain vector stores.\n",
"\n",
"#### Method 2: Integrated embedding computation\n",
"#### Method 2: Server-side embeddings ('vectorize')\n",
"\n",
"Alternatively, you can use the [Vectorize](https://www.datastax.com/blog/simplifying-vector-embedding-generation-with-astra-vectorize) feature of Astra DB and simply specify the name of a supported embedding model when creating the store. The embedding computations are entirely handled within the database. (To proceed with this method, you must have enabled the desired embedding integration for your database, as described [in the docs](https://docs.datastax.com/en/astra-db-serverless/databases/embedding-generation.html).)\n",
"Alternatively, you can use the [server-side embedding computation](https://docs.datastax.com/en/astra-db-serverless/databases/embedding-generation.html) feature of Astra DB ('vectorize') and simply specify an embedding model when creating the server infrastructure for the store. The embedding computations will then be entirely handled within the database in subsequent read and write operations. (To proceed with this method, you must have enabled the desired embedding integration for your database, as described [in the docs](https://docs.datastax.com/en/astra-db-serverless/databases/embedding-generation.html).)\n",
"\n",
"### Explicit Embedding Initialization\n",
"#### Method 3: Auto-detect from a pre-existing collection\n",
"\n",
"Below, we instantiate our vector store using the explicit embedding class:\n",
"You may already have a [collection](https://docs.datastax.com/en/astra-db-serverless/api-reference/collections.html) in your Astra DB, possibly pre-populated with data through other means (e.g. via the Astra UI or a third-party application), and just want to start querying it within LangChain. In this case, the right approach is to enable the `autodetect_collection` mode in the vector store constructor and let the class figure out the details. (Of course, if your collection has no 'vectorize', you still need to provide an `Embeddings` object).\n",
"\n",
"#### A note on \"hybrid search\"\n",
"\n",
"Astra DB vector stores support metadata search in vector searches; furthermore, version 0.6 introduced full support for _hybrid search_ through the [findAndRerank](https://docs.datastax.com/en/astra-db-serverless/api-reference/document-methods/find-and-rerank.html) database primitive: documents are retrieved from both a vector-similarity _and_ a keyword-based (\"lexical\") search, and are then merged through a reranker model. This search strategy, entirely handled on server-side, can boost the accuracy of your results, thus improving the quality of your RAG application. Whenever available, hybrid search is used automatically by the vector store (though you can exert manual control over it if you wish to do so).\n",
"\n",
"#### Additional information\n",
"\n",
"The `AstraDBVectorStore` can be configured in many ways; see the [API Reference](https://python.langchain.com/api_reference/astradb/vectorstores/langchain_astradb.vectorstores.AstraDBVectorStore.html) for a full guide covering e.g. asynchronous initialization; non-Astra-DB databases; custom indexing allow-/deny-lists; manual hybrid-search control; and much more."
]
},
{
"cell_type": "markdown",
"id": "8d7e33e0-f948-47b5-a9c2-6407fdde170e",
"metadata": {},
"source": [
"### Explicit embedding initialization (method 1)\n",
"\n",
"Instantiate our vector store using an explicit embedding class:\n",
"\n",
"import EmbeddingTabs from \"@theme/EmbeddingTabs\";\n",
"\n",
"<EmbeddingTabs/>\n"
"<EmbeddingTabs/>"
]
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 3,
"id": "d71a1dcb",
"metadata": {},
"outputs": [],
@ -128,19 +165,19 @@
},
{
"cell_type": "code",
"execution_count": 22,
"execution_count": 4,
"id": "0b32730d-176e-414c-9d91-fd3644c54211",
"metadata": {},
"outputs": [],
"source": [
"from langchain_astradb import AstraDBVectorStore\n",
"\n",
"vector_store = AstraDBVectorStore(\n",
"vector_store_explicit_embeddings = AstraDBVectorStore(\n",
" collection_name=\"astra_vector_langchain\",\n",
" embedding=embeddings,\n",
" api_endpoint=ASTRA_DB_API_ENDPOINT,\n",
" token=ASTRA_DB_APPLICATION_TOKEN,\n",
" namespace=ASTRA_DB_NAMESPACE,\n",
" namespace=ASTRA_DB_KEYSPACE,\n",
")"
]
},
@ -149,26 +186,26 @@
"id": "84a1fe85-a42c-4f15-92e1-f79f1dd43ea2",
"metadata": {},
"source": [
"### Integrated Embedding Initialization\n",
"### Server-side embedding initialization (\"vectorize\", method 2)\n",
"\n",
"Here it is assumed that you have\n",
"In this example code, it is assumed that you have\n",
"\n",
"- Enabled the OpenAI integration in your Astra DB organization,\n",
"- Added an API Key named `\"OPENAI_API_KEY\"` to the integration, and scoped it to the database you are using.\n",
"\n",
"For more details on how to do this, please consult the [documentation](https://docs.datastax.com/en/astra-db-serverless/integrations/embedding-providers/openai.html)."
"For more details, including instructions to switch provider/model, please consult the [documentation](https://docs.datastax.com/en/astra-db-serverless/databases/embedding-generation.html)."
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 5,
"id": "9d18455d-3fa6-4f9e-b687-3a2bc71c9a23",
"metadata": {},
"outputs": [],
"source": [
"from astrapy.info import CollectionVectorServiceOptions\n",
"from astrapy.info import VectorServiceOptions\n",
"\n",
"openai_vectorize_options = CollectionVectorServiceOptions(\n",
"openai_vectorize_options = VectorServiceOptions(\n",
" provider=\"openai\",\n",
" model_name=\"text-embedding-3-small\",\n",
" authentication={\n",
@ -176,125 +213,183 @@
" },\n",
")\n",
"\n",
"vector_store_integrated = AstraDBVectorStore(\n",
" collection_name=\"astra_vector_langchain_integrated\",\n",
"vector_store_integrated_embeddings = AstraDBVectorStore(\n",
" collection_name=\"astra_vectorize_langchain\",\n",
" api_endpoint=ASTRA_DB_API_ENDPOINT,\n",
" token=ASTRA_DB_APPLICATION_TOKEN,\n",
" namespace=ASTRA_DB_NAMESPACE,\n",
" namespace=ASTRA_DB_KEYSPACE,\n",
" collection_vector_service_options=openai_vectorize_options,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "24508a60-9591-4b24-a9b7-ecc90ed71b68",
"metadata": {},
"source": [
"### Auto-detect initialization (method 3)\n",
"\n",
"You can use this pattern if the collection already exists on the database and your `AstraDBVectorStore` needs to use it (for reads and writes). The LangChain component will inspect the collection and figure out the details.\n",
"\n",
"This is the recommended approach if the collection has been created and -- most importantly -- populated by tools other than LangChain, for example if the data has been ingested through the Astra DB Web interface.\n",
"\n",
"Auto-detect mode cannot coexist with _collection_ settings (such as the similarity metric and such); on the other hand, if no server-side embeddings are employed, one still needs to pass an `Embeddings` object to the constructor.\n",
"\n",
"In the following example code, we will \"auto-detect\" the very same collection that was created by method 2 above (\"vectorize\"). Hence, no `Embeddings` object needs to be supplied."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "683b0f6e-884f-4a09-bc3a-454bb1eefd30",
"metadata": {},
"outputs": [],
"source": [
"vector_store_autodetected = AstraDBVectorStore(\n",
" collection_name=\"astra_vectorize_langchain\",\n",
" api_endpoint=ASTRA_DB_API_ENDPOINT,\n",
" token=ASTRA_DB_APPLICATION_TOKEN,\n",
" namespace=ASTRA_DB_KEYSPACE,\n",
" autodetect_collection=True,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "fbcfe8e8-2f4e-4fc7-a332-7a2fa2c401bf",
"metadata": {},
"source": [
"## Manage vector store\n",
"\n",
"Once you have created your vector store, interact with it by adding and deleting different items.\n",
"\n",
"All interactions with the vector store proceed regardless of the initialization method: please **adapt the following cell**, if you desire, to select a vector store you have created and want to put to test."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "54d63f59-1e6b-49b4-a7c1-ac7717c92ac0",
"metadata": {},
"outputs": [],
"source": [
"# If desired, uncomment a different line here:\n",
"\n",
"# vector_store = vector_store_explicit_embeddings\n",
"vector_store = vector_store_integrated_embeddings\n",
"# vector_store = vector_store_autodetected"
]
},
{
"cell_type": "markdown",
"id": "d3796b39",
"metadata": {},
"source": [
"## Manage vector store\n",
"\n",
"Once you have created your vector store, we can interact with it by adding and deleting different items.\n",
"\n",
"### Add items to vector store\n",
"\n",
"We can add items to our vector store by using the `add_documents` function."
"Add documents to the vector store by using the `add_documents` method.\n",
"\n",
"_The \"id\" field can be supplied separately, in a matching `ids=[...]` parameter to `add_documents`, or even left out entirely to let the store generate IDs._"
]
},
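A small sketch of the two alternatives mentioned above, with placeholder documents:

```python
from langchain_core.documents import Document

docs = [Document(page_content="doc A"), Document(page_content="doc B")]

# Supply the IDs separately ...
vector_store.add_documents(documents=docs, ids=["id_a", "id_b"])

# ... or omit them entirely and let the store generate IDs.
generated_ids = vector_store.add_documents(documents=docs)
```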
{
"cell_type": "code",
"execution_count": 23,
"execution_count": 8,
"id": "afb3e155",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[UUID('89a5cea1-5f3d-47c1-89dc-7e36e12cf4de'),\n",
" UUID('d4e78c48-f954-4612-8a38-af22923ba23b'),\n",
" UUID('058e4046-ded0-4fc1-b8ac-60e5a5f08ea0'),\n",
" UUID('50ab2a9a-762c-4b78-b102-942a86d77288'),\n",
" UUID('1da5a3c1-ba51-4f2f-aaaf-79a8f5011ce3'),\n",
" UUID('f3055d9e-2eb1-4d25-838e-2c70548f91b5'),\n",
" UUID('4bf0613d-08d0-4fbc-a43c-4955e4c9e616'),\n",
" UUID('18008625-8fd4-45c2-a0d7-92a2cde23dbc'),\n",
" UUID('c712e06f-790b-4fd4-9040-7ab3898965d0'),\n",
" UUID('a9b84820-3445-4810-a46c-e77b76ab85bc')]"
"['entry_00',\n",
" 'entry_01',\n",
" 'entry_02',\n",
" 'entry_03',\n",
" 'entry_04',\n",
" 'entry_05',\n",
" 'entry_06',\n",
" 'entry_07',\n",
" 'entry_08',\n",
" 'entry_09',\n",
" 'entry_10']"
]
},
"execution_count": 23,
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from uuid import uuid4\n",
"\n",
"from langchain_core.documents import Document\n",
"\n",
"document_1 = Document(\n",
" page_content=\"I had chocalate chip pancakes and scrambled eggs for breakfast this morning.\",\n",
" metadata={\"source\": \"tweet\"},\n",
")\n",
"\n",
"document_2 = Document(\n",
" page_content=\"The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.\",\n",
" metadata={\"source\": \"news\"},\n",
")\n",
"\n",
"document_3 = Document(\n",
" page_content=\"Building an exciting new project with LangChain - come check it out!\",\n",
" metadata={\"source\": \"tweet\"},\n",
")\n",
"\n",
"document_4 = Document(\n",
" page_content=\"Robbers broke into the city bank and stole $1 million in cash.\",\n",
" metadata={\"source\": \"news\"},\n",
")\n",
"\n",
"document_5 = Document(\n",
" page_content=\"Wow! That was an amazing movie. I can't wait to see it again.\",\n",
" metadata={\"source\": \"tweet\"},\n",
")\n",
"\n",
"document_6 = Document(\n",
" page_content=\"Is the new iPhone worth the price? Read this review to find out.\",\n",
" metadata={\"source\": \"website\"},\n",
")\n",
"\n",
"document_7 = Document(\n",
" page_content=\"The top 10 soccer players in the world right now.\",\n",
" metadata={\"source\": \"website\"},\n",
")\n",
"\n",
"document_8 = Document(\n",
" page_content=\"LangGraph is the best framework for building stateful, agentic applications!\",\n",
" metadata={\"source\": \"tweet\"},\n",
")\n",
"\n",
"document_9 = Document(\n",
" page_content=\"The stock market is down 500 points today due to fears of a recession.\",\n",
" metadata={\"source\": \"news\"},\n",
")\n",
"\n",
"document_10 = Document(\n",
" page_content=\"I have a bad feeling I am going to get deleted :(\",\n",
" metadata={\"source\": \"tweet\"},\n",
")\n",
"\n",
"documents = [\n",
" document_1,\n",
" document_2,\n",
" document_3,\n",
" document_4,\n",
" document_5,\n",
" document_6,\n",
" document_7,\n",
" document_8,\n",
" document_9,\n",
" document_10,\n",
"documents_to_insert = [\n",
" Document(\n",
" page_content=\"ZYX, just another tool in the world, is actually my agent-based superhero\",\n",
" metadata={\"source\": \"tweet\"},\n",
" id=\"entry_00\",\n",
" ),\n",
" Document(\n",
" page_content=\"I had chocolate chip pancakes and scrambled eggs \"\n",
" \"for breakfast this morning.\",\n",
" metadata={\"source\": \"tweet\"},\n",
" id=\"entry_01\",\n",
" ),\n",
" Document(\n",
" page_content=\"The weather forecast for tomorrow is cloudy and \"\n",
" \"overcast, with a high of 62 degrees.\",\n",
" metadata={\"source\": \"news\"},\n",
" id=\"entry_02\",\n",
" ),\n",
" Document(\n",
" page_content=\"Building an exciting new project with LangChain \"\n",
" \"- come check it out!\",\n",
" metadata={\"source\": \"tweet\"},\n",
" id=\"entry_03\",\n",
" ),\n",
" Document(\n",
" page_content=\"Robbers broke into the city bank and stole \"\n",
" \"$1 million in cash.\",\n",
" metadata={\"source\": \"news\"},\n",
" id=\"entry_04\",\n",
" ),\n",
" Document(\n",
" page_content=\"Thanks to her sophisticated language skills, the agent \"\n",
" \"managed to extract strategic information all right.\",\n",
" metadata={\"source\": \"tweet\"},\n",
" id=\"entry_05\",\n",
" ),\n",
" Document(\n",
" page_content=\"Is the new iPhone worth the price? Read this \"\n",
" \"review to find out.\",\n",
" metadata={\"source\": \"website\"},\n",
" id=\"entry_06\",\n",
" ),\n",
" Document(\n",
" page_content=\"The top 10 soccer players in the world right now.\",\n",
" metadata={\"source\": \"website\"},\n",
" id=\"entry_07\",\n",
" ),\n",
" Document(\n",
" page_content=\"LangGraph is the best framework for building stateful, \"\n",
" \"agentic applications!\",\n",
" metadata={\"source\": \"tweet\"},\n",
" id=\"entry_08\",\n",
" ),\n",
" Document(\n",
" page_content=\"The stock market is down 500 points today due to \"\n",
" \"fears of a recession.\",\n",
" metadata={\"source\": \"news\"},\n",
" id=\"entry_09\",\n",
" ),\n",
" Document(\n",
" page_content=\"I have a bad feeling I am going to get deleted :(\",\n",
" metadata={\"source\": \"tweet\"},\n",
" id=\"entry_10\",\n",
" ),\n",
"]\n",
"uuids = [str(uuid4()) for _ in range(len(documents))]\n",
"\n",
"vector_store.add_documents(documents=documents, ids=uuids)"
"\n",
"vector_store.add_documents(documents=documents_to_insert)"
]
},
{
@ -304,12 +399,12 @@
"source": [
"### Delete items from vector store\n",
"\n",
"We can delete items from our vector store by ID by using the `delete` function."
"Delete items by ID by using the `delete` function."
]
},
{
"cell_type": "code",
"execution_count": 24,
"execution_count": 9,
"id": "d3f69315",
"metadata": {},
"outputs": [
@ -319,13 +414,13 @@
"True"
]
},
"execution_count": 24,
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"vector_store.delete(ids=uuids[-1])"
"vector_store.delete(ids=[\"entry_10\", \"entry_02\"])"
]
},
{
@ -333,20 +428,20 @@
"id": "d12e1a07",
"metadata": {},
"source": [
"## Query vector store\n",
"## Query the vector store\n",
"\n",
"Once your vector store has been created and the relevant documents have been added you will most likely wish to query it during the running of your chain or agent. \n",
"Once the vector store is created and populated, you can query it (e.g. as part of your chain or agent).\n",
"\n",
"### Query directly\n",
"\n",
"#### Similarity search\n",
"\n",
"Performing a simple similarity search with filtering on metadata can be done as follows:"
"Search for documents similar to a provided text, with additional metadata filters if desired:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": 10,
"id": "770b3467",
"metadata": {},
"outputs": [
@ -354,19 +449,20 @@
"name": "stdout",
"output_type": "stream",
"text": [
"* Building an exciting new project with LangChain - come check it out! [{'source': 'tweet'}]\n",
"* LangGraph is the best framework for building stateful, agentic applications! [{'source': 'tweet'}]\n"
"* \"Building an exciting new project with LangChain - come check it out!\", metadata={'source': 'tweet'}\n",
"* \"LangGraph is the best framework for building stateful, agentic applications!\", metadata={'source': 'tweet'}\n",
"* \"Thanks to her sophisticated language skills, the agent managed to extract strategic information all right.\", metadata={'source': 'tweet'}\n"
]
}
],
"source": [
"results = vector_store.similarity_search(\n",
" \"LangChain provides abstractions to make working with LLMs easy\",\n",
" k=2,\n",
" k=3,\n",
" filter={\"source\": \"tweet\"},\n",
")\n",
"for res in results:\n",
" print(f\"* {res.page_content} [{res.metadata}]\")"
" print(f'* \"{res.page_content}\", metadata={res.metadata}')"
]
},
{
@ -376,12 +472,12 @@
"source": [
"#### Similarity search with score\n",
"\n",
"You can also search with score:"
"You can return the similarity score as well:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": 11,
"id": "5924309a",
"metadata": {},
"outputs": [
@ -389,16 +485,69 @@
"name": "stdout",
"output_type": "stream",
"text": [
"* [SIM=0.776585] The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees. [{'source': 'news'}]\n"
"* [SIM=0.71] \"Building an exciting new project with LangChain - come check it out!\", metadata={'source': 'tweet'}\n",
"* [SIM=0.70] \"LangGraph is the best framework for building stateful, agentic applications!\", metadata={'source': 'tweet'}\n",
"* [SIM=0.61] \"Thanks to her sophisticated language skills, the agent managed to extract strategic information all right.\", metadata={'source': 'tweet'}\n"
]
}
],
"source": [
"results = vector_store.similarity_search_with_score(\n",
" \"Will it be hot tomorrow?\", k=1, filter={\"source\": \"news\"}\n",
" \"LangChain provides abstractions to make working with LLMs easy\",\n",
" k=3,\n",
" filter={\"source\": \"tweet\"},\n",
")\n",
"for res, score in results:\n",
" print(f\"* [SIM={score:3f}] {res.page_content} [{res.metadata}]\")"
" print(f'* [SIM={score:.2f}] \"{res.page_content}\", metadata={res.metadata}')"
]
},
{
"cell_type": "markdown",
"id": "73b8f418-91a7-46d0-91c3-3c76e9586193",
"metadata": {},
"source": [
"#### Specify a different keyword query (requires hybrid search)\n",
"\n",
"> Note: this cell can be run only if the collection supports the [find-and-rerank](https://docs.datastax.com/en/astra-db-serverless/api-reference/document-methods/find-and-rerank.html) command and if the vector store is aware of this fact.\n",
"\n",
"If the vector store is using a hybrid-enabled collection and has detected this fact, by default it will use that capability when running searches.\n",
"\n",
"In that case, the same query text is used for both the vector-similarity and the lexical-based retrieval steps in the find-and-rerank process, _unless you explicitly provide a different query for the latter_:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "e282a48b-081a-4d94-9483-33407e8d6da7",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"* \"Building an exciting new project with LangChain - come check it out!\", metadata={'source': 'tweet'}\n",
"* \"LangGraph is the best framework for building stateful, agentic applications!\", metadata={'source': 'tweet'}\n",
"* \"ZYX, just another tool in the world, is actually my agent-based superhero\", metadata={'source': 'tweet'}\n"
]
}
],
"source": [
"results = vector_store_autodetected.similarity_search(\n",
" \"LangChain provides abstractions to make working with LLMs easy\",\n",
" k=3,\n",
" filter={\"source\": \"tweet\"},\n",
" lexical_query=\"agent\",\n",
")\n",
"for res in results:\n",
" print(f'* \"{res.page_content}\", metadata={res.metadata}')"
]
},
{
"cell_type": "markdown",
"id": "60688e8c-d74d-4921-b213-b48d88600f95",
"metadata": {},
"source": [
"_The above example hardcodes the \"autodetected\" vector store, which has surely inspected the collection and figured out if hybrid is available. Another option is to explicitly supply hybrid-search parameters to the constructor (refer to the API Reference for more details/examples)._"
]
},
{
@ -408,7 +557,9 @@
"source": [
"#### Other search methods\n",
"\n",
"There are a variety of other search methods that are not covered in this notebook, such as MMR search or searching by vector. For a full list of the search abilities available for `AstraDBVectorStore` check out the [API reference](https://python.langchain.com/api_reference/astradb/vectorstores/langchain_astradb.vectorstores.AstraDBVectorStore.html)."
"There are a variety of other search methods that are not covered in this notebook, such as MMR search and search by vector.\n",
"\n",
"For a full list of the search modes available in `AstraDBVectorStore` check out the [API reference](https://python.langchain.com/api_reference/astradb/vectorstores/langchain_astradb.vectorstores.AstraDBVectorStore.html)."
]
},
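As a sketch of two such modes, assuming the store created with explicit client-side embeddings (method 1) has been populated, so that the `embeddings` object is available for the by-vector variant:

```python
# MMR search: trades off relevance against diversity among the returned documents.
mmr_results = vector_store_explicit_embeddings.max_marginal_relevance_search(
    "LangChain provides abstractions to make working with LLMs easy",
    k=3,
    fetch_k=10,
    filter={"source": "tweet"},
)

# Search by vector: embed the query client-side and search with the raw vector.
query_vector = embeddings.embed_query("Recession fears and the stock market")
by_vector_results = vector_store_explicit_embeddings.similarity_search_by_vector(
    query_vector,
    k=2,
)
```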
{
@ -418,24 +569,24 @@
"source": [
"### Query by turning into retriever\n",
"\n",
"You can also transform the vector store into a retriever for easier usage in your chains. \n",
"You can also make the vector store into a retriever, for easier usage in your chains. \n",
"\n",
"Here is how to transform your vector store into a retriever and then invoke the retreiever with a simple query and filter."
"Transform the vector store into a retriever and invoke it with a simple query + metadata filter:"
]
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": 13,
"id": "dcee50e6",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(metadata={'source': 'news'}, page_content='Robbers broke into the city bank and stole $1 million in cash.')]"
"[Document(id='entry_04', metadata={'source': 'news'}, page_content='Robbers broke into the city bank and stole $1 million in cash.')]"
]
},
"execution_count": 17,
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
@ -490,7 +641,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 14,
"id": "fd405a13-6f71-46fa-87e6-167238e9c25e",
"metadata": {},
"outputs": [],
@ -505,7 +656,7 @@
"source": [
"## API reference\n",
"\n",
"For detailed documentation of all `AstraDBVectorStore` features and configurations head to the API reference: https://python.langchain.com/api_reference/astradb/vectorstores/langchain_astradb.vectorstores.AstraDBVectorStore.html"
"For detailed documentation of all `AstraDBVectorStore` features and configurations, consult the [API reference](https://python.langchain.com/api_reference/astradb/vectorstores/langchain_astradb.vectorstores.AstraDBVectorStore.html)."
]
}
],
@ -525,7 +676,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
"version": "3.12.0"
}
},
"nbformat": 4,

View File

@ -164,6 +164,7 @@ packages:
downloads: 2756214
downloads_updated_at: '2025-04-22T15:24:39.289813+00:00'
- name: langchain-astradb
name_title: DataStax Astra DB
path: libs/astradb
repo: langchain-ai/langchain-datastax
downloads: 100973