mirror of
https://github.com/hwchase17/langchain.git
synced 2025-09-06 21:43:44 +00:00
Add "Astra DB" vector store integration (#12966)
# Astra DB Vector store integration - **Description:** This PR adds a `VectorStore` implementation for DataStax Astra DB using its HTTP API - **Issue:** (no related issue) - **Dependencies:** A new required dependency is `astrapy` (`>=0.5.3`) which was added to pyptoject.toml, optional, as per guidelines - **Tag maintainer:** I recently mentioned to @baskaryan this integration was coming - **Twitter handle:** `@rsprrs` if you want to mention me This PR introduces the `AstraDB` vector store class, extensive integration test coverage, a reworking of the documentation which conflates Cassandra and Astra DB on a single "provider" page and a new, completely reworked vector-store example notebook (common to the Cassandra store, since parts of the flow is shared by the two APIs). I also took care in ensuring docs (and redirects therein) are behaving correctly. All style, linting, typechecks and tests pass as far as the `AstraDB` integration is concerned. I could build the documentation and check it all right (but ran into trouble with the `api_docs_build` makefile target which I could not verify: `Error: Unable to import module 'plan_and_execute.agent_executor' with error: No module named 'langchain_experimental'` was the first of many similar errors) Thank you for a review! Stefano --------- Co-authored-by: Erick Friis <erick@langchain.dev>
This commit is contained in:
85
docs/docs/integrations/providers/astradb.mdx
Normal file
85
docs/docs/integrations/providers/astradb.mdx
Normal file
@@ -0,0 +1,85 @@
|
||||
# Astra DB
|
||||
|
||||
This page lists the integrations available with [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) and [Apache Cassandra®](https://cassandra.apache.org/).
|
||||
|
||||
### Setup
|
||||
|
||||
Install the following Python package:
|
||||
|
||||
```bash
|
||||
pip install "astrapy>=0.5.3"
|
||||
```
|
||||
|
||||
## Astra DB
|
||||
|
||||
> DataStax [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) is a serverless vector-capable database built on Cassandra and made conveniently available
|
||||
> through an easy-to-use JSON API.
|
||||
|
||||
### Vector Store
|
||||
|
||||
```python
|
||||
from langchain.vectorstores import AstraDB
|
||||
vector_store = AstraDB(
|
||||
embedding=my_embedding,
|
||||
collection_name="my_store",
|
||||
api_endpoint="...",
|
||||
token="...",
|
||||
)
|
||||
```
|
||||
|
||||
Learn more in the [example notebook](/docs/integrations/vectorstores/astradb).
|
||||
|
||||
|
||||
## Apache Cassandra and Astra DB through CQL
|
||||
|
||||
> [Cassandra](https://cassandra.apache.org/) is a NoSQL, row-oriented, highly scalable and highly available database.
|
||||
> Starting with version 5.0, the database ships with [vector search capabilities](https://cassandra.apache.org/doc/trunk/cassandra/vector-search/overview.html).
|
||||
> DataStax [Astra DB through CQL](https://docs.datastax.com/en/astra-serverless/docs/vector-search/quickstart.html) is a managed serverless database built on Cassandra, offering the same interface and strengths.
|
||||
|
||||
These databases use the CQL protocol (Cassandra Query Language).
|
||||
Hence, a different set of connectors, outlined below, shall be used.
|
||||
|
||||
### Vector Store
|
||||
|
||||
```python
|
||||
from langchain.vectorstores import Cassandra
|
||||
vector_store = Cassandra(
|
||||
embedding=my_embedding,
|
||||
table_name="my_store",
|
||||
)
|
||||
```
|
||||
|
||||
Learn more in the [example notebook](/docs/integrations/vectorstores/astradb) (scroll down to the CQL-specific section).
|
||||
|
||||
|
||||
### Memory
|
||||
|
||||
```python
|
||||
from langchain.memory import CassandraChatMessageHistory
|
||||
message_history = CassandraChatMessageHistory(session_id="my-session")
|
||||
```
|
||||
|
||||
Learn more in the [example notebook](/docs/integrations/memory/cassandra_chat_message_history).
|
||||
|
||||
|
||||
### LLM Cache
|
||||
|
||||
```python
|
||||
from langchain.cache import CassandraCache
|
||||
langchain.llm_cache = CassandraCache()
|
||||
```
|
||||
|
||||
Learn more in the [example notebook](/docs/integrations/llms/llm_caching) (scroll to the Cassandra section).
|
||||
|
||||
|
||||
### Semantic LLM Cache
|
||||
|
||||
```python
|
||||
from langchain.cache import CassandraSemanticCache
|
||||
cassSemanticCache = CassandraSemanticCache(
|
||||
embedding=my_embedding,
|
||||
table_name="my_store",
|
||||
)
|
||||
```
|
||||
|
||||
Learn more in the [example notebook](/docs/integrations/llms/llm_caching) (scroll to the appropriate section).
|
@@ -1,35 +0,0 @@
|
||||
# Cassandra
|
||||
|
||||
>[Apache Cassandra®](https://cassandra.apache.org/) is a free and open-source, distributed, wide-column
|
||||
> store, NoSQL database management system designed to handle large amounts of data across many commodity servers,
|
||||
> providing high availability with no single point of failure. Cassandra offers support for clusters spanning
|
||||
> multiple datacenters, with asynchronous masterless replication allowing low latency operations for all clients.
|
||||
> Cassandra was designed to implement a combination of _Amazon's Dynamo_ distributed storage and replication
|
||||
> techniques combined with _Google's Bigtable_ data and storage engine model.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
```bash
|
||||
pip install cassandra-driver
|
||||
pip install cassio
|
||||
```
|
||||
|
||||
|
||||
|
||||
## Vector Store
|
||||
|
||||
See a [usage example](/docs/integrations/vectorstores/cassandra).
|
||||
|
||||
```python
|
||||
from langchain.vectorstores import Cassandra
|
||||
```
|
||||
|
||||
|
||||
|
||||
## Memory
|
||||
|
||||
See a [usage example](/docs/integrations/memory/cassandra_chat_message_history).
|
||||
|
||||
```python
|
||||
from langchain.memory import CassandraChatMessageHistory
|
||||
```
|
749
docs/docs/integrations/vectorstores/astradb.ipynb
Normal file
749
docs/docs/integrations/vectorstores/astradb.ipynb
Normal file
@@ -0,0 +1,749 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "d2d6ca14-fb7e-4172-9aa0-a3119a064b96",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Astra DB\n",
|
||||
"\n",
|
||||
"This page provides a quickstart for using [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) and [Apache Cassandra®](https://cassandra.apache.org/) as a Vector Store.\n",
|
||||
"\n",
|
||||
"_Note: in addition to access to the database, an OpenAI API Key is required to run the full example._"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "bb9be7ce-8c70-4d46-9f11-71c42a36e928",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Setup and general dependencies"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "dbe7c156-0413-47e3-9237-4769c4248869",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Use of the integration requires the following Python package.\n",
|
||||
"\n",
|
||||
"_Note: depending on your LangChain setup, you may need to install other dependencies needed for this demo._"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "8d00fcf4-9798-4289-9214-d9734690adfc",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install --quiet \"astrapy>=0.5.3\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "b06619af-fea2-4863-8149-7f239a8c9c82",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"from getpass import getpass\n",
|
||||
"\n",
|
||||
"from datasets import load_dataset # if not present yet, run: pip install \"datasets==2.14.6\"\n",
|
||||
"\n",
|
||||
"from langchain.schema import Document\n",
|
||||
"from langchain.embeddings import OpenAIEmbeddings\n",
|
||||
"from langchain.document_loaders import PyPDFLoader\n",
|
||||
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.prompts import ChatPromptTemplate\n",
|
||||
"from langchain.schema.runnable import RunnablePassthrough\n",
|
||||
"from langchain.schema.output_parser import StrOutputParser"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "1983f1da-0ae7-4a9b-bf4c-4ade328f7a3a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"os.environ[\"OPENAI_API_KEY\"] = getpass(\"OPENAI_API_KEY = \")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "c656df06-e938-4bc5-b570-440b8b7a0189",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"embe = OpenAIEmbeddings()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "dd8caa76-bc41-429e-a93b-989ba13aff01",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"_Keep reading to connect with Astra DB. For usage with Apache Cassandra and Astra DB through CQL, scroll to the section below._"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "22866f09-e10d-4f05-a24b-b9420129462e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Astra DB"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "5fba47cc-3533-42fc-84b7-9dc14cd68b2b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"DataStax [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) is a serverless vector-capable database built on Cassandra and made conveniently available through an easy-to-use JSON API."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "0b32730d-176e-414c-9d91-fd3644c54211",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.vectorstores import AstraDB"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "68f61b01-3e09-47c1-9d67-5d6915c86626",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Astra DB connection parameters\n",
|
||||
"\n",
|
||||
"- the API Endpoint looks like `https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com`\n",
|
||||
"- the Token looks like `AstraCS:6gBhNmsk135....`"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "d78af8ed-cff9-4f14-aa5d-016f99ab547c",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"ASTRA_DB_API_ENDPOINT = input(\"ASTRA_DB_API_ENDPOINT = \")\n",
|
||||
"ASTRA_DB_TOKEN = getpass(\"ASTRA_DB_TOKEN = \")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "8b77553b-8bb5-4949-b87b-8c6abac56a26",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"vstore = AstraDB(\n",
|
||||
" embedding=embe,\n",
|
||||
" collection_name=\"astra_vector_demo\",\n",
|
||||
" api_endpoint=ASTRA_DB_API_ENDPOINT,\n",
|
||||
" token=ASTRA_DB_TOKEN,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "9a348678-b2f6-46ca-9a0d-2eb4cc6b66b1",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Load a dataset"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "3a1f532f-ad63-4256-9730-a183841bd8e9",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"philo_dataset = load_dataset(\"datastax/philosopher-quotes\")[\"train\"]\n",
|
||||
"\n",
|
||||
"docs = []\n",
|
||||
"for entry in philo_dataset:\n",
|
||||
" metadata = {\"author\": entry[\"author\"]}\n",
|
||||
" doc = Document(page_content=entry[\"quote\"], metadata=metadata)\n",
|
||||
" docs.append(doc)\n",
|
||||
"\n",
|
||||
"inserted_ids = vstore.add_documents(docs)\n",
|
||||
"print(f\"\\nInserted {len(inserted_ids)} documents.\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "084d8802-ab39-4262-9a87-42eafb746f92",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Add some more entries, this time with `add_texts`:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "b6b157f5-eb31-4907-a78e-2e2b06893936",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"texts = [\"I think, therefore I am.\", \"To the things themselves!\"]\n",
|
||||
"metadatas = [{\"author\": \"descartes\"}, {\"author\": \"husserl\"}]\n",
|
||||
"ids = [\"desc_01\", \"huss_xy\"]\n",
|
||||
"\n",
|
||||
"inserted_ids_2 = vstore.add_texts(texts=texts, metadatas=metadatas, ids=ids)\n",
|
||||
"print(f\"\\nInserted {len(inserted_ids_2)} documents.\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c031760a-1fc5-4855-adf2-02ed52fe2181",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Run simple searches"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "02a77d8e-1aae-4054-8805-01c77947c49f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"This section demonstrates metadata filtering and getting the similarity scores back:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "1761806a-1afd-4491-867c-25a80d92b9fe",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"results = vstore.similarity_search(\"Our life is what we make of it\", k=3)\n",
|
||||
"for res in results:\n",
|
||||
" print(f\"* {res.page_content} [{res.metadata}]\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "eebc4f7c-f61a-438e-b3c8-17e6888d8a0b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"results_filtered = vstore.similarity_search(\n",
|
||||
" \"Our life is what we make of it\",\n",
|
||||
" k=3,\n",
|
||||
" filter={\"author\": \"plato\"},\n",
|
||||
")\n",
|
||||
"for res in results_filtered:\n",
|
||||
" print(f\"* {res.page_content} [{res.metadata}]\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "11bbfe64-c0cd-40c6-866a-a5786538450e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"results = vstore.similarity_search_with_score(\"Our life is what we make of it\", k=3)\n",
|
||||
"for res, score in results:\n",
|
||||
" print(f\"* [SIM={score:3f}] {res.page_content} [{res.metadata}]\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b14ea558-bfbe-41ce-807e-d70670060ada",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### MMR (Maximal-marginal-relevance) search"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "76381ce8-780a-4e3b-97b1-056d6782d7d5",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"results = vstore.max_marginal_relevance_search(\n",
|
||||
" \"Our life is what we make of it\",\n",
|
||||
" k=3,\n",
|
||||
" filter={\"author\": \"aristotle\"},\n",
|
||||
")\n",
|
||||
"for res in results:\n",
|
||||
" print(f\"* {res.page_content} [{res.metadata}]\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1cc86edd-692b-4495-906c-ccfd13b03c23",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Deleting stored documents"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "38a70ec4-b522-4d32-9ead-c642864fca37",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"delete_1 = vstore.delete(inserted_ids[:3])\n",
|
||||
"print(f\"all_succeed={delete_1}\") # True, all documents deleted"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "d4cf49ed-9d29-4ed9-bdab-51a308c41b8e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"delete_2 = vstore.delete(inserted_ids[2:5])\n",
|
||||
"print(f\"some_succeeds={delete_2}\") # True, though some IDs were gone already"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "847181ba-77d1-4a17-b7f9-9e2c3d8efd13",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### A minimal RAG chain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "cd64b844-846f-43c5-a7dd-c26b9ed417d0",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The next cells will implement a simple RAG pipeline:\n",
|
||||
"- download a sample PDF file and load it onto the store;\n",
|
||||
"- create a RAG chain with LCEL (LangChain Expression Language), with the vector store at its heart;\n",
|
||||
"- run the question-answering chain."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "5cbc4dba-0d5e-4038-8fc5-de6cadd1c2a9",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!curl -L \\\n",
|
||||
" \"https://github.com/awesome-astra/datasets/blob/main/demo-resources/what-is-philosophy/what-is-philosophy.pdf?raw=true\" \\\n",
|
||||
" -o \"what-is-philosophy.pdf\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "459385be-5e9c-47ff-ba53-2b7ae6166b09",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"pdf_loader = PyPDFLoader(\"what-is-philosophy.pdf\")\n",
|
||||
"splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=64)\n",
|
||||
"docs_from_pdf = pdf_loader.load_and_split(text_splitter=splitter)\n",
|
||||
"\n",
|
||||
"print(f\"Documents from PDF: {len(docs_from_pdf)}.\")\n",
|
||||
"inserted_ids_from_pdf = vstore.add_documents(docs_from_pdf)\n",
|
||||
"print(f\"Inserted {len(inserted_ids_from_pdf)} documents.\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "5010a66c-4298-4e32-82b5-2da0d36a5c70",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"retriever = vstore.as_retriever(search_kwargs={'k': 3})\n",
|
||||
"\n",
|
||||
"philo_template = \"\"\"\n",
|
||||
"You are a philosopher that draws inspiration from great thinkers of the past\n",
|
||||
"to craft well-thought answers to user questions. Use the provided context as the basis\n",
|
||||
"for your answers and do not make up new reasoning paths - just mix-and-match what you are given.\n",
|
||||
"Your answers must be concise and to the point, and refrain from answering about other topics than philosophy.\n",
|
||||
"\n",
|
||||
"CONTEXT:\n",
|
||||
"{context}\n",
|
||||
"\n",
|
||||
"QUESTION: {question}\n",
|
||||
"\n",
|
||||
"YOUR ANSWER:\"\"\"\n",
|
||||
"\n",
|
||||
"philo_prompt = ChatPromptTemplate.from_template(philo_template)\n",
|
||||
"\n",
|
||||
"llm = ChatOpenAI()\n",
|
||||
"\n",
|
||||
"chain = (\n",
|
||||
" {\"context\": retriever, \"question\": RunnablePassthrough()} \n",
|
||||
" | philo_prompt \n",
|
||||
" | llm \n",
|
||||
" | StrOutputParser()\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "fcbc1296-6c7c-478b-b55b-533ba4e54ddb",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain.invoke(\"How does Russel elaborate on Peirce's idea of the security blanket?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "869ab448-a029-4692-aefc-26b85513314d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"For more, check out a complete RAG template using Astra DB [here](https://github.com/langchain-ai/langchain/tree/master/templates/rag-astradb)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "177610c7-50d0-4b7b-8634-b03338054c8e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Cleanup"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0da4d19f-9878-4d3d-82c9-09cafca20322",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"If you want to completely delete the collection from your Astra DB instance, run this.\n",
|
||||
"\n",
|
||||
"_(You will lose the data you stored in it.)_"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "fd405a13-6f71-46fa-87e6-167238e9c25e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"vstore.delete_collection()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "94ebaab1-7cbf-4144-a147-7b0e32c43069",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Apache Cassandra and Astra DB through CQL"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "bc3931b4-211d-4f84-bcc0-51c127e3027c",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"[Cassandra](https://cassandra.apache.org/) is a NoSQL, row-oriented, highly scalable and highly available database.Starting with version 5.0, the database ships with [vector search capabilities](https://cassandra.apache.org/doc/trunk/cassandra/vector-search/overview.html).\n",
|
||||
"\n",
|
||||
"DataStax [Astra DB through CQL](https://docs.datastax.com/en/astra-serverless/docs/vector-search/quickstart.html) is a managed serverless database built on Cassandra, offering the same interface and strengths."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "a0055fbf-448d-4e46-9c40-28d43df25ca3",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### What sets this case apart from \"Astra DB\" above?\n",
|
||||
"\n",
|
||||
"Thanks to LangChain having a standardized `VectorStore` interface, most of the \"Astra DB\" section above applies to this case as well. However, this time the database uses the CQL protocol, which means you'll use a _different_ class this time and instantiate it in another way.\n",
|
||||
"\n",
|
||||
"The cells below show how you should get your `vstore` object in this case and how you can clean up the database resources at the end: for the rest, i.e. the actual usage of the vector store, you will be able to run the very code that was shown above.\n",
|
||||
"\n",
|
||||
"In other words, running this demo in full with Cassandra or Astra DB through CQL means:\n",
|
||||
"\n",
|
||||
"- **initialization as shown below**\n",
|
||||
"- \"Load a dataset\", _see above section_\n",
|
||||
"- \"Run simple searches\", _see above section_\n",
|
||||
"- \"MMR search\", _see above section_\n",
|
||||
"- \"Deleting stored documents\", _see above section_\n",
|
||||
"- \"A minimal RAG chain\", _see above section_\n",
|
||||
"- **cleanup as shown below**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "23d12be2-745f-4e72-a82c-334a887bc7cd",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Initialization"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e3212542-79be-423e-8e1f-b8d725e3cda8",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The class to use is the following:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "941af73e-a090-4fba-b23c-595757d470eb",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.vectorstores import Cassandra"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "414d1e72-f7c9-4b6d-bf6f-16075712c7e3",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now, depending on whether you connect to a Cassandra cluster or to Astra DB through CQL, you will provide different parameters when creating the vector store object."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "48ecca56-71a4-4a91-b198-29384c44ce27",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Initialization (Cassandra cluster)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "55ebe958-5654-43e0-9aed-d607ffd3fa48",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In this case, you first need to create a `cassandra.cluster.Session` object, as described in the [Cassandra driver documentation](https://docs.datastax.com/en/developer/python-driver/latest/api/cassandra/cluster/#module-cassandra.cluster). The details vary (e.g. with network settings and authentication), but this might be something like:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "4642dafb-a065-4063-b58c-3d276f5ad07e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from cassandra.cluster import Cluster\n",
|
||||
"\n",
|
||||
"cluster = Cluster([\"127.0.0.1\"])\n",
|
||||
"session = cluster.connect()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "624c93bf-fb46-4350-bcfa-09ca09dc068f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can now set the session, along with your desired keyspace name, as a global CassIO parameter:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "92a4ab28-1c4f-4dad-9671-d47e0b1dde7b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import cassio\n",
|
||||
"\n",
|
||||
"CASSANDRA_KEYSPACE = input(\"CASSANDRA_KEYSPACE = \")\n",
|
||||
"\n",
|
||||
"cassio.init(session=session, keyspace=CASSANDRA_KEYSPACE)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "3b87a824-36f1-45b4-b54c-efec2a2de216",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now you can create the vector store:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "853a2a88-a565-4e24-8789-d78c213954a6",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"vstore = Cassandra(\n",
|
||||
" embedding=embe,\n",
|
||||
" table_name=\"cassandra_vector_demo\",\n",
|
||||
" # session=None, keyspace=None # Uncomment on older versions of LangChain\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "768ddf7a-0c3e-4134-ad38-25ac53c3da7a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Initialization (Astra DB through CQL)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "4ed4269a-b7e7-4503-9e66-5a11335c7681",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In this case you initialize CassIO with the following connection parameters:\n",
|
||||
"\n",
|
||||
"- the Database ID, e.g. `01234567-89ab-cdef-0123-456789abcdef`\n",
|
||||
"- the Token, e.g. `AstraCS:6gBhNmsk135....` (it must be a \"Database Administrator\" token)\n",
|
||||
"- Optionally a Keyspace name (if omitted, the default one for the database will be used)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "5fa6bd74-d4b2-45c5-9757-96dddc6242fb",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"ASTRA_DB_ID = input(\"ASTRA_DB_ID = \")\n",
|
||||
"ASTRA_DB_TOKEN = getpass(\"ASTRA_DB_TOKEN = \")\n",
|
||||
"\n",
|
||||
"desired_keyspace = input(\"ASTRA_DB_KEYSPACE (optional, can be left empty) = \")\n",
|
||||
"if desired_keyspace:\n",
|
||||
" ASTRA_DB_KEYSPACE = desired_keyspace\n",
|
||||
"else:\n",
|
||||
" ASTRA_DB_KEYSPACE = None"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "add6e585-17ff-452e-8ef6-7e485ead0b06",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import cassio\n",
|
||||
"\n",
|
||||
"cassio.init(\n",
|
||||
" database_id=ASTRA_DB_ID,\n",
|
||||
" token=ASTRA_DB_TOKEN,\n",
|
||||
" keyspace=ASTRA_DB_KEYSPACE,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b305823c-bc98-4f3d-aabb-d7eb663ea421",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now you can create the vector store:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "f45f3038-9d59-41cc-8b43-774c6aa80295",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"vstore = Cassandra(\n",
|
||||
" embedding=embe,\n",
|
||||
" table_name=\"cassandra_vector_demo\",\n",
|
||||
" # session=None, keyspace=None # Uncomment on older versions of LangChain\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "39284918-cf8a-49bb-a2d3-aef285bb2ffa",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Usage of the vector store"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "3cc1aead-d6ec-48a3-affe-1d0cffa955a9",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"_See the sections \"Load a dataset\" through \"A minimal RAG chain\" above._\n",
|
||||
"\n",
|
||||
"Speaking of the latter, you can check out a full RAG template for Astra DB through CQL [here](https://github.com/langchain-ai/langchain/tree/master/templates/cassandra-entomology-rag)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "096397d8-6622-4685-9f9d-7e238beca467",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Cleanup"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "cc1e74f9-5500-41aa-836f-235b1ed5f20c",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"the following essentially retrieves the `Session` object from CassIO and runs a CQL `DROP TABLE` statement with it:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "b5b82c33-0e77-4a37-852c-8d50edbdd991",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"cassio.config.resolve_session().execute(\n",
|
||||
" f\"DROP TABLE {cassio.config.resolve_keyspace()}.cassandra_vector_demo;\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c10ece4d-ae06-42ab-baf4-4d0ac2051743",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Learn more"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "51ea8b69-7e15-458f-85aa-9fa199f95f9c",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"For more information, extended quickstarts and additional usage examples, please visit the [CassIO documentation](https://cassio.org/frameworks/langchain/about/) for more on using the LangChain `Cassandra` vector store."
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.12"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
@@ -1,326 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "683953b3",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Cassandra\n",
|
||||
"\n",
|
||||
">[Apache Cassandra®](https://cassandra.apache.org) is a NoSQL, row-oriented, highly scalable and highly available database.\n",
|
||||
"\n",
|
||||
"Newest Cassandra releases natively [support](https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor(ANN)+Vector+Search+via+Storage-Attached+Indexes) Vector Similarity Search.\n",
|
||||
"\n",
|
||||
"To run this notebook you need either a running Cassandra cluster equipped with Vector Search capabilities (in pre-release at the time of writing) or a DataStax Astra DB instance running in the cloud (you can get one for free at [datastax.com](https://astra.datastax.com)). Check [cassio.org](https://cassio.org/start_here/) for more information."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "b4c41cad-08ef-4f72-a545-2151e4598efe",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install \"cassio>=0.1.0\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b7e46bb0",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Please provide database connection parameters and secrets:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "36128a32",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"import getpass\n",
|
||||
"\n",
|
||||
"database_mode = (input(\"\\n(C)assandra or (A)stra DB? \")).upper()\n",
|
||||
"\n",
|
||||
"keyspace_name = input(\"\\nKeyspace name? \")\n",
|
||||
"\n",
|
||||
"if database_mode == \"A\":\n",
|
||||
" ASTRA_DB_APPLICATION_TOKEN = getpass.getpass('\\nAstra DB Token (\"AstraCS:...\") ')\n",
|
||||
" #\n",
|
||||
" ASTRA_DB_SECURE_BUNDLE_PATH = input(\"Full path to your Secure Connect Bundle? \")\n",
|
||||
"elif database_mode == \"C\":\n",
|
||||
" CASSANDRA_CONTACT_POINTS = input(\n",
|
||||
" \"Contact points? (comma-separated, empty for localhost) \"\n",
|
||||
" ).strip()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "4f22aac2",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### depending on whether local or cloud-based Astra DB, create the corresponding database connection \"Session\" object"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "677f8576",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from cassandra.cluster import Cluster\n",
|
||||
"from cassandra.auth import PlainTextAuthProvider\n",
|
||||
"\n",
|
||||
"if database_mode == \"C\":\n",
|
||||
" if CASSANDRA_CONTACT_POINTS:\n",
|
||||
" cluster = Cluster(\n",
|
||||
" [cp.strip() for cp in CASSANDRA_CONTACT_POINTS.split(\",\") if cp.strip()]\n",
|
||||
" )\n",
|
||||
" else:\n",
|
||||
" cluster = Cluster()\n",
|
||||
" session = cluster.connect()\n",
|
||||
"elif database_mode == \"A\":\n",
|
||||
" ASTRA_DB_CLIENT_ID = \"token\"\n",
|
||||
" cluster = Cluster(\n",
|
||||
" cloud={\n",
|
||||
" \"secure_connect_bundle\": ASTRA_DB_SECURE_BUNDLE_PATH,\n",
|
||||
" },\n",
|
||||
" auth_provider=PlainTextAuthProvider(\n",
|
||||
" ASTRA_DB_CLIENT_ID,\n",
|
||||
" ASTRA_DB_APPLICATION_TOKEN,\n",
|
||||
" ),\n",
|
||||
" )\n",
|
||||
" session = cluster.connect()\n",
|
||||
"else:\n",
|
||||
" raise NotImplementedError"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "320af802-9271-46ee-948f-d2453933d44b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Please provide OpenAI access key\n",
|
||||
"\n",
|
||||
"We want to use `OpenAIEmbeddings` so we have to get the OpenAI API Key."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "ffea66e4-bc23-46a9-9580-b348dfe7b7a7",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e98a139b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Creation and usage of the Vector Store"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "aac9563e",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
||||
"from langchain.text_splitter import CharacterTextSplitter\n",
|
||||
"from langchain.vectorstores import Cassandra\n",
|
||||
"from langchain.document_loaders import TextLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "a3c3999a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import TextLoader\n",
|
||||
"\n",
|
||||
"SOURCE_FILE_NAME = \"../../modules/state_of_the_union.txt\"\n",
|
||||
"\n",
|
||||
"loader = TextLoader(SOURCE_FILE_NAME)\n",
|
||||
"documents = loader.load()\n",
|
||||
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
|
||||
"docs = text_splitter.split_documents(documents)\n",
|
||||
"\n",
|
||||
"embedding_function = OpenAIEmbeddings()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "6e104aee",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"table_name = \"my_vector_db_table\"\n",
|
||||
"\n",
|
||||
"docsearch = Cassandra.from_documents(\n",
|
||||
" documents=docs,\n",
|
||||
" embedding=embedding_function,\n",
|
||||
" session=session,\n",
|
||||
" keyspace=keyspace_name,\n",
|
||||
" table_name=table_name,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"docs = docsearch.similarity_search(query)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "f509ee02",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"## if you already have an index, you can load it and use it like this:\n",
|
||||
"\n",
|
||||
"# docsearch_preexisting = Cassandra(\n",
|
||||
"# embedding=embedding_function,\n",
|
||||
"# session=session,\n",
|
||||
"# keyspace=keyspace_name,\n",
|
||||
"# table_name=table_name,\n",
|
||||
"# )\n",
|
||||
"\n",
|
||||
"# docs = docsearch_preexisting.similarity_search(query, k=2)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "9c608226",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print(docs[0].page_content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "d46d1452",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Maximal Marginal Relevance Searches\n",
|
||||
"\n",
|
||||
"In addition to using similarity search in the retriever object, you can also use `mmr` as retriever.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "a359ed74",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"retriever = docsearch.as_retriever(search_type=\"mmr\")\n",
|
||||
"matched_docs = retriever.get_relevant_documents(query)\n",
|
||||
"for i, d in enumerate(matched_docs):\n",
|
||||
" print(f\"\\n## Document {i}\\n\")\n",
|
||||
" print(d.page_content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "7c477287",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Or use `max_marginal_relevance_search` directly:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "9ca82740",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"found_docs = docsearch.max_marginal_relevance_search(query, k=2, fetch_k=10)\n",
|
||||
"for i, doc in enumerate(found_docs):\n",
|
||||
" print(f\"{i + 1}.\", doc.page_content, \"\\n\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "da791c5f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Metadata filtering\n",
|
||||
"\n",
|
||||
"You can specify filtering on metadata when running searches in the vector store. By default, when inserting documents, the only metadata is the `\"source\"` (but you can customize the metadata at insertion time).\n",
|
||||
"\n",
|
||||
"Since only one files was inserted, this is just a demonstration of how filters are passed:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "93f132fa",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"filter = {\"source\": SOURCE_FILE_NAME}\n",
|
||||
"filtered_docs = docsearch.similarity_search(query, filter=filter, k=5)\n",
|
||||
"print(f\"{len(filtered_docs)} documents retrieved.\")\n",
|
||||
"print(f\"{filtered_docs[0].page_content[:64]} ...\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "1b413ec4",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"filter = {\"source\": \"nonexisting_file.txt\"}\n",
|
||||
"filtered_docs2 = docsearch.similarity_search(query, filter=filter)\n",
|
||||
"print(f\"{len(filtered_docs2)} documents retrieved.\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "a0fea764",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Please visit the [cassIO documentation](https://cassio.org/frameworks/langchain/about/) for more on using vector stores with Langchain."
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.12"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
@@ -58,9 +58,9 @@
|
||||
"1. Do not use with a store that has been pre-populated with content independently of the indexing API, as the record manager will not know that records have been inserted previously.\n",
|
||||
"2. Only works with LangChain `vectorstore`'s that support:\n",
|
||||
" * document addition by id (`add_documents` method with `ids` argument)\n",
|
||||
" * delete by id (`delete` method with)\n",
|
||||
" * delete by id (`delete` method with `ids` argument)\n",
|
||||
"\n",
|
||||
"Compatible Vectorstores: `AnalyticDB`, `AwaDB`, `Bagel`, `Cassandra`, `Chroma`, `DashVector`, `DeepLake`, `Dingo`, `ElasticVectorSearch`, `ElasticsearchStore`, `FAISS`, `MyScale`, `PGVector`, `Pinecone`, `Qdrant`, `Redis`, `ScaNN`, `SupabaseVectorStore`, `TimescaleVector`, `Vald`, `Vearch`, `VespaStore`, `Weaviate`, `ZepVectorStore`.\n",
|
||||
"Compatible Vectorstores: `AnalyticDB`, `AstraDB`, `AwaDB`, `Bagel`, `Cassandra`, `Chroma`, `DashVector`, `DeepLake`, `Dingo`, `ElasticVectorSearch`, `ElasticsearchStore`, `FAISS`, `MyScale`, `PGVector`, `Pinecone`, `Qdrant`, `Redis`, `ScaNN`, `SupabaseVectorStore`, `TimescaleVector`, `Vald`, `Vearch`, `VespaStore`, `Weaviate`, `ZepVectorStore`.\n",
|
||||
" \n",
|
||||
"## Caution\n",
|
||||
"\n",
|
||||
|
@@ -414,7 +414,15 @@
|
||||
},
|
||||
{
|
||||
"source": "/docs/integrations/cassandra",
|
||||
"destination": "/docs/integrations/providers/cassandra"
|
||||
"destination": "/docs/integrations/providers/astradb"
|
||||
},
|
||||
{
|
||||
"source": "/docs/integrations/providers/cassandra",
|
||||
"destination": "/docs/integrations/providers/astradb"
|
||||
},
|
||||
{
|
||||
"source": "/docs/integrations/vectorstores/cassandra",
|
||||
"destination": "/docs/integrations/vectorstores/astradb"
|
||||
},
|
||||
{
|
||||
"source": "/docs/integrations/cerebriumai",
|
||||
|
Reference in New Issue
Block a user