mirror of
https://github.com/hwchase17/langchain.git
synced 2025-09-10 07:21:03 +00:00
astradb: bootstrapping Astra DB as Partner Package (#16875)
**Description:** This PR introduces a new "Astra DB" Partner Package. So far, only the vector store class is _duplicated_ there; all other classes will follow once this is validated and established. Along with the move to a separate package, the class name changes from `AstraDB` to `AstraDBVectorStore`.

The strategy has been to duplicate the module, with removal from community planned for LangChain 0.2. Until then, the code will be kept in sync with minimal, known differences. There is a makefile target to automate drift control; for the convenience of this check, the community package has a class `AstraDBVectorStore` aliased to `AstraDB` at the end of the module.

This PR also brings several bugfixes and improvements to the vector store, as well as a reshuffling of the doc pages/notebooks (Astra and Cassandra) to align with the move to a separate package.

**Dependencies:** A brand new pyproject.toml in the new package, no changes otherwise.

**Twitter handle:** `@rsprrs`

---------

Co-authored-by: Christophe Bornet <cbornet@hotmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
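For readers unfamiliar with the aliasing mentioned in the description, here is a minimal sketch of the pattern. The class body is a hypothetical stand-in, not the actual vector store code; only the two class names come from the description.

```python
class AstraDB:
    """Hypothetical stand-in for the community vector store class."""

    def __init__(self, collection_name: str) -> None:
        self.collection_name = collection_name


# Alias defined at the end of the module: both names are bound to the
# very same class object, so code written against either name works.
AstraDBVectorStore = AstraDB

store = AstraDBVectorStore(collection_name="astra_vector_demo")
print(AstraDBVectorStore is AstraDB)  # True
print(type(store).__name__)  # AstraDB (an alias does not rename the class)
```

Because the alias is a plain assignment, it adds no subclass and no wrapper: `isinstance` checks against either name behave identically.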
@@ -72,57 +72,72 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"### Init from a cassandra driver Session\n",
|
||||
"\n",
|
||||
"You need to create a `cassandra.cluster.Session` object, as described in the [Cassandra driver documentation](https://docs.datastax.com/en/developer/python-driver/latest/api/cassandra/cluster/#module-cassandra.cluster). The details vary (e.g. with network settings and authentication), but this might be something like:"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from cassandra.cluster import Cluster\n",
|
||||
"\n",
|
||||
"cluster = Cluster()\n",
|
||||
"session = cluster.connect()"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"execution_count": null
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"You need to provide the name of an existing keyspace of the Cassandra instance:"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"CASSANDRA_KEYSPACE = input(\"CASSANDRA_KEYSPACE = \")"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"execution_count": null
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"Creating the document loader:"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
@@ -144,18 +159,21 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs = loader.load()"
|
||||
],
|
||||
"execution_count": 17,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"ExecuteTime": {
|
||||
"end_time": "2024-01-19T15:47:26.399472Z",
|
||||
"start_time": "2024-01-19T15:47:26.389145Z"
|
||||
},
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"execution_count": 17
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
@@ -169,7 +187,9 @@
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": "Document(page_content='Row(_id=\\'659bdffa16cbc4586b11a423\\', title=\\'Dangerous Men\\', reviewtext=\\'\"Dangerous Men,\" the picture\\\\\\'s production notes inform, took 26 years to reach the big screen. After having seen it, I wonder: What was the rush?\\')', metadata={'table': 'movie_reviews', 'keyspace': 'default_keyspace'})"
|
||||
"text/plain": [
|
||||
"Document(page_content='Row(_id=\\'659bdffa16cbc4586b11a423\\', title=\\'Dangerous Men\\', reviewtext=\\'\"Dangerous Men,\" the picture\\\\\\'s production notes inform, took 26 years to reach the big screen. After having seen it, I wonder: What was the rush?\\')', metadata={'table': 'movie_reviews', 'keyspace': 'default_keyspace'})"
|
||||
]
|
||||
},
|
||||
"execution_count": 19,
|
||||
"metadata": {},
|
||||
@@ -182,17 +202,27 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"### Init from cassio\n",
|
||||
"\n",
|
||||
"It's also possible to use cassio to configure the session and keyspace."
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import cassio\n",
|
||||
@@ -204,11 +234,16 @@
|
||||
")\n",
|
||||
"\n",
|
||||
"docs = loader.load()"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"execution_count": null
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Attribution statement\n",
|
||||
"\n",
|
||||
"> Apache Cassandra, Cassandra and Apache are either registered trademarks or trademarks of the [Apache Software Foundation](http://www.apache.org/) in the United States and/or other countries."
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
@@ -233,7 +268,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.18"
|
||||
"version": "3.9.17"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
@@ -1131,6 +1131,16 @@
|
||||
"print(llm(\"How come we always see one face of the moon?\"))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "55dc84b3-37cb-4f19-b175-40e18e06f83f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Attribution statement\n",
|
||||
"\n",
|
||||
">Apache Cassandra, Cassandra and Apache are either registered trademarks or trademarks of the [Apache Software Foundation](http://www.apache.org/) in the United States and/or other countries."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "8712f8fc-bb89-4164-beb9-c672778bbd91",
|
||||
@@ -1588,7 +1598,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.1"
|
||||
"version": "3.9.17"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
@@ -32,7 +32,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%pip install --upgrade --quiet \"astrapy>=0.6.2\""
|
||||
"%pip install --upgrade --quiet \"astrapy>=0.7.1\""
|
||||
]
|
||||
},
|
||||
{
|
||||
|
@@ -145,6 +145,24 @@
|
||||
"source": [
|
||||
"message_history.messages"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "59902d0f-e9ba-4e3d-a7e0-ce202b9d3c43",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Attribution statement\n",
|
||||
"\n",
|
||||
"> Apache Cassandra, Cassandra and Apache are either registered trademarks or trademarks of the [Apache Software Foundation](http://www.apache.org/) in the United States and/or other countries."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "7efaa51c-e9ee-4dce-80a4-eb9280a0dbe5",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
@@ -163,7 +181,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.12"
|
||||
"version": "3.9.17"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
@@ -1,21 +1,17 @@
|
||||
# Astra DB
|
||||
|
||||
This page lists the integrations available with [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) and [Apache Cassandra®](https://cassandra.apache.org/).
|
||||
> DataStax [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) is a serverless vector-capable database built on Apache Cassandra® and made conveniently available
|
||||
> through an easy-to-use JSON API.
|
||||
|
||||
### Setup
|
||||
|
||||
Install the following Python package:
|
||||
|
||||
```bash
|
||||
pip install "astrapy>=0.5.3"
|
||||
pip install "astrapy>=0.7.1"
|
||||
```
|
||||
|
||||
## Astra DB
|
||||
|
||||
> DataStax [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) is a serverless vector-capable database built on Cassandra and made conveniently available
|
||||
> through an easy-to-use JSON API.
|
||||
|
||||
### Vector Store
|
||||
## Vector Store
|
||||
|
||||
```python
|
||||
from langchain_community.vectorstores import AstraDB
|
||||
@@ -29,11 +25,22 @@ vector_store = AstraDB(
|
||||
|
||||
Learn more in the [example notebook](/docs/integrations/vectorstores/astradb).
|
||||
|
||||
### LLM Cache
|
||||
## Chat message history
|
||||
|
||||
```python
|
||||
from langchain_community.chat_message_histories import AstraDBChatMessageHistory
|
||||
message_history = AstraDBChatMessageHistory(
|
||||
session_id="test-session",
|
||||
api_endpoint="...",
|
||||
token="...",
|
||||
)
|
||||
```
|
||||
|
||||
## LLM Cache
|
||||
|
||||
```python
|
||||
from langchain.globals import set_llm_cache
|
||||
from langchain.cache import AstraDBCache
|
||||
from langchain_community.cache import AstraDBCache
|
||||
set_llm_cache(AstraDBCache(
|
||||
api_endpoint="...",
|
||||
token="...",
|
||||
@@ -43,11 +50,11 @@ set_llm_cache(AstraDBCache(
|
||||
Learn more in the [example notebook](/docs/integrations/llms/llm_caching#astra-db-caches) (scroll to the Astra DB section).
|
||||
|
||||
|
||||
### Semantic LLM Cache
|
||||
## Semantic LLM Cache
|
||||
|
||||
```python
|
||||
from langchain.globals import set_llm_cache
|
||||
from langchain.cache import AstraDBSemanticCache
|
||||
from langchain_community.cache import AstraDBSemanticCache
|
||||
set_llm_cache(AstraDBSemanticCache(
|
||||
embedding=my_embedding,
|
||||
api_endpoint="...",
|
||||
@@ -57,20 +64,9 @@ set_llm_cache(AstraDBSemanticCache(
|
||||
|
||||
Learn more in the [example notebook](/docs/integrations/llms/llm_caching#astra-db-caches) (scroll to the appropriate section).
|
||||
|
||||
### Chat message history
|
||||
|
||||
```python
|
||||
from langchain.memory import AstraDBChatMessageHistory
|
||||
message_history = AstraDBChatMessageHistory(
|
||||
session_id="test-session",
|
||||
api_endpoint="...",
|
||||
token="...",
|
||||
)
|
||||
```
|
||||
|
||||
Learn more in the [example notebook](/docs/integrations/memory/astradb_chat_message_history).
|
||||
|
||||
### Document loader
|
||||
## Document loader
|
||||
|
||||
```python
|
||||
from langchain_community.document_loaders import AstraDBLoader
|
||||
@@ -83,7 +79,7 @@ loader = AstraDBLoader(
|
||||
|
||||
Learn more in the [example notebook](/docs/integrations/document_loaders/astradb).
|
||||
|
||||
### Self-querying retriever
|
||||
## Self-querying retriever
|
||||
|
||||
```python
|
||||
from langchain_community.vectorstores import AstraDB
|
||||
@@ -106,7 +102,7 @@ retriever = SelfQueryRetriever.from_llm(
|
||||
|
||||
Learn more in the [example notebook](/docs/integrations/retrievers/self_query/astradb).
|
||||
|
||||
### Store
|
||||
## Store
|
||||
|
||||
```python
|
||||
from langchain_community.storage import AstraDBStore
|
||||
@@ -119,7 +115,7 @@ store = AstraDBStore(
|
||||
|
||||
Learn more in the [example notebook](/docs/integrations/stores/astradb#astradbstore).
|
||||
|
||||
### Byte Store
|
||||
## Byte Store
|
||||
|
||||
```python
|
||||
from langchain_community.storage import AstraDBByteStore
|
||||
@@ -131,57 +127,3 @@ store = AstraDBByteStore(
|
||||
```
|
||||
|
||||
Learn more in the [example notebook](/docs/integrations/stores/astradb#astradbbytestore).
|
||||
|
||||
## Apache Cassandra and Astra DB through CQL
|
||||
|
||||
> [Cassandra](https://cassandra.apache.org/) is a NoSQL, row-oriented, highly scalable and highly available database.
|
||||
> Starting with version 5.0, the database ships with [vector search capabilities](https://cassandra.apache.org/doc/trunk/cassandra/vector-search/overview.html).
|
||||
> DataStax [Astra DB through CQL](https://docs.datastax.com/en/astra-serverless/docs/vector-search/quickstart.html) is a managed serverless database built on Cassandra, offering the same interface and strengths.
|
||||
|
||||
These databases use the CQL protocol (Cassandra Query Language).
|
||||
Hence, a different set of connectors, outlined below, shall be used.
|
||||
|
||||
### Vector Store
|
||||
|
||||
```python
|
||||
from langchain_community.vectorstores import Cassandra
|
||||
vector_store = Cassandra(
|
||||
embedding=my_embedding,
|
||||
table_name="my_store",
|
||||
)
|
||||
```
|
||||
|
||||
Learn more in the [example notebook](/docs/integrations/vectorstores/astradb#apache-cassandra-and-astra-db-through-cql) (scroll down to the CQL-specific section).
|
||||
|
||||
|
||||
### Memory
|
||||
|
||||
```python
|
||||
from langchain.memory import CassandraChatMessageHistory
|
||||
message_history = CassandraChatMessageHistory(session_id="my-session")
|
||||
```
|
||||
|
||||
Learn more in the [example notebook](/docs/integrations/memory/cassandra_chat_message_history).
|
||||
|
||||
|
||||
### LLM Cache
|
||||
|
||||
```python
|
||||
from langchain.cache import CassandraCache
|
||||
langchain.llm_cache = CassandraCache()
|
||||
```
|
||||
|
||||
Learn more in the [example notebook](/docs/integrations/llms/llm_caching#cassandra-caches) (scroll to the Cassandra section).
|
||||
|
||||
|
||||
### Semantic LLM Cache
|
||||
|
||||
```python
|
||||
from langchain.cache import CassandraSemanticCache
|
||||
cassSemanticCache = CassandraSemanticCache(
|
||||
embedding=my_embedding,
|
||||
table_name="my_store",
|
||||
)
|
||||
```
|
||||
|
||||
Learn more in the [example notebook](/docs/integrations/llms/llm_caching#cassandra-caches) (scroll to the appropriate section).
|
||||
|
76
docs/docs/integrations/providers/cassandra.mdx
Normal file
@@ -0,0 +1,76 @@
# Apache Cassandra

> [Apache Cassandra®](https://cassandra.apache.org/) is a NoSQL, row-oriented, highly scalable and highly available database.
> Starting with version 5.0, the database ships with [vector search capabilities](https://cassandra.apache.org/doc/trunk/cassandra/vector-search/overview.html).

The integrations outlined in this page can be used with Cassandra as well as other CQL-compatible databases, i.e. those using the Cassandra Query Language protocol.


### Setup

Install the following Python package:

```bash
pip install "cassio>=0.1.4"
```
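
The snippets on this page do not pass a session or a keyspace explicitly. One way to provide them is cassio's global initialization, sketched below under stated assumptions: a Cassandra node reachable at `127.0.0.1` without authentication, and an existing keyspace. The helper name `init_cassio_for_local_cluster` is hypothetical; connection details vary with network settings and authentication.

```python
def init_cassio_for_local_cluster(keyspace: str) -> None:
    """Register a driver session and a keyspace as cassio-wide defaults."""
    # Imported lazily so this sketch can be loaded without the drivers installed.
    import cassio
    from cassandra.cluster import Cluster

    # Assumption: a node listens locally and requires no authentication.
    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect()

    # Objects created afterwards fall back to this session and keyspace.
    cassio.init(session=session, keyspace=keyspace)
```

Call the helper once, with the name of an existing keyspace, before creating any of the objects shown on this page.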


## Vector Store

```python
from langchain_community.vectorstores import Cassandra
vector_store = Cassandra(
    embedding=my_embedding,
    table_name="my_store",
)
```

Learn more in the [example notebook](/docs/integrations/vectorstores/cassandra).

## Chat message history

```python
from langchain_community.chat_message_histories import CassandraChatMessageHistory
message_history = CassandraChatMessageHistory(session_id="my-session")
```

Learn more in the [example notebook](/docs/integrations/memory/cassandra_chat_message_history).


## LLM Cache

```python
from langchain.globals import set_llm_cache
from langchain_community.cache import CassandraCache
set_llm_cache(CassandraCache())
```

Learn more in the [example notebook](/docs/integrations/llms/llm_caching#cassandra-caches) (scroll to the Cassandra section).


## Semantic LLM Cache

```python
from langchain.globals import set_llm_cache
from langchain_community.cache import CassandraSemanticCache
set_llm_cache(CassandraSemanticCache(
    embedding=my_embedding,
    table_name="my_store",
))
```

Learn more in the [example notebook](/docs/integrations/llms/llm_caching#cassandra-caches) (scroll to the appropriate section).

## Document loader

```python
from langchain_community.document_loaders import CassandraLoader
loader = CassandraLoader(table="my_table")
docs = loader.load()
```

Learn more in the [example notebook](/docs/integrations/document_loaders/cassandra).

#### Attribution statement

> Apache Cassandra, Cassandra and Apache are either registered trademarks or trademarks of the [Apache Software Foundation](http://www.apache.org/) in the United States and/or other countries.
@@ -1,14 +1,28 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "66d0270a-b74f-4110-901e-7960b00297af",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Astra DB\n",
|
||||
"\n",
|
||||
"This page provides a quickstart for using [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) as a Vector Store."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ab8cd64f-3bb2-4f16-a0a9-12d7b1789bf6",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"> DataStax [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) is a serverless vector-capable database built on Apache Cassandra® and made conveniently available through an easy-to-use JSON API."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "d2d6ca14-fb7e-4172-9aa0-a3119a064b96",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Astra DB\n",
|
||||
"\n",
|
||||
"This page provides a quickstart for using [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) and [Apache Cassandra®](https://cassandra.apache.org/) as a Vector Store.\n",
|
||||
"\n",
|
||||
"_Note: in addition to access to the database, an OpenAI API Key is required to run the full example._"
|
||||
]
|
||||
},
|
||||
@@ -35,7 +49,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%pip install --upgrade --quiet \"astrapy>=0.5.3\""
|
||||
"%pip install --upgrade --quiet \"astrapy>=0.7.1\""
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -44,7 +58,7 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"_Note: depending on your LangChain setup, you may need to install/upgrade other dependencies needed for this demo_\n",
|
||||
"_(specifically, recent versions of `datasets`, `openai`, `pypdf` and `tiktoken` are required)._"
|
||||
"_(specifically, recent versions of `datasets`, `langchain-openai` and `pypdf` are required, along with `langchain-community`)._"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -89,28 +103,12 @@
|
||||
"embe = OpenAIEmbeddings()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "dd8caa76-bc41-429e-a93b-989ba13aff01",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"_Keep reading to connect with Astra DB. For usage with Apache Cassandra and Astra DB through CQL, scroll to the section below._"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "22866f09-e10d-4f05-a24b-b9420129462e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Astra DB"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "5fba47cc-3533-42fc-84b7-9dc14cd68b2b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"DataStax [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) is a serverless vector-capable database built on Cassandra and made conveniently available through an easy-to-use JSON API."
|
||||
"## Import the Vector Store"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -128,10 +126,13 @@
|
||||
"id": "68f61b01-3e09-47c1-9d67-5d6915c86626",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Astra DB connection parameters\n",
|
||||
"## Connection parameters\n",
|
||||
"\n",
|
||||
"These are found on your Astra DB dashboard:\n",
|
||||
"\n",
|
||||
"- the API Endpoint looks like `https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com`\n",
|
||||
"- the Token looks like `AstraCS:6gBhNmsk135....`"
|
||||
"- the Token looks like `AstraCS:6gBhNmsk135....`\n",
|
||||
"- you may optionally provide a _Namespace_ such as `my_namespace`"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -142,7 +143,21 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"ASTRA_DB_API_ENDPOINT = input(\"ASTRA_DB_API_ENDPOINT = \")\n",
|
||||
"ASTRA_DB_APPLICATION_TOKEN = getpass(\"ASTRA_DB_APPLICATION_TOKEN = \")"
|
||||
"ASTRA_DB_APPLICATION_TOKEN = getpass(\"ASTRA_DB_APPLICATION_TOKEN = \")\n",
|
||||
"\n",
|
||||
"desired_namespace = input(\"(optional) Namespace = \")\n",
|
||||
"if desired_namespace:\n",
|
||||
" ASTRA_DB_KEYSPACE = desired_namespace\n",
|
||||
"else:\n",
|
||||
" ASTRA_DB_KEYSPACE = None"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "196268bd-a950-41c3-bede-f5b55f6a0804",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now you can create the vector store:"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -157,6 +172,7 @@
|
||||
" collection_name=\"astra_vector_demo\",\n",
|
||||
" api_endpoint=ASTRA_DB_API_ENDPOINT,\n",
|
||||
" token=ASTRA_DB_APPLICATION_TOKEN,\n",
|
||||
" namespace=ASTRA_DB_KEYSPACE,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
@@ -165,7 +181,7 @@
|
||||
"id": "9a348678-b2f6-46ca-9a0d-2eb4cc6b66b1",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Load a dataset"
|
||||
"## Load a dataset"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -243,7 +259,7 @@
|
||||
"id": "c031760a-1fc5-4855-adf2-02ed52fe2181",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Run simple searches"
|
||||
"## Run searches"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -318,12 +334,22 @@
|
||||
" print(f\"* {res.page_content} [{res.metadata}]\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "60fda5df-14e4-4fb0-bd17-65a393fab8a9",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Async\n",
|
||||
"\n",
|
||||
"Note that the Astra DB vector store natively supports all of the async methods (`asimilarity_search`, `afrom_texts`, `adelete` and so on), i.e. without any thread wrapping involved."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1cc86edd-692b-4495-906c-ccfd13b03c23",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Deleting stored documents"
|
||||
"## Deleting stored documents"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -353,7 +379,7 @@
|
||||
"id": "847181ba-77d1-4a17-b7f9-9e2c3d8efd13",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### A minimal RAG chain"
|
||||
"## A minimal RAG chain"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -452,7 +478,7 @@
|
||||
"id": "177610c7-50d0-4b7b-8634-b03338054c8e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Cleanup"
|
||||
"## Cleanup"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -474,290 +500,6 @@
|
||||
"source": [
|
||||
"vstore.delete_collection()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "94ebaab1-7cbf-4144-a147-7b0e32c43069",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Apache Cassandra and Astra DB through CQL"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "bc3931b4-211d-4f84-bcc0-51c127e3027c",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"[Cassandra](https://cassandra.apache.org/) is a NoSQL, row-oriented, highly scalable and highly available database. Starting with version 5.0, the database ships with [vector search capabilities](https://cassandra.apache.org/doc/trunk/cassandra/vector-search/overview.html).\n",
|
||||
"\n",
|
||||
"DataStax [Astra DB through CQL](https://docs.datastax.com/en/astra-serverless/docs/vector-search/quickstart.html) is a managed serverless database built on Cassandra, offering the same interface and strengths."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "a0055fbf-448d-4e46-9c40-28d43df25ca3",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### What sets this case apart from \"Astra DB\" above?\n",
|
||||
"\n",
|
||||
"Thanks to LangChain having a standardized `VectorStore` interface, most of the \"Astra DB\" section above applies to this case as well. However, this time the database uses the CQL protocol, which means you'll use a _different_ class this time and instantiate it in another way.\n",
|
||||
"\n",
|
||||
"The cells below show how you should get your `vstore` object in this case and how you can clean up the database resources at the end: for the rest, i.e. the actual usage of the vector store, you will be able to run the very code that was shown above.\n",
|
||||
"\n",
|
||||
"In other words, running this demo in full with Cassandra or Astra DB through CQL means:\n",
|
||||
"\n",
|
||||
"- **initialization as shown below**\n",
|
||||
"- \"Load a dataset\", _see above section_\n",
|
||||
"- \"Run simple searches\", _see above section_\n",
|
||||
"- \"MMR search\", _see above section_\n",
|
||||
"- \"Deleting stored documents\", _see above section_\n",
|
||||
"- \"A minimal RAG chain\", _see above section_\n",
|
||||
"- **cleanup as shown below**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "23d12be2-745f-4e72-a82c-334a887bc7cd",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Initialization"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e3212542-79be-423e-8e1f-b8d725e3cda8",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The class to use is the following:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "941af73e-a090-4fba-b23c-595757d470eb",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.vectorstores import Cassandra"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "414d1e72-f7c9-4b6d-bf6f-16075712c7e3",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now, depending on whether you connect to a Cassandra cluster or to Astra DB through CQL, you will provide different parameters when creating the vector store object."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "48ecca56-71a4-4a91-b198-29384c44ce27",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Initialization (Cassandra cluster)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "55ebe958-5654-43e0-9aed-d607ffd3fa48",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In this case, you first need to create a `cassandra.cluster.Session` object, as described in the [Cassandra driver documentation](https://docs.datastax.com/en/developer/python-driver/latest/api/cassandra/cluster/#module-cassandra.cluster). The details vary (e.g. with network settings and authentication), but this might be something like:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "4642dafb-a065-4063-b58c-3d276f5ad07e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from cassandra.cluster import Cluster\n",
|
||||
"\n",
|
||||
"cluster = Cluster([\"127.0.0.1\"])\n",
|
||||
"session = cluster.connect()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "624c93bf-fb46-4350-bcfa-09ca09dc068f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can now set the session, along with your desired keyspace name, as a global CassIO parameter:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "92a4ab28-1c4f-4dad-9671-d47e0b1dde7b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import cassio\n",
|
||||
"\n",
|
||||
"CASSANDRA_KEYSPACE = input(\"CASSANDRA_KEYSPACE = \")\n",
|
||||
"\n",
|
||||
"cassio.init(session=session, keyspace=CASSANDRA_KEYSPACE)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "3b87a824-36f1-45b4-b54c-efec2a2de216",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now you can create the vector store:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "853a2a88-a565-4e24-8789-d78c213954a6",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"vstore = Cassandra(\n",
|
||||
" embedding=embe,\n",
|
||||
" table_name=\"cassandra_vector_demo\",\n",
|
||||
" # session=None, keyspace=None # Uncomment on older versions of LangChain\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "768ddf7a-0c3e-4134-ad38-25ac53c3da7a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Initialization (Astra DB through CQL)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "4ed4269a-b7e7-4503-9e66-5a11335c7681",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In this case you initialize CassIO with the following connection parameters:\n",
|
||||
"\n",
|
||||
"- the Database ID, e.g. `01234567-89ab-cdef-0123-456789abcdef`\n",
|
||||
"- the Token, e.g. `AstraCS:6gBhNmsk135....` (it must be a \"Database Administrator\" token)\n",
|
||||
"- Optionally a Keyspace name (if omitted, the default one for the database will be used)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "5fa6bd74-d4b2-45c5-9757-96dddc6242fb",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"ASTRA_DB_ID = input(\"ASTRA_DB_ID = \")\n",
|
||||
"ASTRA_DB_APPLICATION_TOKEN = getpass(\"ASTRA_DB_APPLICATION_TOKEN = \")\n",
|
||||
"\n",
|
||||
"desired_keyspace = input(\"ASTRA_DB_KEYSPACE (optional, can be left empty) = \")\n",
|
||||
"if desired_keyspace:\n",
|
||||
" ASTRA_DB_KEYSPACE = desired_keyspace\n",
|
||||
"else:\n",
|
||||
" ASTRA_DB_KEYSPACE = None"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "add6e585-17ff-452e-8ef6-7e485ead0b06",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import cassio\n",
|
||||
"\n",
|
||||
"cassio.init(\n",
|
||||
" database_id=ASTRA_DB_ID,\n",
|
||||
" token=ASTRA_DB_APPLICATION_TOKEN,\n",
|
||||
" keyspace=ASTRA_DB_KEYSPACE,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b305823c-bc98-4f3d-aabb-d7eb663ea421",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now you can create the vector store:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "f45f3038-9d59-41cc-8b43-774c6aa80295",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"vstore = Cassandra(\n",
|
||||
" embedding=embe,\n",
|
||||
" table_name=\"cassandra_vector_demo\",\n",
|
||||
" # session=None, keyspace=None # Uncomment on older versions of LangChain\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "39284918-cf8a-49bb-a2d3-aef285bb2ffa",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Usage of the vector store"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "3cc1aead-d6ec-48a3-affe-1d0cffa955a9",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"_See the sections \"Load a dataset\" through \"A minimal RAG chain\" above._\n",
|
||||
"\n",
|
||||
"Speaking of the latter, you can check out a full RAG template for Astra DB through CQL [here](https://github.com/langchain-ai/langchain/tree/master/templates/cassandra-entomology-rag)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "096397d8-6622-4685-9f9d-7e238beca467",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Cleanup"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "cc1e74f9-5500-41aa-836f-235b1ed5f20c",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"the following essentially retrieves the `Session` object from CassIO and runs a CQL `DROP TABLE` statement with it:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "b5b82c33-0e77-4a37-852c-8d50edbdd991",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"cassio.config.resolve_session().execute(\n",
|
||||
" f\"DROP TABLE {cassio.config.resolve_keyspace()}.cassandra_vector_demo;\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c10ece4d-ae06-42ab-baf4-4d0ac2051743",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Learn more"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "51ea8b69-7e15-458f-85aa-9fa199f95f9c",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"For more information, extended quickstarts and additional usage examples, please visit the [CassIO documentation](https://cassio.org/frameworks/langchain/about/) for more on using the LangChain `Cassandra` vector store."
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
@@ -776,7 +518,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.12"
|
||||
"version": "3.9.18"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
651
docs/docs/integrations/vectorstores/cassandra.ipynb
Normal file
@@ -0,0 +1,651 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "d2d6ca14-fb7e-4172-9aa0-a3119a064b96",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Apache Cassandra\n",
|
||||
"\n",
|
||||
"This page provides a quickstart for using [Apache Cassandra®](https://cassandra.apache.org/) as a Vector Store."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "6a1a562e-3d1a-4693-b55d-08bf90943a9a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"> [Cassandra](https://cassandra.apache.org/) is a NoSQL, row-oriented, highly scalable and highly available database.Starting with version 5.0, the database ships with [vector search capabilities](https://cassandra.apache.org/doc/trunk/cassandra/vector-search/overview.html)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "9cf37d7f-c18e-4e63-adea-138e5e981475",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"_Note: in addition to access to the database, an OpenAI API Key is required to run the full example._"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "bb9be7ce-8c70-4d46-9f11-71c42a36e928",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Setup and general dependencies"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "dbe7c156-0413-47e3-9237-4769c4248869",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Use of the integration requires the following Python package."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "8d00fcf4-9798-4289-9214-d9734690adfc",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%pip install --upgrade --quiet \"cassio>=0.1.4\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2453d83a-bc8f-41e1-a692-befe4dd90156",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"_Note: depending on your LangChain setup, you may need to install/upgrade other dependencies needed for this demo_\n",
|
||||
"_(specifically, recent versions of `datasets`, `openai`, `pypdf` and `tiktoken` are required, along with `langchain-community`)._"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "b06619af-fea2-4863-8149-7f239a8c9c82",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"from getpass import getpass\n",
|
||||
"\n",
|
||||
"from datasets import (\n",
|
||||
" load_dataset,\n",
|
||||
")\n",
|
||||
"from langchain.schema import Document\n",
|
||||
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
|
||||
"from langchain_community.document_loaders import PyPDFLoader\n",
|
||||
"from langchain_core.output_parsers import StrOutputParser\n",
|
||||
"from langchain_core.prompts import ChatPromptTemplate\n",
|
||||
"from langchain_core.runnables import RunnablePassthrough\n",
|
||||
"from langchain_openai import ChatOpenAI, OpenAIEmbeddings"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "1983f1da-0ae7-4a9b-bf4c-4ade328f7a3a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"os.environ[\"OPENAI_API_KEY\"] = getpass(\"OPENAI_API_KEY = \")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "c656df06-e938-4bc5-b570-440b8b7a0189",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"embe = OpenAIEmbeddings()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "22866f09-e10d-4f05-a24b-b9420129462e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Import the Vector Store"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "0b32730d-176e-414c-9d91-fd3644c54211",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.vectorstores import Cassandra"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "68f61b01-3e09-47c1-9d67-5d6915c86626",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Connection parameters\n",
|
||||
"\n",
|
||||
"The Vector Store integration shown in this page can be used with Cassandra as well as other derived databases, such as Astra DB, which use the CQL (Cassandra Query Language) protocol.\n",
|
||||
"\n",
|
||||
"> DataStax [Astra DB](https://docs.datastax.com/en/astra-serverless/docs/vector-search/quickstart.html) is a managed serverless database built on Cassandra, offering the same interface and strengths.\n",
|
||||
"\n",
|
||||
"Depending on whether you connect to a Cassandra cluster or to Astra DB through CQL, you will provide different parameters when creating the vector store object."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "36bbb3d9-4d07-4f63-b23d-c52be03f8938",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Connecting to a Cassandra cluster\n",
|
||||
"\n",
|
||||
"You first need to create a `cassandra.cluster.Session` object, as described in the [Cassandra driver documentation](https://docs.datastax.com/en/developer/python-driver/latest/api/cassandra/cluster/#module-cassandra.cluster). The details vary (e.g. with network settings and authentication), but this might be something like:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "d95bb1d4-d8a6-4e66-89bc-776f9c6f962b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from cassandra.cluster import Cluster\n",
|
||||
"\n",
|
||||
"cluster = Cluster([\"127.0.0.1\"])\n",
|
||||
"session = cluster.connect()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "8279aa78-96d6-43ad-aa21-79fd798d895d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can now set the session, along with your desired keyspace name, as a global CassIO parameter:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "29ececc4-e50b-4428-967f-4b6bbde12a14",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import cassio\n",
|
||||
"\n",
|
||||
"CASSANDRA_KEYSPACE = input(\"CASSANDRA_KEYSPACE = \")\n",
|
||||
"\n",
|
||||
"cassio.init(session=session, keyspace=CASSANDRA_KEYSPACE)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0bd035a2-f0af-418f-94e5-0fbb4d51ac3c",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now you can create the vector store:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "eeb62cde-89fc-44d7-ba76-91e19cbc5898",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"vstore = Cassandra(\n",
|
||||
" embedding=embe,\n",
|
||||
" table_name=\"cassandra_vector_demo\",\n",
|
||||
" # session=None, keyspace=None # Uncomment on older versions of LangChain\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ce240555-e5fc-431d-ac0f-bcf2f6e6a5fb",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"_Note: you can also pass your session and keyspace directly as parameters when creating the vector store. Using the global `cassio.init` setting, however, comes handy if your applications uses Cassandra in several ways (for instance, for vector store, chat memory and LLM response caching), as it allows to centralize credential and DB connection management in one place._"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b598e5fa-eb62-4939-9734-091628e84db4",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Connecting to Astra DB through CQL"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2feec7c3-7092-4252-9a3f-05eda4babe74",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In this case you initialize CassIO with the following connection parameters:\n",
|
||||
"\n",
|
||||
"- the Database ID, e.g. `01234567-89ab-cdef-0123-456789abcdef`\n",
|
||||
"- the Token, e.g. `AstraCS:6gBhNmsk135....` (it must be a \"Database Administrator\" token)\n",
|
||||
"- Optionally a Keyspace name (if omitted, the default one for the database will be used)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "2f96147d-6d76-4101-bbb0-4a7f215c3d2d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"ASTRA_DB_ID = input(\"ASTRA_DB_ID = \")\n",
|
||||
"ASTRA_DB_APPLICATION_TOKEN = getpass(\"ASTRA_DB_APPLICATION_TOKEN = \")\n",
|
||||
"\n",
|
||||
"desired_keyspace = input(\"ASTRA_DB_KEYSPACE (optional, can be left empty) = \")\n",
|
||||
"if desired_keyspace:\n",
|
||||
" ASTRA_DB_KEYSPACE = desired_keyspace\n",
|
||||
"else:\n",
|
||||
" ASTRA_DB_KEYSPACE = None"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "d653df1d-9dad-4980-ba52-76a47b4c5c1a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import cassio\n",
|
||||
"\n",
|
||||
"cassio.init(\n",
|
||||
" database_id=ASTRA_DB_ID,\n",
|
||||
" token=ASTRA_DB_APPLICATION_TOKEN,\n",
|
||||
" keyspace=ASTRA_DB_KEYSPACE,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e606b58b-d390-4fed-a2fc-65036c44860f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now you can create the vector store:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "9cb552d1-e888-4550-a350-6df06b1f5aae",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"vstore = Cassandra(\n",
|
||||
" embedding=embe,\n",
|
||||
" table_name=\"cassandra_vector_demo\",\n",
|
||||
" # session=None, keyspace=None # Uncomment on older versions of LangChain\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "9a348678-b2f6-46ca-9a0d-2eb4cc6b66b1",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Load a dataset"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "552e56b0-301a-4b06-99c7-57ba6faa966f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Convert each entry in the source dataset into a `Document`, then write them into the vector store:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "3a1f532f-ad63-4256-9730-a183841bd8e9",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"philo_dataset = load_dataset(\"datastax/philosopher-quotes\")[\"train\"]\n",
|
||||
"\n",
|
||||
"docs = []\n",
|
||||
"for entry in philo_dataset:\n",
|
||||
" metadata = {\"author\": entry[\"author\"]}\n",
|
||||
" doc = Document(page_content=entry[\"quote\"], metadata=metadata)\n",
|
||||
" docs.append(doc)\n",
|
||||
"\n",
|
||||
"inserted_ids = vstore.add_documents(docs)\n",
|
||||
"print(f\"\\nInserted {len(inserted_ids)} documents.\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "79d4f436-ef04-4288-8f79-97c9abb983ed",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In the above, `metadata` dictionaries are created from the source data and are part of the `Document`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "084d8802-ab39-4262-9a87-42eafb746f92",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Add some more entries, this time with `add_texts`:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "b6b157f5-eb31-4907-a78e-2e2b06893936",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"texts = [\"I think, therefore I am.\", \"To the things themselves!\"]\n",
|
||||
"metadatas = [{\"author\": \"descartes\"}, {\"author\": \"husserl\"}]\n",
|
||||
"ids = [\"desc_01\", \"huss_xy\"]\n",
|
||||
"\n",
|
||||
"inserted_ids_2 = vstore.add_texts(texts=texts, metadatas=metadatas, ids=ids)\n",
|
||||
"print(f\"\\nInserted {len(inserted_ids_2)} documents.\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "63840eb3-8b29-4017-bc2f-301bf5001f28",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"_Note: you may want to speed up the execution of `add_texts` and `add_documents` by increasing the concurrency level for_\n",
|
||||
"_these bulk operations - check out the methods' `batch_size` parameter_\n",
|
||||
"_for more details. Depending on the network and the client machine specifications, your best-performing choice of parameters may vary._"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c031760a-1fc5-4855-adf2-02ed52fe2181",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Run searches"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "02a77d8e-1aae-4054-8805-01c77947c49f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"This section demonstrates metadata filtering and getting the similarity scores back:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "1761806a-1afd-4491-867c-25a80d92b9fe",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"results = vstore.similarity_search(\"Our life is what we make of it\", k=3)\n",
|
||||
"for res in results:\n",
|
||||
" print(f\"* {res.page_content} [{res.metadata}]\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "eebc4f7c-f61a-438e-b3c8-17e6888d8a0b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"results_filtered = vstore.similarity_search(\n",
|
||||
" \"Our life is what we make of it\",\n",
|
||||
" k=3,\n",
|
||||
" filter={\"author\": \"plato\"},\n",
|
||||
")\n",
|
||||
"for res in results_filtered:\n",
|
||||
" print(f\"* {res.page_content} [{res.metadata}]\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "11bbfe64-c0cd-40c6-866a-a5786538450e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"results = vstore.similarity_search_with_score(\"Our life is what we make of it\", k=3)\n",
|
||||
"for res, score in results:\n",
|
||||
" print(f\"* [SIM={score:3f}] {res.page_content} [{res.metadata}]\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b14ea558-bfbe-41ce-807e-d70670060ada",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### MMR (Maximal-marginal-relevance) search"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "76381ce8-780a-4e3b-97b1-056d6782d7d5",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"results = vstore.max_marginal_relevance_search(\n",
|
||||
" \"Our life is what we make of it\",\n",
|
||||
" k=3,\n",
|
||||
" filter={\"author\": \"aristotle\"},\n",
|
||||
")\n",
|
||||
"for res in results:\n",
|
||||
" print(f\"* {res.page_content} [{res.metadata}]\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1cc86edd-692b-4495-906c-ccfd13b03c23",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Deleting stored documents"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "38a70ec4-b522-4d32-9ead-c642864fca37",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"delete_1 = vstore.delete(inserted_ids[:3])\n",
|
||||
"print(f\"all_succeed={delete_1}\") # True, all documents deleted"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "d4cf49ed-9d29-4ed9-bdab-51a308c41b8e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"delete_2 = vstore.delete(inserted_ids[2:5])\n",
|
||||
"print(f\"some_succeeds={delete_2}\") # True, though some IDs were gone already"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "847181ba-77d1-4a17-b7f9-9e2c3d8efd13",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## A minimal RAG chain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "cd64b844-846f-43c5-a7dd-c26b9ed417d0",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The next cells will implement a simple RAG pipeline:\n",
|
||||
"- download a sample PDF file and load it onto the store;\n",
|
||||
"- create a RAG chain with LCEL (LangChain Expression Language), with the vector store at its heart;\n",
|
||||
"- run the question-answering chain."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "5cbc4dba-0d5e-4038-8fc5-de6cadd1c2a9",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!curl -L \\\n",
|
||||
" \"https://github.com/awesome-astra/datasets/blob/main/demo-resources/what-is-philosophy/what-is-philosophy.pdf?raw=true\" \\\n",
|
||||
" -o \"what-is-philosophy.pdf\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "459385be-5e9c-47ff-ba53-2b7ae6166b09",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"pdf_loader = PyPDFLoader(\"what-is-philosophy.pdf\")\n",
|
||||
"splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=64)\n",
|
||||
"docs_from_pdf = pdf_loader.load_and_split(text_splitter=splitter)\n",
|
||||
"\n",
|
||||
"print(f\"Documents from PDF: {len(docs_from_pdf)}.\")\n",
|
||||
"inserted_ids_from_pdf = vstore.add_documents(docs_from_pdf)\n",
|
||||
"print(f\"Inserted {len(inserted_ids_from_pdf)} documents.\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "5010a66c-4298-4e32-82b5-2da0d36a5c70",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"retriever = vstore.as_retriever(search_kwargs={\"k\": 3})\n",
|
||||
"\n",
|
||||
"philo_template = \"\"\"\n",
|
||||
"You are a philosopher that draws inspiration from great thinkers of the past\n",
|
||||
"to craft well-thought answers to user questions. Use the provided context as the basis\n",
|
||||
"for your answers and do not make up new reasoning paths - just mix-and-match what you are given.\n",
|
||||
"Your answers must be concise and to the point, and refrain from answering about other topics than philosophy.\n",
|
||||
"\n",
|
||||
"CONTEXT:\n",
|
||||
"{context}\n",
|
||||
"\n",
|
||||
"QUESTION: {question}\n",
|
||||
"\n",
|
||||
"YOUR ANSWER:\"\"\"\n",
|
||||
"\n",
|
||||
"philo_prompt = ChatPromptTemplate.from_template(philo_template)\n",
|
||||
"\n",
|
||||
"llm = ChatOpenAI()\n",
|
||||
"\n",
|
||||
"chain = (\n",
|
||||
" {\"context\": retriever, \"question\": RunnablePassthrough()}\n",
|
||||
" | philo_prompt\n",
|
||||
" | llm\n",
|
||||
" | StrOutputParser()\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "fcbc1296-6c7c-478b-b55b-533ba4e54ddb",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain.invoke(\"How does Russel elaborate on Peirce's idea of the security blanket?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "869ab448-a029-4692-aefc-26b85513314d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"For more, check out a complete RAG template using Astra DB through CQL [here](https://github.com/langchain-ai/langchain/tree/master/templates/cassandra-entomology-rag)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "177610c7-50d0-4b7b-8634-b03338054c8e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Cleanup"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0da4d19f-9878-4d3d-82c9-09cafca20322",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"the following essentially retrieves the `Session` object from CassIO and runs a CQL `DROP TABLE` statement with it:\n",
|
||||
"\n",
|
||||
"_(You will lose the data you stored in it.)_"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "fd405a13-6f71-46fa-87e6-167238e9c25e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"cassio.config.resolve_session().execute(\n",
|
||||
" f\"DROP TABLE {cassio.config.resolve_keyspace()}.cassandra_vector_demo;\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c10ece4d-ae06-42ab-baf4-4d0ac2051743",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Learn more"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "51ea8b69-7e15-458f-85aa-9fa199f95f9c",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"For more information, extended quickstarts and additional usage examples, please visit the [CassIO documentation](https://cassio.org/frameworks/langchain/about/) for more on using the LangChain `Cassandra` vector store."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "3b8ee30c-2c84-42f3-9cff-e80dbc590490",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Attribution statement\n",
|
||||
"\n",
|
||||
"> Apache Cassandra, Cassandra and Apache are either registered trademarks or trademarks of the [Apache Software Foundation](http://www.apache.org/) in the United States and/or other countries.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.17"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
@@ -594,11 +594,7 @@
|
||||
},
|
||||
{
|
||||
"source": "/docs/integrations/cassandra",
|
||||
"destination": "/docs/integrations/providers/astradb"
|
||||
},
|
||||
{
|
||||
"source": "/docs/integrations/providers/cassandra",
|
||||
"destination": "/docs/integrations/providers/astradb"
|
||||
"destination": "/docs/integrations/providers/cassandra"
|
||||
},
|
||||
{
|
||||
"source": "/docs/integrations/providers/providers/semadb",
|
||||
@@ -608,10 +604,6 @@
|
||||
"source": "/docs/integrations/vectorstores/vectorstores/semadb",
|
||||
"destination": "/docs/integrations/vectorstores/semadb"
|
||||
},
|
||||
{
|
||||
"source": "/docs/integrations/vectorstores/cassandra",
|
||||
"destination": "/docs/integrations/vectorstores/astradb"
|
||||
},
|
||||
{
|
||||
"source": "/docs/integrations/vectorstores/async_faiss",
|
||||
"destination": "/docs/integrations/vectorstores/faiss_async"
|
||||
|