mirror of
https://github.com/hwchase17/langchain.git
synced 2025-06-07 07:34:02 +00:00
**Description:** This PR introduces a new "Astra DB" Partner Package. So far only the vector store class is _duplicated_ there, all others following once this is validated and established. Along with the move to separate package, incidentally, the class name will change `AstraDB` => `AstraDBVectorStore`. The strategy has been to duplicate the module (with prospected removal from community at LangChain 0.2). Until then, the code will be kept in sync with minimal, known differences (there is a makefile target to automate drift control. Out of convenience with this check, the community package has a class `AstraDBVectorStore` aliased to `AstraDB` at the end of the module). With this PR several bugfixes and improvement come to the vector store, as well as a reshuffling of the doc pages/notebooks (Astra and Cassandra) to align with the move to a separate package. **Dependencies:** A brand new pyproject.toml in the new package, no changes otherwise. **Twitter handle:** `@rsprrs` --------- Co-authored-by: Christophe Bornet <cbornet@hotmail.com> Co-authored-by: Erick Friis <erick@langchain.dev>
277 lines
7.0 KiB
Plaintext
277 lines
7.0 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"id": "vm8vn9t8DvC_"
|
|
},
|
|
"source": [
|
|
"# Cassandra"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"[Cassandra](https://cassandra.apache.org/) is a NoSQL, row-oriented, highly scalable and highly available database.Starting with version 5.0, the database ships with [vector search capabilities](https://cassandra.apache.org/doc/trunk/cassandra/vector-search/overview.html)."
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"id": "5WjXERXzFEhg"
|
|
},
|
|
"source": [
|
|
"## Overview"
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"id": "juAmbgoWD17u"
|
|
},
|
|
"source": [
|
|
"The Cassandra Document Loader returns a list of Langchain Documents from a Cassandra database.\n",
|
|
"\n",
|
|
"You must either provide a CQL query or a table name to retrieve the documents.\n",
|
|
"The Loader takes the following parameters:\n",
|
|
"\n",
|
|
"* table: (Optional) The table to load the data from.\n",
|
|
"* session: (Optional) The cassandra driver session. If not provided, the cassio resolved session will be used.\n",
|
|
"* keyspace: (Optional) The keyspace of the table. If not provided, the cassio resolved keyspace will be used.\n",
|
|
"* query: (Optional) The query used to load the data.\n",
|
|
"* page_content_mapper: (Optional) a function to convert a row to string page content. The default converts the row to JSON.\n",
|
|
"* metadata_mapper: (Optional) a function to convert a row to metadata dict.\n",
|
|
"* query_parameters: (Optional) The query parameters used when calling session.execute .\n",
|
|
"* query_timeout: (Optional) The query timeout used when calling session.execute .\n",
|
|
"* query_custom_payload: (Optional) The query custom_payload used when calling `session.execute`.\n",
|
|
"* query_execution_profile: (Optional) The query execution_profile used when calling `session.execute`.\n",
|
|
"* query_host: (Optional) The query host used when calling `session.execute`.\n",
|
|
"* query_execute_as: (Optional) The query execute_as used when calling `session.execute`."
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Load documents with the Document Loader"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from langchain_community.document_loaders import CassandraLoader"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"source": [
|
|
"### Init from a cassandra driver Session\n",
|
|
"\n",
|
|
"You need to create a `cassandra.cluster.Session` object, as described in the [Cassandra driver documentation](https://docs.datastax.com/en/developer/python-driver/latest/api/cassandra/cluster/#module-cassandra.cluster). The details vary (e.g. with network settings and authentication), but this might be something like:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"from cassandra.cluster import Cluster\n",
|
|
"\n",
|
|
"cluster = Cluster()\n",
|
|
"session = cluster.connect()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"source": [
|
|
"You need to provide the name of an existing keyspace of the Cassandra instance:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"CASSANDRA_KEYSPACE = input(\"CASSANDRA_KEYSPACE = \")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"source": [
|
|
"Creating the document loader:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 16,
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2024-01-19T15:47:25.893037Z",
|
|
"start_time": "2024-01-19T15:47:25.889398Z"
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"loader = CassandraLoader(\n",
|
|
" table=\"movie_reviews\",\n",
|
|
" session=session,\n",
|
|
" keyspace=CASSANDRA_KEYSPACE,\n",
|
|
")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 17,
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2024-01-19T15:47:26.399472Z",
|
|
"start_time": "2024-01-19T15:47:26.389145Z"
|
|
},
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"docs = loader.load()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 19,
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2024-01-19T15:47:33.287783Z",
|
|
"start_time": "2024-01-19T15:47:33.277862Z"
|
|
}
|
|
},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"Document(page_content='Row(_id=\\'659bdffa16cbc4586b11a423\\', title=\\'Dangerous Men\\', reviewtext=\\'\"Dangerous Men,\" the picture\\\\\\'s production notes inform, took 26 years to reach the big screen. After having seen it, I wonder: What was the rush?\\')', metadata={'table': 'movie_reviews', 'keyspace': 'default_keyspace'})"
|
|
]
|
|
},
|
|
"execution_count": 19,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"docs[0]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"source": [
|
|
"### Init from cassio\n",
|
|
"\n",
|
|
"It's also possible to use cassio to configure the session and keyspace."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"import cassio\n",
|
|
"\n",
|
|
"cassio.init(contact_points=\"127.0.0.1\", keyspace=CASSANDRA_KEYSPACE)\n",
|
|
"\n",
|
|
"loader = CassandraLoader(\n",
|
|
" table=\"movie_reviews\",\n",
|
|
")\n",
|
|
"\n",
|
|
"docs = loader.load()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"#### Attribution statement\n",
|
|
"\n",
|
|
"> Apache Cassandra, Cassandra and Apache are either registered trademarks or trademarks of the [Apache Software Foundation](http://www.apache.org/) in the United States and/or other countries."
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"colab": {
|
|
"collapsed_sections": [
|
|
"5WjXERXzFEhg"
|
|
],
|
|
"provenance": []
|
|
},
|
|
"kernelspec": {
|
|
"display_name": "Python 3 (ipykernel)",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.9.17"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 4
|
|
}
|