mirror of
https://github.com/hwchase17/langchain.git
synced 2025-06-27 08:58:48 +00:00
docs: Add documentation for Cassandra Document Loader (#16282)
This commit is contained in:
parent
d1b4ead87c
commit
8da34118bc
241
docs/docs/integrations/document_loaders/cassandra.ipynb
Normal file
241
docs/docs/integrations/document_loaders/cassandra.ipynb
Normal file
@ -0,0 +1,241 @@
|
|||||||
|
{
|
||||||
|
"cells": [
|
||||||
|
{
|
||||||
|
"attachments": {},
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {
|
||||||
|
"id": "vm8vn9t8DvC_"
|
||||||
|
},
|
||||||
|
"source": [
|
||||||
|
"# Cassandra"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"[Cassandra](https://cassandra.apache.org/) is a NoSQL, row-oriented, highly scalable and highly available database.Starting with version 5.0, the database ships with [vector search capabilities](https://cassandra.apache.org/doc/trunk/cassandra/vector-search/overview.html)."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"attachments": {},
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {
|
||||||
|
"id": "5WjXERXzFEhg"
|
||||||
|
},
|
||||||
|
"source": [
|
||||||
|
"## Overview"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"attachments": {},
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {
|
||||||
|
"id": "juAmbgoWD17u"
|
||||||
|
},
|
||||||
|
"source": [
|
||||||
|
"The Cassandra Document Loader returns a list of Langchain Documents from a Cassandra database.\n",
|
||||||
|
"\n",
|
||||||
|
"You must either provide a CQL query or a table name to retrieve the documents.\n",
|
||||||
|
"The Loader takes the following parameters:\n",
|
||||||
|
"\n",
|
||||||
|
"* table: (Optional) The table to load the data from.\n",
|
||||||
|
"* session: (Optional) The cassandra driver session. If not provided, the cassio resolved session will be used.\n",
|
||||||
|
"* keyspace: (Optional) The keyspace of the table. If not provided, the cassio resolved keyspace will be used.\n",
|
||||||
|
"* query: (Optional) The query used to load the data.\n",
|
||||||
|
"* page_content_mapper: (Optional) a function to convert a row to string page content. The default converts the row to JSON.\n",
|
||||||
|
"* metadata_mapper: (Optional) a function to convert a row to metadata dict.\n",
|
||||||
|
"* query_parameters: (Optional) The query parameters used when calling session.execute .\n",
|
||||||
|
"* query_timeout: (Optional) The query timeout used when calling session.execute .\n",
|
||||||
|
"* query_custom_payload: (Optional) The query custom_payload used when calling `session.execute`.\n",
|
||||||
|
"* query_execution_profile: (Optional) The query execution_profile used when calling `session.execute`.\n",
|
||||||
|
"* query_host: (Optional) The query host used when calling `session.execute`.\n",
|
||||||
|
"* query_execute_as: (Optional) The query execute_as used when calling `session.execute`."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"attachments": {},
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Load documents with the Document Loader"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from langchain_community.document_loaders import CassandraLoader"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"source": [
|
||||||
|
"### Init from a cassandra driver Session\n",
|
||||||
|
"\n",
|
||||||
|
"You need to create a `cassandra.cluster.Session` object, as described in the [Cassandra driver documentation](https://docs.datastax.com/en/developer/python-driver/latest/api/cassandra/cluster/#module-cassandra.cluster). The details vary (e.g. with network settings and authentication), but this might be something like:"
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"collapsed": false
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from cassandra.cluster import Cluster\n",
|
||||||
|
"\n",
|
||||||
|
"cluster = Cluster()\n",
|
||||||
|
"session = cluster.connect()"
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"collapsed": false
|
||||||
|
},
|
||||||
|
"execution_count": null
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"source": [
|
||||||
|
"You need to provide the name of an existing keyspace of the Cassandra instance:"
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"collapsed": false
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"CASSANDRA_KEYSPACE = input(\"CASSANDRA_KEYSPACE = \")"
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"collapsed": false
|
||||||
|
},
|
||||||
|
"execution_count": null
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"source": [
|
||||||
|
"Creating the document loader:"
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"collapsed": false
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 16,
|
||||||
|
"metadata": {
|
||||||
|
"ExecuteTime": {
|
||||||
|
"end_time": "2024-01-19T15:47:25.893037Z",
|
||||||
|
"start_time": "2024-01-19T15:47:25.889398Z"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"loader = CassandraLoader(\n",
|
||||||
|
" table=\"movie_reviews\",\n",
|
||||||
|
" session=session,\n",
|
||||||
|
" keyspace=CASSANDRA_KEYSPACE,\n",
|
||||||
|
")"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"docs = loader.load()"
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"collapsed": false,
|
||||||
|
"ExecuteTime": {
|
||||||
|
"end_time": "2024-01-19T15:47:26.399472Z",
|
||||||
|
"start_time": "2024-01-19T15:47:26.389145Z"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"execution_count": 17
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 19,
|
||||||
|
"metadata": {
|
||||||
|
"ExecuteTime": {
|
||||||
|
"end_time": "2024-01-19T15:47:33.287783Z",
|
||||||
|
"start_time": "2024-01-19T15:47:33.277862Z"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"outputs": [
|
||||||
|
{
|
||||||
|
"data": {
|
||||||
|
"text/plain": "Document(page_content='Row(_id=\\'659bdffa16cbc4586b11a423\\', title=\\'Dangerous Men\\', reviewtext=\\'\"Dangerous Men,\" the picture\\\\\\'s production notes inform, took 26 years to reach the big screen. After having seen it, I wonder: What was the rush?\\')', metadata={'table': 'movie_reviews', 'keyspace': 'default_keyspace'})"
|
||||||
|
},
|
||||||
|
"execution_count": 19,
|
||||||
|
"metadata": {},
|
||||||
|
"output_type": "execute_result"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"source": [
|
||||||
|
"docs[0]"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"source": [
|
||||||
|
"### Init from cassio\n",
|
||||||
|
"\n",
|
||||||
|
"It's also possible to use cassio to configure the session and keyspace."
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"collapsed": false
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"import cassio\n",
|
||||||
|
"\n",
|
||||||
|
"cassio.init(contact_points=\"127.0.0.1\", keyspace=CASSANDRA_KEYSPACE)\n",
|
||||||
|
"\n",
|
||||||
|
"loader = CassandraLoader(\n",
|
||||||
|
" table=\"movie_reviews\",\n",
|
||||||
|
")\n",
|
||||||
|
"\n",
|
||||||
|
"docs = loader.load()"
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"collapsed": false
|
||||||
|
},
|
||||||
|
"execution_count": null
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"colab": {
|
||||||
|
"collapsed_sections": [
|
||||||
|
"5WjXERXzFEhg"
|
||||||
|
],
|
||||||
|
"provenance": []
|
||||||
|
},
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3 (ipykernel)",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python3"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"file_extension": ".py",
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"name": "python",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"version": "3.9.18"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
|
"nbformat_minor": 4
|
||||||
|
}
|
Loading…
Reference in New Issue
Block a user