From 8da34118bc7c581e0d2562b5dce308ecf6a37179 Mon Sep 17 00:00:00 2001 From: Christophe Bornet Date: Mon, 22 Jan 2024 23:06:21 +0100 Subject: [PATCH] docs: Add documentation for Cassandra Document Loader (#16282) --- .../document_loaders/cassandra.ipynb | 241 ++++++++++++++++++ 1 file changed, 241 insertions(+) create mode 100644 docs/docs/integrations/document_loaders/cassandra.ipynb diff --git a/docs/docs/integrations/document_loaders/cassandra.ipynb b/docs/docs/integrations/document_loaders/cassandra.ipynb new file mode 100644 index 00000000000..49f261a18a8 --- /dev/null +++ b/docs/docs/integrations/document_loaders/cassandra.ipynb @@ -0,0 +1,241 @@ +{ + "cells": [ + { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "id": "vm8vn9t8DvC_" + }, + "source": [ + "# Cassandra" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "[Cassandra](https://cassandra.apache.org/) is a NoSQL, row-oriented, highly scalable and highly available database. Starting with version 5.0, the database ships with [vector search capabilities](https://cassandra.apache.org/doc/trunk/cassandra/vector-search/overview.html)." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "id": "5WjXERXzFEhg" + }, + "source": [ + "## Overview" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "id": "juAmbgoWD17u" + }, + "source": [ + "The Cassandra Document Loader returns a list of LangChain Documents from a Cassandra database.\n", + "\n", + "You must provide either a CQL query or a table name to retrieve the documents.\n", + "The Loader takes the following parameters:\n", + "\n", + "* table: (Optional) The table to load the data from.\n", + "* session: (Optional) The Cassandra driver session. If not provided, the cassio-resolved session will be used.\n", + "* keyspace: (Optional) The keyspace of the table.
If not provided, the cassio-resolved keyspace will be used.\n", + "* query: (Optional) The query used to load the data.\n", + "* page_content_mapper: (Optional) A function that converts a row to the page content string. The default converts the row to JSON.\n", + "* metadata_mapper: (Optional) A function that converts a row to a metadata dict.\n", + "* query_parameters: (Optional) The query parameters used when calling `session.execute`.\n", + "* query_timeout: (Optional) The query timeout used when calling `session.execute`.\n", + "* query_custom_payload: (Optional) The query custom_payload used when calling `session.execute`.\n", + "* query_execution_profile: (Optional) The query execution_profile used when calling `session.execute`.\n", + "* query_host: (Optional) The query host used when calling `session.execute`.\n", + "* query_execute_as: (Optional) The query execute_as used when calling `session.execute`." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Load documents with the Document Loader" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_community.document_loaders import CassandraLoader" + ] + }, + { + "cell_type": "markdown", + "source": [ + "### Init from a cassandra driver Session\n", + "\n", + "You need to create a `cassandra.cluster.Session` object, as described in the [Cassandra driver documentation](https://docs.datastax.com/en/developer/python-driver/latest/api/cassandra/cluster/#module-cassandra.cluster). The details vary (e.g.
with network settings and authentication), but this might be something like:" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "outputs": [], + "source": [ + "from cassandra.cluster import Cluster\n", + "\n", + "cluster = Cluster()\n", + "session = cluster.connect()" + ], + "metadata": { + "collapsed": false + }, + "execution_count": null + }, + { + "cell_type": "markdown", + "source": [ + "You need to provide the name of an existing keyspace of the Cassandra instance:" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "outputs": [], + "source": [ + "CASSANDRA_KEYSPACE = input(\"CASSANDRA_KEYSPACE = \")" + ], + "metadata": { + "collapsed": false + }, + "execution_count": null + }, + { + "cell_type": "markdown", + "source": [ + "Creating the document loader:" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "ExecuteTime": { + "end_time": "2024-01-19T15:47:25.893037Z", + "start_time": "2024-01-19T15:47:25.889398Z" + } + }, + "outputs": [], + "source": [ + "loader = CassandraLoader(\n", + " table=\"movie_reviews\",\n", + " session=session,\n", + " keyspace=CASSANDRA_KEYSPACE,\n", + ")" + ] + }, + { + "cell_type": "code", + "outputs": [], + "source": [ + "docs = loader.load()" + ], + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2024-01-19T15:47:26.399472Z", + "start_time": "2024-01-19T15:47:26.389145Z" + } + }, + "execution_count": 17 + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": { + "ExecuteTime": { + "end_time": "2024-01-19T15:47:33.287783Z", + "start_time": "2024-01-19T15:47:33.277862Z" + } + }, + "outputs": [ + { + "data": { + "text/plain": "Document(page_content='Row(_id=\\'659bdffa16cbc4586b11a423\\', title=\\'Dangerous Men\\', reviewtext=\\'\"Dangerous Men,\" the picture\\\\\\'s production notes inform, took 26 years to reach the big screen. 
After having seen it, I wonder: What was the rush?\\')', metadata={'table': 'movie_reviews', 'keyspace': 'default_keyspace'})" + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "docs[0]" + ] + }, + { + "cell_type": "markdown", + "source": [ + "### Init from cassio\n", + "\n", + "It's also possible to use cassio to configure the session and keyspace." + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "outputs": [], + "source": [ + "import cassio\n", + "\n", + "cassio.init(contact_points=\"127.0.0.1\", keyspace=CASSANDRA_KEYSPACE)\n", + "\n", + "loader = CassandraLoader(\n", + " table=\"movie_reviews\",\n", + ")\n", + "\n", + "docs = loader.load()" + ], + "metadata": { + "collapsed": false + }, + "execution_count": null + } + ], + "metadata": { + "colab": { + "collapsed_sections": [ + "5WjXERXzFEhg" + ], + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.18" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +}