{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "vm8vn9t8DvC_" }, "source": [ "# Cassandra" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Cassandra](https://cassandra.apache.org/) is a NoSQL, row-oriented, highly scalable and highly available database.Starting with version 5.0, the database ships with [vector search capabilities](https://cassandra.apache.org/doc/trunk/cassandra/vector-search/overview.html)." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "5WjXERXzFEhg" }, "source": [ "## Overview" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "juAmbgoWD17u" }, "source": [ "The Cassandra Document Loader returns a list of Langchain Documents from a Cassandra database.\n", "\n", "You must either provide a CQL query or a table name to retrieve the documents.\n", "The Loader takes the following parameters:\n", "\n", "* table: (Optional) The table to load the data from.\n", "* session: (Optional) The cassandra driver session. If not provided, the cassio resolved session will be used.\n", "* keyspace: (Optional) The keyspace of the table. If not provided, the cassio resolved keyspace will be used.\n", "* query: (Optional) The query used to load the data.\n", "* page_content_mapper: (Optional) a function to convert a row to string page content. The default converts the row to JSON.\n", "* metadata_mapper: (Optional) a function to convert a row to metadata dict.\n", "* query_parameters: (Optional) The query parameters used when calling session.execute .\n", "* query_timeout: (Optional) The query timeout used when calling session.execute .\n", "* query_custom_payload: (Optional) The query custom_payload used when calling `session.execute`.\n", "* query_execution_profile: (Optional) The query execution_profile used when calling `session.execute`.\n", "* query_host: (Optional) The query host used when calling `session.execute`.\n", "* query_execute_as: (Optional) The query execute_as used when calling `session.execute`." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Load documents with the Document Loader" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from langchain_community.document_loaders import CassandraLoader" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "source": [ "### Init from a cassandra driver Session\n", "\n", "You need to create a `cassandra.cluster.Session` object, as described in the [Cassandra driver documentation](https://docs.datastax.com/en/developer/python-driver/latest/api/cassandra/cluster/#module-cassandra.cluster). The details vary (e.g. with network settings and authentication), but this might be something like:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "from cassandra.cluster import Cluster\n", "\n", "cluster = Cluster()\n", "session = cluster.connect()" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "source": [ "You need to provide the name of an existing keyspace of the Cassandra instance:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "CASSANDRA_KEYSPACE = input(\"CASSANDRA_KEYSPACE = \")" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "source": [ "Creating the document loader:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "ExecuteTime": { "end_time": "2024-01-19T15:47:25.893037Z", "start_time": "2024-01-19T15:47:25.889398Z" } }, "outputs": [], "source": [ "loader = CassandraLoader(\n", " table=\"movie_reviews\",\n", " session=session,\n", " keyspace=CASSANDRA_KEYSPACE,\n", ")" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "ExecuteTime": { "end_time": "2024-01-19T15:47:26.399472Z", "start_time": "2024-01-19T15:47:26.389145Z" }, "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "docs = loader.load()" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "ExecuteTime": { "end_time": "2024-01-19T15:47:33.287783Z", "start_time": "2024-01-19T15:47:33.277862Z" } }, "outputs": [ { "data": { "text/plain": [ "Document(page_content='Row(_id=\\'659bdffa16cbc4586b11a423\\', title=\\'Dangerous Men\\', reviewtext=\\'\"Dangerous Men,\" the picture\\\\\\'s production notes inform, took 26 years to reach the big screen. After having seen it, I wonder: What was the rush?\\')', metadata={'table': 'movie_reviews', 'keyspace': 'default_keyspace'})" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "docs[0]" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "source": [ "### Init from cassio\n", "\n", "It's also possible to use cassio to configure the session and keyspace." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "import cassio\n", "\n", "cassio.init(contact_points=\"127.0.0.1\", keyspace=CASSANDRA_KEYSPACE)\n", "\n", "loader = CassandraLoader(\n", " table=\"movie_reviews\",\n", ")\n", "\n", "docs = loader.load()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Attribution statement\n", "\n", "> Apache Cassandra, Cassandra and Apache are either registered trademarks or trademarks of the [Apache Software Foundation](http://www.apache.org/) in the United States and/or other countries." ] } ], "metadata": { "colab": { "collapsed_sections": [ "5WjXERXzFEhg" ], "provenance": [] }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.17" } }, "nbformat": 4, "nbformat_minor": 4 }