diff --git a/docs/docs/integrations/stores/astradb.ipynb b/docs/docs/integrations/stores/astradb.ipynb new file mode 100644 index 00000000000..5904d2e3015 --- /dev/null +++ b/docs/docs/integrations/stores/astradb.ipynb @@ -0,0 +1,240 @@ +{ + "cells": [ + { + "cell_type": "raw", + "metadata": {}, + "source": [ + "---\n", + "sidebar_label: Astra DB\n", + "---" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Astra DB\n", + "\n", + "DataStax [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) is a serverless vector-capable database built on Cassandra and made conveniently available through an easy-to-use JSON API.\n", + "\n", + "`AstraDBStore` and `AstraDBByteStore` need the `astrapy` package to be installed:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [ + "%pip install --upgrade --quiet astrapy" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The Store takes the following parameters:\n", + "\n", + "* `api_endpoint`: Astra DB API endpoint. Looks like `https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com`\n", + "* `token`: Astra DB token. 
Looks like `AstraCS:6gBhNmsk135....`\n", + "* `collection_name`: Astra DB collection name\n", + "* `namespace`: (Optional) Astra DB namespace" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## AstraDBStore\n", + "\n", + "The `AstraDBStore` is an implementation of `BaseStore` that stores everything in your DataStax Astra DB instance.\n", + "The store keys must be strings and will be mapped to the `_id` field of the Astra DB document.\n", + "The store values can be any object that can be serialized by `json.dumps`.\n", + "In the database, entries will have the form:\n", + "\n", + "```json\n", + "{\n", + " \"_id\": \"<key>\",\n", + " \"value\": <value>\n", + "}\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_community.storage import AstraDBStore" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from getpass import getpass\n", + "\n", + "ASTRA_DB_API_ENDPOINT = input(\"ASTRA_DB_API_ENDPOINT = \")\n", + "ASTRA_DB_APPLICATION_TOKEN = getpass(\"ASTRA_DB_APPLICATION_TOKEN = \")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "store = AstraDBStore(\n", + " api_endpoint=ASTRA_DB_API_ENDPOINT,\n", + " token=ASTRA_DB_APPLICATION_TOKEN,\n", + " collection_name=\"my_store\",\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "['v1', [0.1, 0.2, 0.3]]\n" + ] + } + ], + "source": [ + "store.mset([(\"k1\", \"v1\"), (\"k2\", [0.1, 0.2, 0.3])])\n", + "print(store.mget([\"k1\", \"k2\"]))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Usage with CacheBackedEmbeddings\n", + "\n", + "You may use the `AstraDBStore` in conjunction with a 
[`CacheBackedEmbeddings`](/docs/modules/data_connection/text_embedding/caching_embeddings) to cache the result of embeddings computations.\n", + "Note that `AstraDBStore` stores the embeddings as a list of floats without first converting them to bytes, so we don't use `from_bytes_store` here." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain.embeddings import CacheBackedEmbeddings, OpenAIEmbeddings\n", + "\n", + "embeddings = CacheBackedEmbeddings(\n", + " underlying_embeddings=OpenAIEmbeddings(), document_embedding_store=store\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## AstraDBByteStore\n", + "\n", + "The `AstraDBByteStore` is an implementation of `ByteStore` that stores everything in your DataStax Astra DB instance.\n", + "The store keys must be strings and will be mapped to the `_id` field of the Astra DB document.\n", + "The store `bytes` values are converted to base64 strings for storage in Astra DB.\n", + "In the database, entries will have the form:\n", + "\n", + "```json\n", + "{\n", + " \"_id\": \"<key>\",\n", + " \"value\": \"bytes encoded in base 64\"\n", + "}\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_community.storage import AstraDBByteStore" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from getpass import getpass\n", + "\n", + "ASTRA_DB_API_ENDPOINT = input(\"ASTRA_DB_API_ENDPOINT = \")\n", + "ASTRA_DB_APPLICATION_TOKEN = getpass(\"ASTRA_DB_APPLICATION_TOKEN = \")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "store = AstraDBByteStore(\n", + " api_endpoint=ASTRA_DB_API_ENDPOINT,\n", + " token=ASTRA_DB_APPLICATION_TOKEN,\n", + " collection_name=\"my_store\",\n", + ")" + ] + }, + { + "cell_type": 
"code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[b'v1', b'v2']\n" + ] + } + ], + "source": [ + "store.mset([(\"k1\", b\"v1\"), (\"k2\", b\"v2\")])\n", + "print(store.mget([\"k1\", \"k2\"]))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": ".venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.4" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}