mirror of https://github.com/hwchase17/langchain.git (synced 2025-09-08 14:31:55 +00:00)
FEATURE: Astra DB, LLM cache classes (exact-match and semantic cache) (#13834)
This PR provides idiomatic implementations of the exact-match and the semantic LLM caches, using Astra DB as the backend through the database's HTTP JSON API. Both caches require the `astrapy` library as a dependency. Comes with integration tests and example usage in the `llm_cache.ipynb` notebook in the docs.

@baskaryan this is the Astra DB counterpart to the Cassandra classes you merged some time ago; tagging you for your familiarity with the topic. Thank you!

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
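In practice the new exact-match cache is a one-liner to enable. Below is a minimal end-to-end sketch mirroring the notebook cells added in this diff; the `AstraDBCache` class and its `api_endpoint`/`token` parameters come from this PR, while the `OpenAI` LLM, the placeholder credentials, and the timing code are illustrative assumptions:

```python
# Hedged sketch, not part of the diff: an exact-match caching round trip.
import time

from langchain.cache import AstraDBCache
from langchain.globals import set_llm_cache
from langchain.llms import OpenAI  # assumed example LLM, not part of this PR

set_llm_cache(
    AstraDBCache(
        api_endpoint="https://<db-id>-<region>.apps.astra.datastax.com",  # from the Astra dashboard
        token="AstraCS:...",  # Astra DB application token
    )
)

llm = OpenAI()

t0 = time.perf_counter()
llm("Is a true fakery the same as a fake truth?")  # first call: invokes the LLM, writes to the cache
t1 = time.perf_counter()
llm("Is a true fakery the same as a fake truth?")  # identical prompt: served from Astra DB
t2 = time.perf_counter()
print(f"uncached: {t1 - t0:.2f}s, cached: {t2 - t1:.2f}s")
```

On the second, identical prompt the response comes back from Astra DB without an LLM call, which is exactly what the notebook's `%%time` cells below demonstrate.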
@@ -912,7 +912,7 @@
 "source": [
 "## `Cassandra` caches\n",
 "\n",
-"You can use Cassandra / Astra DB for caching LLM responses, choosing from the exact-match `CassandraCache` or the (vector-similarity-based) `CassandraSemanticCache`.\n",
+"You can use Cassandra / Astra DB through CQL for caching LLM responses, choosing from the exact-match `CassandraCache` or the (vector-similarity-based) `CassandraSemanticCache`.\n",
 "\n",
 "Let's see both in action in the following cells."
 ]
@@ -924,7 +924,7 @@
 "source": [
 "#### Connect to the DB\n",
 "\n",
-"First you need to establish a `Session` to the DB and to specify a _keyspace_ for the cache table(s). The following gets you started with an Astra DB instance (see e.g. [here](https://cassio.org/start_here/#vector-database) for more backends and connection options)."
+"First you need to establish a `Session` to the DB and to specify a _keyspace_ for the cache table(s). The following gets you connected to Astra DB through CQL (see e.g. [here](https://cassio.org/start_here/#vector-database) for more backends and connection options)."
 ]
 },
 {
@@ -1132,6 +1132,214 @@
 "print(llm(\"How come we always see one face of the moon?\"))"
 ]
 },
+{
+"cell_type": "markdown",
+"id": "8712f8fc-bb89-4164-beb9-c672778bbd91",
+"metadata": {},
+"source": [
+"## `Astra DB` Caches"
+]
+},
+{
+"cell_type": "markdown",
+"id": "173041d9-e4af-4f68-8461-d302bfc7e1bd",
+"metadata": {},
+"source": [
+"You can easily use [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) as an LLM cache, with either the \"exact\" or the \"semantic-based\" cache.\n",
+"\n",
+"Make sure you have a running database (it must be a Vector-enabled database to use the Semantic cache) and get the required credentials on your Astra dashboard:\n",
+"\n",
+"- the API Endpoint looks like `https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com`\n",
+"- the Token looks like `AstraCS:6gBhNmsk135....`"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 3,
+"id": "feb510b6-99a3-4228-8e11-563051f8178e",
+"metadata": {},
+"outputs": [
+{
+"name": "stdin",
+"output_type": "stream",
+"text": [
+"ASTRA_DB_API_ENDPOINT = https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com\n",
+"ASTRA_DB_APPLICATION_TOKEN = ········\n"
+]
+}
+],
+"source": [
+"import getpass\n",
+"\n",
+"ASTRA_DB_API_ENDPOINT = input(\"ASTRA_DB_API_ENDPOINT = \")\n",
+"ASTRA_DB_APPLICATION_TOKEN = getpass.getpass(\"ASTRA_DB_APPLICATION_TOKEN = \")"
+]
+},
+{
+"cell_type": "markdown",
+"id": "ee6d587f-4b7c-43f4-9e90-5129c842a143",
+"metadata": {},
+"source": [
+"### Astra DB exact LLM cache\n",
+"\n",
+"This will avoid invoking the LLM when the supplied prompt is _exactly_ the same as one encountered already:"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 7,
+"id": "ad63c146-ee41-4896-90ee-29fcc39f0ed5",
+"metadata": {},
+"outputs": [],
+"source": [
+"from langchain.cache import AstraDBCache\n",
+"from langchain.globals import set_llm_cache\n",
+"\n",
+"set_llm_cache(\n",
+"    AstraDBCache(\n",
+"        api_endpoint=ASTRA_DB_API_ENDPOINT,\n",
+"        token=ASTRA_DB_APPLICATION_TOKEN,\n",
+"    )\n",
+")"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 8,
+"id": "83e0fb02-e8eb-4483-9eb1-55b5e14c4487",
+"metadata": {},
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"\n",
+"\n",
+"There is no definitive answer to this question as it depends on the interpretation of the terms \"true fakery\" and \"fake truth\". However, one possible interpretation is that a true fakery is a counterfeit or imitation that is intended to deceive, whereas a fake truth is a false statement that is presented as if it were true.\n",
+"CPU times: user 70.8 ms, sys: 4.13 ms, total: 74.9 ms\n",
+"Wall time: 2.06 s\n"
+]
+}
+],
+"source": [
+"%%time\n",
+"\n",
+"print(llm(\"Is a true fakery the same as a fake truth?\"))"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 9,
+"id": "4d20d498-fe28-4e26-8531-2b31c52ee687",
+"metadata": {},
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"\n",
+"\n",
+"There is no definitive answer to this question as it depends on the interpretation of the terms \"true fakery\" and \"fake truth\". However, one possible interpretation is that a true fakery is a counterfeit or imitation that is intended to deceive, whereas a fake truth is a false statement that is presented as if it were true.\n",
+"CPU times: user 15.1 ms, sys: 3.7 ms, total: 18.8 ms\n",
+"Wall time: 531 ms\n"
+]
+}
+],
+"source": [
+"%%time\n",
+"\n",
+"print(llm(\"Is a true fakery the same as a fake truth?\"))"
+]
+},
+{
+"cell_type": "markdown",
+"id": "524b94fa-6162-4880-884d-d008749d14e2",
+"metadata": {},
+"source": [
+"### Astra DB Semantic cache\n",
+"\n",
"This cache will do a semantic similarity search and return a hit if it finds a cached entry that is similar enough, For this, you need to provide an `Embeddings` instance of your choice."
+]
+},
+{
+"cell_type": "code",
+"execution_count": 10,
+"id": "dc329c55-1cc4-4b74-94f9-61f8990fb214",
+"metadata": {},
+"outputs": [],
+"source": [
+"from langchain.embeddings import OpenAIEmbeddings\n",
+"\n",
+"embedding = OpenAIEmbeddings()"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 11,
+"id": "83952a90-ab14-4e59-87c0-d2bdc1d43e43",
+"metadata": {},
+"outputs": [],
+"source": [
+"from langchain.cache import AstraDBSemanticCache\n",
+"\n",
+"set_llm_cache(\n",
+"    AstraDBSemanticCache(\n",
+"        api_endpoint=ASTRA_DB_API_ENDPOINT,\n",
+"        token=ASTRA_DB_APPLICATION_TOKEN,\n",
+"        embedding=embedding,\n",
+"        collection_name=\"demo_semantic_cache\",\n",
+"    )\n",
+")"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 12,
+"id": "d74b249a-94d5-42d0-af74-f7565a994dea",
+"metadata": {},
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"\n",
+"\n",
+"There is no definitive answer to this question since it presupposes a great deal about the nature of truth itself, which is a matter of considerable philosophical debate. It is possible, however, to construct scenarios in which something could be considered true despite being false, such as if someone sincerely believes something to be true even though it is not.\n",
+"CPU times: user 65.6 ms, sys: 15.3 ms, total: 80.9 ms\n",
+"Wall time: 2.72 s\n"
+]
+}
+],
+"source": [
+"%%time\n",
+"\n",
+"print(llm(\"Are there truths that are false?\"))"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 13,
+"id": "11973d73-d2f4-46bd-b229-1c589df9b788",
+"metadata": {},
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"\n",
+"\n",
+"There is no definitive answer to this question since it presupposes a great deal about the nature of truth itself, which is a matter of considerable philosophical debate. It is possible, however, to construct scenarios in which something could be considered true despite being false, such as if someone sincerely believes something to be true even though it is not.\n",
+"CPU times: user 29.3 ms, sys: 6.21 ms, total: 35.5 ms\n",
+"Wall time: 1.03 s\n"
+]
+}
+],
+"source": [
+"%%time\n",
+"\n",
"print(llm(\"Is is possible that something false can be also true?\"))"
+]
+},
 {
 "cell_type": "markdown",
 "id": "0c69d84d",
@@ -29,8 +29,35 @@ vector_store = AstraDB(
 
 Learn more in the [example notebook](/docs/integrations/vectorstores/astradb).
 
-### Memory
+### LLM Cache
+
+```python
+from langchain.globals import set_llm_cache
+from langchain.cache import AstraDBCache
+set_llm_cache(AstraDBCache(
+    api_endpoint="...",
+    token="...",
+))
+```
+
+Learn more in the [example notebook](/docs/integrations/llms/llm_caching) (scroll to the Astra DB section).
+
+### Semantic LLM Cache
+
+```python
+from langchain.globals import set_llm_cache
+from langchain.cache import AstraDBSemanticCache
+set_llm_cache(AstraDBSemanticCache(
+    embedding=my_embedding,
+    api_endpoint="...",
+    token="...",
+))
+```
+
+Learn more in the [example notebook](/docs/integrations/llms/llm_caching) (scroll to the appropriate section).
+
+### Chat message history
 
 ```python
 from langchain.memory import AstraDBChatMessageHistory