Harrison/redis cache (#3766)

Co-authored-by: Tyler Hutcherson <tyler.hutcherson@redis.com>
Authored by Harrison Chase on 2023-04-28 20:47:18 -07:00, committed by GitHub
parent b588446bf9
commit be7a8e0824
7 changed files with 616 additions and 149 deletions

docs/ecosystem/redis.md (new file, +79 lines)

@@ -0,0 +1,79 @@
# Redis
This page covers how to use the [Redis](https://redis.com) ecosystem within LangChain.
It is broken into two parts: installation and setup, and then references to specific Redis wrappers.
## Installation and Setup
- Install the Redis Python SDK with `pip install redis`
## Wrappers
### Cache
The Cache wrapper allows for [Redis](https://redis.io) to be used as a remote, low-latency, in-memory cache for LLM prompts and responses.
#### Standard Cache
The standard cache is the bread and butter of Redis use cases in production, for both [open source](https://redis.io) and [enterprise](https://redis.com) users globally.
To import this cache:
```python
from langchain.cache import RedisCache
```
To use this cache with your LLMs:
```python
import langchain
import redis
redis_client = redis.Redis.from_url(...)
langchain.llm_cache = RedisCache(redis_client)
```
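Once `langchain.llm_cache` is set, repeated identical prompts are served from Redis instead of calling the model again. A minimal sketch (assuming an OpenAI LLM; the model name is illustrative and any LLM works):
```python
from langchain.llms import OpenAI

llm = OpenAI(model_name="text-davinci-003")
llm("Tell me a joke")  # first call hits the model and writes to the cache
llm("Tell me a joke")  # identical prompt is now served from Redis
```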
#### Semantic Cache
Semantic caching allows users to retrieve cached responses based on semantic similarity between the user input and previously cached prompts. Under the hood, it uses Redis as both a cache and a vectorstore.
To import this cache:
```python
from langchain.cache import RedisSemanticCache
```
To use this cache with your LLMs:
```python
import langchain
import redis
# use any embedding provider...
from tests.integration_tests.vectorstores.fake_embeddings import FakeEmbeddings
redis_url = "redis://localhost:6379"
langchain.llm_cache = RedisSemanticCache(
embedding=FakeEmbeddings(),
redis_url=redis_url
)
```
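With the semantic cache, a new prompt does not need to match a cached prompt exactly; a semantically similar prompt can return the cached response. A short sketch (assuming an OpenAI LLM in place of the fake embeddings above):
```python
from langchain.llms import OpenAI

llm = OpenAI(model_name="text-davinci-003")
llm("Tell me a joke")    # cache miss: calls the model and stores the result
llm("Tell me one joke")  # semantically similar, so it can be served from the cache
```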
### VectorStore
The vectorstore wrapper turns Redis into a low-latency [vector database](https://redis.com/solutions/use-cases/vector-database/) for semantic search or LLM content retrieval.
To import this vectorstore:
```python
from langchain.vectorstores import Redis
```
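A minimal usage sketch (assuming an OpenAI embedding model and a local Redis instance with the RediSearch module; the texts and `redis_url` are illustrative):
```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Redis

# index a few example texts in Redis
vectorstore = Redis.from_texts(
    texts=["foo", "bar", "baz"],
    embedding=OpenAIEmbeddings(),
    redis_url="redis://localhost:6379",
)
docs = vectorstore.similarity_search("foo")
```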
For a more detailed walkthrough of the Redis vectorstore wrapper, see [this notebook](../modules/indexes/vectorstores/examples/redis.ipynb).
### Retriever
The Redis vector store retriever wrapper generalizes the vectorstore class to perform low-latency document retrieval. To create the retriever, simply call `.as_retriever()` on the base vectorstore class.
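For example, a minimal sketch building on the vectorstore above (the query string is illustrative):
```python
retriever = vectorstore.as_retriever()
docs = retriever.get_relevant_documents("foo")
```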
### Memory
Redis can be used to persist LLM conversations.
#### Vector Store Retriever Memory
For a more detailed walkthrough of the `VectorStoreRetrieverMemory` wrapper, see [this notebook](../modules/memory/types/vectorstore_retriever_memory.ipynb).
#### Chat Message History Memory
For a detailed example of using Redis to cache conversation message history, see [this notebook](../modules/memory/examples/redis_chat_message_history.ipynb).
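A minimal sketch of persisting messages directly (assuming a local Redis instance; the session id is arbitrary):
```python
from langchain.memory import RedisChatMessageHistory

# messages are stored in Redis under the given session id and survive process restarts
history = RedisChatMessageHistory(session_id="my-session", url="redis://localhost:6379/0")
history.add_user_message("hi!")
history.add_ai_message("what's up?")
history.messages
```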


@@ -41,7 +41,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 3,
+   "execution_count": 5,
    "id": "f69f6283",
    "metadata": {},
    "outputs": [],
@@ -52,7 +52,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": 6,
    "id": "64005d1f",
    "metadata": {},
    "outputs": [
@@ -60,8 +60,8 @@
     "name": "stdout",
     "output_type": "stream",
     "text": [
-     "CPU times: user 14.2 ms, sys: 4.9 ms, total: 19.1 ms\n",
-     "Wall time: 1.1 s\n"
+     "CPU times: user 26.1 ms, sys: 21.5 ms, total: 47.6 ms\n",
+     "Wall time: 1.68 s\n"
     ]
    },
    {
@@ -70,7 +70,7 @@
      "'\\n\\nWhy did the chicken cross the road?\\n\\nTo get to the other side.'"
     ]
    },
-   "execution_count": 4,
+   "execution_count": 6,
    "metadata": {},
    "output_type": "execute_result"
   }
@@ -83,7 +83,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": 7,
    "id": "c8a1cb2b",
    "metadata": {},
    "outputs": [
@@ -91,8 +91,8 @@
     "name": "stdout",
     "output_type": "stream",
     "text": [
-     "CPU times: user 162 µs, sys: 7 µs, total: 169 µs\n",
-     "Wall time: 175 µs\n"
+     "CPU times: user 238 µs, sys: 143 µs, total: 381 µs\n",
+     "Wall time: 1.76 ms\n"
     ]
    },
    {
@@ -101,7 +101,7 @@
      "'\\n\\nWhy did the chicken cross the road?\\n\\nTo get to the other side.'"
     ]
    },
-   "execution_count": 5,
+   "execution_count": 7,
    "metadata": {},
    "output_type": "execute_result"
   }
@@ -214,9 +214,18 @@
     "## Redis Cache"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "c5c9a4d5",
+   "metadata": {},
+   "source": [
+    "### Standard Cache\n",
+    "Use [Redis](../../../../ecosystem/redis.md) to cache prompts and responses."
+   ]
+  },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 8,
    "id": "39f6eb0b",
    "metadata": {},
    "outputs": [],
@@ -225,15 +234,35 @@
    "# (make sure your local Redis instance is running first before running this example)\n",
    "from redis import Redis\n",
    "from langchain.cache import RedisCache\n",
+   "\n",
    "langchain.llm_cache = RedisCache(redis_=Redis())"
   ]
  },
  {
   "cell_type": "code",
-  "execution_count": null,
+  "execution_count": 9,
   "id": "28920749",
   "metadata": {},
-  "outputs": [],
+  "outputs": [
+   {
+    "name": "stdout",
+    "output_type": "stream",
+    "text": [
+     "CPU times: user 6.88 ms, sys: 8.75 ms, total: 15.6 ms\n",
+     "Wall time: 1.04 s\n"
+    ]
+   },
+   {
+    "data": {
+     "text/plain": [
+      "'\\n\\nWhy did the chicken cross the road?\\n\\nTo get to the other side!'"
+     ]
+    },
+    "execution_count": 9,
+    "metadata": {},
+    "output_type": "execute_result"
+   }
+  ],
   "source": [
    "%%time\n",
    "# The first time, it is not yet in cache, so it should take longer\n",
@@ -242,16 +271,124 @@
  },
  {
   "cell_type": "code",
-  "execution_count": null,
+  "execution_count": 14,
   "id": "94bf9415",
   "metadata": {},
-  "outputs": [],
+  "outputs": [
+   {
+    "name": "stdout",
+    "output_type": "stream",
+    "text": [
+     "CPU times: user 1.59 ms, sys: 610 µs, total: 2.2 ms\n",
+     "Wall time: 5.58 ms\n"
+    ]
+   },
+   {
+    "data": {
+     "text/plain": [
+      "'\\n\\nWhy did the chicken cross the road?\\n\\nTo get to the other side!'"
+     ]
+    },
+    "execution_count": 14,
+    "metadata": {},
+    "output_type": "execute_result"
+   }
+  ],
   "source": [
    "%%time\n",
    "# The second time it is, so it goes faster\n",
    "llm(\"Tell me a joke\")"
   ]
  },
+ {
+  "cell_type": "markdown",
+  "id": "82be23f6",
+  "metadata": {},
+  "source": [
+   "### Semantic Cache\n",
+   "Use [Redis](../../../../ecosystem/redis.md) to cache prompts and responses and evaluate hits based on semantic similarity."
+  ]
+ },
+ {
+  "cell_type": "code",
+  "execution_count": 15,
+  "id": "64df3099",
+  "metadata": {},
+  "outputs": [],
+  "source": [
+   "from langchain.embeddings import OpenAIEmbeddings\n",
+   "from langchain.cache import RedisSemanticCache\n",
+   "\n",
+   "\n",
+   "langchain.llm_cache = RedisSemanticCache(\n",
+   "    redis_url=\"redis://localhost:6379\",\n",
+   "    embedding=OpenAIEmbeddings()\n",
+   ")"
+  ]
+ },
+ {
+  "cell_type": "code",
+  "execution_count": 16,
+  "id": "8e91d3ac",
+  "metadata": {},
+  "outputs": [
+   {
+    "name": "stdout",
+    "output_type": "stream",
+    "text": [
+     "CPU times: user 351 ms, sys: 156 ms, total: 507 ms\n",
+     "Wall time: 3.37 s\n"
+    ]
+   },
+   {
+    "data": {
+     "text/plain": [
+      "\"\\n\\nWhy don't scientists trust atoms?\\nBecause they make up everything.\""
+     ]
+    },
+    "execution_count": 16,
+    "metadata": {},
+    "output_type": "execute_result"
+   }
+  ],
+  "source": [
+   "%%time\n",
+   "# The first time, it is not yet in cache, so it should take longer\n",
+   "llm(\"Tell me a joke\")"
+  ]
+ },
+ {
+  "cell_type": "code",
+  "execution_count": 27,
+  "id": "df856948",
+  "metadata": {},
+  "outputs": [
+   {
+    "name": "stdout",
+    "output_type": "stream",
+    "text": [
+     "CPU times: user 6.25 ms, sys: 2.72 ms, total: 8.97 ms\n",
+     "Wall time: 262 ms\n"
+    ]
+   },
+   {
+    "data": {
+     "text/plain": [
+      "\"\\n\\nWhy don't scientists trust atoms?\\nBecause they make up everything.\""
+     ]
+    },
+    "execution_count": 27,
+    "metadata": {},
+    "output_type": "execute_result"
+   }
+  ],
+  "source": [
+   "%%time\n",
+   "# The second time, while not a direct hit, the question is semantically similar to the original question,\n",
+   "# so it uses the cached result!\n",
+   "llm(\"Tell me one joke\")"
+  ]
+ },
  {
   "cell_type": "markdown",
   "id": "684eab55",