Harrison/redis cache (#3766)

Co-authored-by: Tyler Hutcherson <tyler.hutcherson@redis.com>
Authored by Harrison Chase on 2023-04-28 20:47:18 -07:00, committed by GitHub
parent b588446bf9
commit be7a8e0824
7 changed files with 616 additions and 149 deletions

docs/ecosystem/redis.md (new file, +79 lines)

@@ -0,0 +1,79 @@
# Redis
This page covers how to use the [Redis](https://redis.com) ecosystem within LangChain.
It is broken into two parts: installation and setup, and then references to specific Redis wrappers.
## Installation and Setup
- Install the Redis Python SDK with `pip install redis`
## Wrappers
### Cache
The Cache wrapper allows for [Redis](https://redis.io) to be used as a remote, low-latency, in-memory cache for LLM prompts and responses.
#### Standard Cache
The standard cache is the bread and butter of Redis use cases in production, for both [open source](https://redis.io) and [enterprise](https://redis.com) users globally.
To import this cache:
```python
from langchain.cache import RedisCache
```
To use this cache with your LLMs:
```python
import langchain
import redis
redis_client = redis.Redis.from_url(...)
langchain.llm_cache = RedisCache(redis_client)
```
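Once `langchain.llm_cache` is set, repeated identical prompts are served from Redis instead of calling the model again. A minimal sketch (assuming an OpenAI LLM; the model name is illustrative and any LLM works):
```python
from langchain.llms import OpenAI

llm = OpenAI(model_name="text-davinci-003")
llm("Tell me a joke")  # first call hits the model and writes to the cache
llm("Tell me a joke")  # identical prompt is now served from Redis
```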
#### Semantic Cache
Semantic caching allows users to retrieve cached responses based on semantic similarity between the user input and previously cached prompts. Under the hood, it uses Redis as both a cache and a vectorstore.
To import this cache:
```python
from langchain.cache import RedisSemanticCache
```
To use this cache with your LLMs:
```python
import langchain
import redis
# use any embedding provider...
from tests.integration_tests.vectorstores.fake_embeddings import FakeEmbeddings
redis_url = "redis://localhost:6379"
langchain.llm_cache = RedisSemanticCache(
embedding=FakeEmbeddings(),
redis_url=redis_url
)
```
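With the semantic cache, a new prompt does not need to match a cached prompt exactly; a semantically similar prompt can return the cached response. A short sketch (assuming an OpenAI LLM in place of the fake embeddings above):
```python
from langchain.llms import OpenAI

llm = OpenAI(model_name="text-davinci-003")
llm("Tell me a joke")    # cache miss: calls the model and stores the result
llm("Tell me one joke")  # semantically similar, so it can be served from the cache
```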
### VectorStore
The vectorstore wrapper turns Redis into a low-latency [vector database](https://redis.com/solutions/use-cases/vector-database/) for semantic search or LLM content retrieval.
To import this vectorstore:
```python
from langchain.vectorstores import Redis
```
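A minimal usage sketch (assuming an OpenAI embedding model and a local Redis instance with the RediSearch module; the texts and `redis_url` are illustrative):
```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Redis

# index a few example texts in Redis
vectorstore = Redis.from_texts(
    texts=["foo", "bar", "baz"],
    embedding=OpenAIEmbeddings(),
    redis_url="redis://localhost:6379",
)
docs = vectorstore.similarity_search("foo")
```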
For a more detailed walkthrough of the Redis vectorstore wrapper, see [this notebook](../modules/indexes/vectorstores/examples/redis.ipynb).
### Retriever
The Redis vector store retriever wrapper generalizes the vectorstore class to perform low-latency document retrieval. To create the retriever, simply call `.as_retriever()` on the base vectorstore class.
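For example, a minimal sketch building on the vectorstore above (the query string is illustrative):
```python
retriever = vectorstore.as_retriever()
docs = retriever.get_relevant_documents("foo")
```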
### Memory
Redis can be used to persist LLM conversations.
#### Vector Store Retriever Memory
For a more detailed walkthrough of the `VectorStoreRetrieverMemory` wrapper, see [this notebook](../modules/memory/types/vectorstore_retriever_memory.ipynb).
#### Chat Message History Memory
For a detailed example of using Redis to cache conversation message history, see [this notebook](../modules/memory/examples/redis_chat_message_history.ipynb).
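A minimal sketch of persisting messages directly (assuming a local Redis instance; the session id is arbitrary):
```python
from langchain.memory import RedisChatMessageHistory

# messages are stored in Redis under the given session id and survive process restarts
history = RedisChatMessageHistory(session_id="my-session", url="redis://localhost:6379/0")
history.add_user_message("hi!")
history.add_ai_message("what's up?")
history.messages
```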


@@ -41,7 +41,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 3,
+   "execution_count": 5,
    "id": "f69f6283",
    "metadata": {},
    "outputs": [],
@@ -52,7 +52,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": 6,
    "id": "64005d1f",
    "metadata": {},
    "outputs": [
@@ -60,8 +60,8 @@
     "name": "stdout",
     "output_type": "stream",
     "text": [
-     "CPU times: user 14.2 ms, sys: 4.9 ms, total: 19.1 ms\n",
-     "Wall time: 1.1 s\n"
+     "CPU times: user 26.1 ms, sys: 21.5 ms, total: 47.6 ms\n",
+     "Wall time: 1.68 s\n"
     ]
    },
    {
@@ -70,7 +70,7 @@
      "'\\n\\nWhy did the chicken cross the road?\\n\\nTo get to the other side.'"
     ]
    },
-   "execution_count": 4,
+   "execution_count": 6,
    "metadata": {},
    "output_type": "execute_result"
   }
@@ -83,7 +83,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": 7,
    "id": "c8a1cb2b",
    "metadata": {},
    "outputs": [
@@ -91,8 +91,8 @@
     "name": "stdout",
     "output_type": "stream",
     "text": [
-     "CPU times: user 162 µs, sys: 7 µs, total: 169 µs\n",
-     "Wall time: 175 µs\n"
+     "CPU times: user 238 µs, sys: 143 µs, total: 381 µs\n",
+     "Wall time: 1.76 ms\n"
     ]
    },
    {
@@ -101,7 +101,7 @@
      "'\\n\\nWhy did the chicken cross the road?\\n\\nTo get to the other side.'"
     ]
    },
-   "execution_count": 5,
+   "execution_count": 7,
    "metadata": {},
    "output_type": "execute_result"
   }
@@ -214,9 +214,18 @@
     "## Redis Cache"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "c5c9a4d5",
+   "metadata": {},
+   "source": [
+    "### Standard Cache\n",
+    "Use [Redis](../../../../ecosystem/redis.md) to cache prompts and responses."
+   ]
+  },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 8,
    "id": "39f6eb0b",
    "metadata": {},
    "outputs": [],
@@ -225,15 +234,35 @@
    "# (make sure your local Redis instance is running first before running this example)\n",
    "from redis import Redis\n",
    "from langchain.cache import RedisCache\n",
+   "\n",
    "langchain.llm_cache = RedisCache(redis_=Redis())"
   ]
  },
  {
   "cell_type": "code",
-  "execution_count": null,
+  "execution_count": 9,
   "id": "28920749",
   "metadata": {},
-  "outputs": [],
+  "outputs": [
+   {
+    "name": "stdout",
+    "output_type": "stream",
+    "text": [
+     "CPU times: user 6.88 ms, sys: 8.75 ms, total: 15.6 ms\n",
+     "Wall time: 1.04 s\n"
+    ]
+   },
+   {
+    "data": {
+     "text/plain": [
+      "'\\n\\nWhy did the chicken cross the road?\\n\\nTo get to the other side!'"
+     ]
+    },
+    "execution_count": 9,
+    "metadata": {},
+    "output_type": "execute_result"
+   }
+  ],
   "source": [
    "%%time\n",
    "# The first time, it is not yet in cache, so it should take longer\n",
@@ -242,16 +271,124 @@
  },
  {
   "cell_type": "code",
-  "execution_count": null,
+  "execution_count": 14,
   "id": "94bf9415",
   "metadata": {},
-  "outputs": [],
+  "outputs": [
+   {
+    "name": "stdout",
+    "output_type": "stream",
+    "text": [
+     "CPU times: user 1.59 ms, sys: 610 µs, total: 2.2 ms\n",
+     "Wall time: 5.58 ms\n"
+    ]
+   },
+   {
+    "data": {
+     "text/plain": [
+      "'\\n\\nWhy did the chicken cross the road?\\n\\nTo get to the other side!'"
+     ]
+    },
+    "execution_count": 14,
+    "metadata": {},
+    "output_type": "execute_result"
+   }
+  ],
   "source": [
    "%%time\n",
    "# The second time it is, so it goes faster\n",
    "llm(\"Tell me a joke\")"
   ]
  },
+ {
+  "cell_type": "markdown",
+  "id": "82be23f6",
+  "metadata": {},
+  "source": [
+   "### Semantic Cache\n",
+   "Use [Redis](../../../../ecosystem/redis.md) to cache prompts and responses and evaluate hits based on semantic similarity."
+  ]
+ },
+ {
+  "cell_type": "code",
+  "execution_count": 15,
+  "id": "64df3099",
+  "metadata": {},
+  "outputs": [],
+  "source": [
+   "from langchain.embeddings import OpenAIEmbeddings\n",
+   "from langchain.cache import RedisSemanticCache\n",
+   "\n",
+   "\n",
+   "langchain.llm_cache = RedisSemanticCache(\n",
+   "    redis_url=\"redis://localhost:6379\",\n",
+   "    embedding=OpenAIEmbeddings()\n",
+   ")"
+  ]
+ },
+ {
+  "cell_type": "code",
+  "execution_count": 16,
+  "id": "8e91d3ac",
+  "metadata": {},
+  "outputs": [
+   {
+    "name": "stdout",
+    "output_type": "stream",
+    "text": [
+     "CPU times: user 351 ms, sys: 156 ms, total: 507 ms\n",
+     "Wall time: 3.37 s\n"
+    ]
+   },
+   {
+    "data": {
+     "text/plain": [
+      "\"\\n\\nWhy don't scientists trust atoms?\\nBecause they make up everything.\""
+     ]
+    },
+    "execution_count": 16,
+    "metadata": {},
+    "output_type": "execute_result"
+   }
+  ],
+  "source": [
+   "%%time\n",
+   "# The first time, it is not yet in cache, so it should take longer\n",
+   "llm(\"Tell me a joke\")"
+  ]
+ },
+ {
+  "cell_type": "code",
+  "execution_count": 27,
+  "id": "df856948",
+  "metadata": {},
+  "outputs": [
+   {
+    "name": "stdout",
+    "output_type": "stream",
+    "text": [
+     "CPU times: user 6.25 ms, sys: 2.72 ms, total: 8.97 ms\n",
+     "Wall time: 262 ms\n"
+    ]
+   },
+   {
+    "data": {
+     "text/plain": [
+      "\"\\n\\nWhy don't scientists trust atoms?\\nBecause they make up everything.\""
+     ]
+    },
+    "execution_count": 27,
+    "metadata": {},
+    "output_type": "execute_result"
+   }
+  ],
+  "source": [
+   "%%time\n",
+   "# The second time, while not a direct hit, the question is semantically similar to the original question,\n",
+   "# so it uses the cached result!\n",
+   "llm(\"Tell me one joke\")"
+  ]
+ },
  {
   "cell_type": "markdown",
   "id": "684eab55",