Add caching to BaseChatModel (issue #1644) (#5089)

# Add caching to BaseChatModel
Fixes #1644

(Sidenote: While working on this, I noticed we have multiple Fake LLM implementations used for testing, so I consolidated them.)

## Who can review?
Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:
Models
- @hwchase17
- @agola11

Twitter: [@UmerHAdil](https://twitter.com/@UmerHAdil) | Discord:
RicChilligerDude#7589

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
UmerHA committed 2023-06-24 20:45:09 +02:00 (committed by GitHub)
parent c289cc891a | commit 068142fce2
11 changed files with 465 additions and 63 deletions


@@ -0,0 +1,9 @@
# Caching
LangChain provides an optional caching layer for Chat Models. This is useful for two reasons:

- It can save you money by reducing the number of API calls you make to the LLM provider, if you're often requesting the same completion multiple times.
- It can speed up your application by reducing the number of API calls you make to the LLM provider.

import CachingChat from "@snippets/modules/model_io/models/chat/how_to/chat_model_caching.mdx"
<CachingChat/>
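
For orientation, here is a minimal sketch of the pattern the imported snippet above expands on: assign one of the built-in cache backends to the global `langchain.llm_cache` before calling the chat model (this mirrors the snippet's own code).

```python
import langchain
from langchain.cache import InMemoryCache  # or SQLiteCache for an on-disk cache
from langchain.chat_models import ChatOpenAI

# A single global cache is shared by all LLM and chat model calls.
langchain.llm_cache = InMemoryCache()

llm = ChatOpenAI()
llm.predict("Tell me a joke")  # first call goes to the API
llm.predict("Tell me a joke")  # identical repeat call is served from the cache
```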


@@ -0,0 +1,97 @@
```python
import langchain
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI()
```
## In Memory Cache
```python
from langchain.cache import InMemoryCache
langchain.llm_cache = InMemoryCache()
# The first time, it is not yet in cache, so it should take longer
llm.predict("Tell me a joke")
```
<CodeOutputBlock lang="python">
```
CPU times: user 35.9 ms, sys: 28.6 ms, total: 64.6 ms
Wall time: 4.83 s
"\n\nWhy couldn't the bicycle stand up by itself? It was...two tired!"
```
</CodeOutputBlock>
```python
# The second time it is, so it goes faster
llm.predict("Tell me a joke")
```
<CodeOutputBlock lang="python">
```
CPU times: user 238 µs, sys: 143 µs, total: 381 µs
Wall time: 1.76 ms
'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'
```
</CodeOutputBlock>
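The cache is keyed on the serialized prompt, so the message-based chat interface is covered as well; that is the code path this PR adds caching to. A minimal sketch, assuming the usual `HumanMessage` call style:

```python
from langchain.schema import HumanMessage

# Calling the chat model with a list of messages also goes through
# langchain.llm_cache, so an identical message list is answered from
# the cache on repeat calls.
llm([HumanMessage(content="Tell me a joke")])
llm([HumanMessage(content="Tell me a joke")])  # served from the cache
```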
## SQLite Cache
```bash
# Remove any existing cache database so the example starts from a cold cache
rm .langchain.db
```
```python
# We can do the same thing with a SQLite cache
from langchain.cache import SQLiteCache
langchain.llm_cache = SQLiteCache(database_path=".langchain.db")
```
```python
# The first time, it is not yet in cache, so it should take longer
llm.predict("Tell me a joke")
```
<CodeOutputBlock lang="python">
```
CPU times: user 17 ms, sys: 9.76 ms, total: 26.7 ms
Wall time: 825 ms
'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'
```
</CodeOutputBlock>
```python
# The second time it is, so it goes faster
llm.predict("Tell me a joke")
```
<CodeOutputBlock lang="python">
```
CPU times: user 2.46 ms, sys: 1.23 ms, total: 3.7 ms
Wall time: 2.67 ms
'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'
```
</CodeOutputBlock>
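
Because the SQLite cache lives in a file on disk, it survives across Python sessions, and pointing `database_path` at a different file gives you a separate cache. A minimal sketch (the file name here is only an example):

```python
import langchain
from langchain.cache import SQLiteCache

# Use a dedicated cache file; delete the file to start from a cold cache again.
langchain.llm_cache = SQLiteCache(database_path="joke_cache.db")

llm.predict("Tell me a joke")  # cold call, populates joke_cache.db
llm.predict("Tell me a joke")  # answered from the on-disk cache
```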


@@ -14,7 +14,7 @@ from langchain.cache import InMemoryCache
langchain.llm_cache = InMemoryCache()
# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")
llm.predict("Tell me a joke")
```
<CodeOutputBlock lang="python">
@@ -32,7 +32,7 @@ llm("Tell me a joke")
```python
# The second time it is, so it goes faster
llm("Tell me a joke")
llm.predict("Tell me a joke")
```
<CodeOutputBlock lang="python">
@@ -64,7 +64,7 @@ langchain.llm_cache = SQLiteCache(database_path=".langchain.db")
```python
# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")
llm.predict("Tell me a joke")
```
<CodeOutputBlock lang="python">
@@ -82,7 +82,7 @@ llm("Tell me a joke")
```python
# The second time it is, so it goes faster
llm("Tell me a joke")
llm.predict("Tell me a joke")
```
<CodeOutputBlock lang="python">