langchain/docs/versioned_docs/version-0.2.x/how_to/chat_model_caching.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "dcf87b32",
   "metadata": {},
   "source": [
    "# How to cache chat model responses\n",
    "\n",
    "LangChain provides an optional caching layer for chat models. This is useful for two main reasons:\n",
    "\n",
    "- It can save you money by reducing the number of API calls you make to the LLM provider, if you're often requesting the same completion multiple times. This is especially useful during app development.\n",
    "- It can speed up your application by reducing the number of API calls you make to the LLM provider.\n",
    "\n",
    "This guide will walk you through how to enable this in your apps.\n",
    "\n",
    "```{=mdx}\n",
    "import PrerequisiteLinks from \"@theme/PrerequisiteLinks\";\n",
    "\n",
    "<PrerequisiteLinks content={`\n",
    "- [Chat models](/docs/concepts/#chat-models)\n",
    "- [LLMs](/docs/concepts/#llms)\n",
    "`} />\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "289b31de",
   "metadata": {},
   "source": [
    "```{=mdx}\n",
    "import ChatModelTabs from \"@theme/ChatModelTabs\";\n",
    "\n",
    "<ChatModelTabs customVarName=\"llm\" />\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "c6641f37",
   "metadata": {},
   "outputs": [],
   "source": [
    "# | output: false\n",
    "# | echo: false\n",
    "\n",
    "import os\n",
    "from getpass import getpass\n",
    "\n",
    "from langchain_openai import ChatOpenAI\n",
    "\n",
    "os.environ[\"OPENAI_API_KEY\"] = getpass()\n",
    "\n",
    "llm = ChatOpenAI()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "5472a032",
   "metadata": {},
   "outputs": [],
   "source": [
    "# <!-- ruff: noqa: F821 -->\n",
    "from langchain.globals import set_llm_cache"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "357b89a8",
   "metadata": {},
   "source": [
    "## In Memory Cache\n",
    "\n",
    "This is an ephemeral cache that stores model calls in memory. It will be wiped when your environment restarts, and is not shared across processes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "113e719a",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 645 ms, sys: 214 ms, total: 859 ms\n",
      "Wall time: 829 ms\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "AIMessage(content=\"Why don't scientists trust atoms?\\n\\nBecause they make up everything!\", response_metadata={'token_usage': {'completion_tokens': 13, 'prompt_tokens': 11, 'total_tokens': 24}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_c2295e73ad', 'finish_reason': 'stop', 'logprobs': None}, id='run-b6836bdd-8c30-436b-828f-0ac5fc9ab50e-0')"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%%time\n",
    "from langchain.cache import InMemoryCache\n",
    "\n",
    "set_llm_cache(InMemoryCache())\n",
    "\n",
    "# The first time, it is not yet in cache, so it should take longer\n",
    "llm.invoke(\"Tell me a joke\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "a2121434",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 822 µs, sys: 288 µs, total: 1.11 ms\n",
      "Wall time: 1.06 ms\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "AIMessage(content=\"Why don't scientists trust atoms?\\n\\nBecause they make up everything!\", response_metadata={'token_usage': {'completion_tokens': 13, 'prompt_tokens': 11, 'total_tokens': 24}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_c2295e73ad', 'finish_reason': 'stop', 'logprobs': None}, id='run-b6836bdd-8c30-436b-828f-0ac5fc9ab50e-0')"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%%time\n",
    "# The second time it is, so it goes faster\n",
    "llm.invoke(\"Tell me a joke\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b88ff8af",
   "metadata": {},
   "source": [
    "## SQLite Cache\n",
    "\n",
    "This cache implementation uses a `SQLite` database to store responses, and will last across process restarts."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "99290ab4",
   "metadata": {},
   "outputs": [],
   "source": [
    "!rm .langchain.db"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "fe826c5c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# We can do the same thing with a SQLite cache\n",
    "from langchain.cache import SQLiteCache\n",
    "\n",
    "set_llm_cache(SQLiteCache(database_path=\".langchain.db\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "eb558734",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 9.91 ms, sys: 7.68 ms, total: 17.6 ms\n",
      "Wall time: 657 ms\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "AIMessage(content='Why did the scarecrow win an award? Because he was outstanding in his field!', response_metadata={'token_usage': {'completion_tokens': 17, 'prompt_tokens': 11, 'total_tokens': 28}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_c2295e73ad', 'finish_reason': 'stop', 'logprobs': None}, id='run-39d9e1e8-7766-4970-b1d8-f50213fd94c5-0')"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%%time\n",
    "# The first time, it is not yet in cache, so it should take longer\n",
    "llm.invoke(\"Tell me a joke\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "497c7000",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 52.2 ms, sys: 60.5 ms, total: 113 ms\n",
      "Wall time: 127 ms\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "AIMessage(content='Why did the scarecrow win an award? Because he was outstanding in his field!', id='run-39d9e1e8-7766-4970-b1d8-f50213fd94c5-0')"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%%time\n",
    "# The second time it is, so it goes faster\n",
    "llm.invoke(\"Tell me a joke\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2950a913",
   "metadata": {},
   "source": [
    "## Next steps\n",
    "\n",
    "You've now learned how to cache model responses to save time and money.\n",
    "\n",
    "Next, check out the other how-to guides chat models in this section, like [how to get a model to return structured output](/docs/how_to/structured_output) or [how to create your own custom chat model](/docs/how_to/custom_chat_model)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}