Compare commits

...

8 Commits

Author SHA1 Message Date
Eugene Yurtsev
14347acbce x 2024-09-25 13:37:13 -04:00
Eugene Yurtsev
3ece5497ac qxqx 2024-09-25 13:01:26 -04:00
Eugene Yurtsev
1b053e961f x 2024-09-24 16:39:25 -04:00
Eugene Yurtsev
54d5b74b00 docs: update trim messages notebook (#26793)
Update trim messages notebook to include common use cases and explain
what the desired behavior is
2024-09-24 14:09:56 -04:00
Eugene Yurtsev
15d49d3df2 docs: update chat history in rag how-to (#26821)
Update how-to add chat history to rag
2024-09-24 13:50:11 -04:00
Vadym Barda
2b38a4ee55 docs[patch]: update chatbot tools how-to (#26816) 2024-09-24 11:39:04 -04:00
Vadym Barda
e8ce5cde99 docs[patch]: update chatbot memory how-to (#26790) 2024-09-24 10:53:39 -04:00
ccurme
a7aad27cba docs[patch]: update chatbot tutorial and migration guide (#26780) 2024-09-24 10:18:48 -04:00
7 changed files with 1187 additions and 1331 deletions

View File

@@ -23,6 +23,14 @@
"\n",
"We'll go into more detail on a few techniques below!\n",
"\n",
":::{.callout-note}\n",
"\n",
"This how-to guide previously built a chatbot using [RunnableWithMessageHistory](https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.history.RunnableWithMessageHistory.html). You can access this version of the tutorial in the [v0.2 docs](https://python.langchain.com/v0.2/docs/how_to/chatbots_memory/).\n",
"\n",
"The LangGraph implementation offers a number of advantages over `RunnableWithMessageHistory`, including the ability to persist arbitrary components of an application's state (instead of only messages).\n",
"\n",
":::\n",
"\n",
"## Setup\n",
"\n",
"You'll need to install a few packages, and have your OpenAI API key set as an environment variable named `OPENAI_API_KEY`:"
@@ -33,15 +41,6 @@
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[33mWARNING: You are using pip version 22.0.4; however, version 23.3.2 is available.\n",
"You should consider upgrading via the '/Users/jacoblee/.pyenv/versions/3.10.5/bin/python -m pip install --upgrade pip' command.\u001b[0m\u001b[33m\n",
"\u001b[0mNote: you may need to restart the kernel to use updated packages.\n"
]
},
{
"data": {
"text/plain": [
@@ -54,12 +53,13 @@
}
],
"source": [
"%pip install --upgrade --quiet langchain langchain-openai\n",
"%pip install --upgrade --quiet langchain langchain-openai langgraph\n",
"\n",
"# Set env var OPENAI_API_KEY or load from a .env file:\n",
"import dotenv\n",
"import getpass\n",
"import os\n",
"\n",
"dotenv.load_dotenv()"
"if not os.environ.get(\"OPENAI_API_KEY\"):\n",
" os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
]
},
{
@@ -71,13 +71,13 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from langchain_openai import ChatOpenAI\n",
"\n",
"chat = ChatOpenAI(model=\"gpt-4o-mini\")"
"model = ChatOpenAI(model=\"gpt-4o-mini\")"
]
},
{
@@ -98,34 +98,33 @@
"name": "stdout",
"output_type": "stream",
"text": [
"I said \"J'adore la programmation,\" which means \"I love programming\" in French.\n"
"I translated the sentence \"I love programming\" into French, which is \"J'adore la programmation.\"\n"
]
}
],
"source": [
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_core.messages import AIMessage, HumanMessage, SystemMessage\n",
"from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n",
"\n",
"prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\n",
" \"system\",\n",
" \"You are a helpful assistant. Answer all questions to the best of your ability.\",\n",
" SystemMessage(\n",
" content=\"You are a helpful assistant. Answer all questions to the best of your ability.\"\n",
" ),\n",
" (\"placeholder\", \"{messages}\"),\n",
" MessagesPlaceholder(variable_name=\"messages\"),\n",
" ]\n",
")\n",
"\n",
"chain = prompt | chat\n",
"chain = prompt | model\n",
"\n",
"ai_msg = chain.invoke(\n",
" {\n",
" \"messages\": [\n",
" (\n",
" \"human\",\n",
" \"Translate this sentence from English to French: I love programming.\",\n",
" HumanMessage(\n",
" content=\"Translate this sentence from English to French: I love programming.\"\n",
" ),\n",
" (\"ai\", \"J'adore la programmation.\"),\n",
" (\"human\", \"What did you just say?\"),\n",
" AIMessage(content=\"J'adore la programmation.\"),\n",
" HumanMessage(content=\"What did you just say?\"),\n",
" ],\n",
" }\n",
")\n",
@@ -136,51 +135,57 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that by passing the previous conversation into a chain, it can use it as context to answer questions. This is the basic concept underpinning chatbot memory - the rest of the guide will demonstrate convenient techniques for passing or reformatting messages.\n",
"\n",
"## Chat history\n",
"\n",
"It's perfectly fine to store and pass messages directly as an array, but we can use LangChain's built-in [message history class](https://python.langchain.com/api_reference/langchain/index.html#module-langchain.memory) to store and load messages as well. Instances of this class are responsible for storing and loading chat messages from persistent storage. LangChain integrates with many providers - you can see a [list of integrations here](/docs/integrations/memory) - but for this demo we will use an ephemeral demo class.\n",
"\n",
"Here's an example of the API:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[HumanMessage(content='Translate this sentence from English to French: I love programming.'),\n",
" AIMessage(content=\"J'adore la programmation.\")]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_community.chat_message_histories import ChatMessageHistory\n",
"\n",
"demo_ephemeral_chat_history = ChatMessageHistory()\n",
"\n",
"demo_ephemeral_chat_history.add_user_message(\n",
" \"Translate this sentence from English to French: I love programming.\"\n",
")\n",
"\n",
"demo_ephemeral_chat_history.add_ai_message(\"J'adore la programmation.\")\n",
"\n",
"demo_ephemeral_chat_history.messages"
"We can see that by passing the previous conversation into a chain, it can use it as context to answer questions. This is the basic concept underpinning chatbot memory - the rest of the guide will demonstrate convenient techniques for passing or reformatting messages."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can use it directly to store conversation turns for our chain:"
"## Automatic history management\n",
"\n",
"The previous examples pass messages to the chain (and model) explicitly. This is a completely acceptable approach, but it does require external management of new messages. LangChain also provides a way to build applications that have memory using LangGraph's [persistence](https://langchain-ai.github.io/langgraph/concepts/persistence/). You can [enable persistence](https://langchain-ai.github.io/langgraph/how-tos/persistence/) in LangGraph applications by providing a `checkpointer` when compiling the graph."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"from langgraph.checkpoint.memory import MemorySaver\n",
"from langgraph.graph import START, MessagesState, StateGraph\n",
"\n",
"workflow = StateGraph(state_schema=MessagesState)\n",
"\n",
"\n",
"# Define the function that calls the model\n",
"def call_model(state: MessagesState):\n",
" system_prompt = (\n",
" \"You are a helpful assistant. \"\n",
" \"Answer all questions to the best of your ability.\"\n",
" )\n",
" messages = [SystemMessage(content=system_prompt)] + state[\"messages\"]\n",
" response = model.invoke(messages)\n",
" return {\"messages\": response}\n",
"\n",
"\n",
"# Define the node and edge\n",
"workflow.add_node(\"model\", call_model)\n",
"workflow.add_edge(START, \"model\")\n",
"\n",
"# Add simple in-memory checkpointer\n",
"# highlight-start\n",
"memory = MemorySaver()\n",
"app = workflow.compile(checkpointer=memory)\n",
"# highlight-end"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" We'll pass the latest input to the conversation here and let the LangGraph keep track of the conversation history using the checkpointer:"
]
},
{
@@ -191,7 +196,8 @@
{
"data": {
"text/plain": [
"AIMessage(content='You just asked me to translate the sentence \"I love programming\" from English to French.', response_metadata={'token_usage': {'completion_tokens': 18, 'prompt_tokens': 61, 'total_tokens': 79}, 'model_name': 'gpt-4o-mini', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-5cbb21c2-9c30-4031-8ea8-bfc497989535-0', usage_metadata={'input_tokens': 61, 'output_tokens': 18, 'total_tokens': 79})"
"{'messages': [HumanMessage(content='Translate this sentence from English to French: I love programming.', additional_kwargs={}, response_metadata={}, id='200f88bb-936a-4877-990c-8b4112d82cfe'),\n",
" AIMessage(content='The translation of \"I love programming\" in French is \"J\\'aime programmer.\"', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 16, 'prompt_tokens': 39, 'total_tokens': 55, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_1bb46167f9', 'finish_reason': 'stop', 'logprobs': None}, id='run-d4ebcdcf-9a60-4471-ad8d-96169f614ada-0', usage_metadata={'input_tokens': 39, 'output_tokens': 16, 'total_tokens': 55})]}"
]
},
"execution_count": 5,
@@ -200,159 +206,35 @@
}
],
"source": [
"demo_ephemeral_chat_history = ChatMessageHistory()\n",
"\n",
"input1 = \"Translate this sentence from English to French: I love programming.\"\n",
"\n",
"demo_ephemeral_chat_history.add_user_message(input1)\n",
"\n",
"response = chain.invoke(\n",
" {\n",
" \"messages\": demo_ephemeral_chat_history.messages,\n",
" }\n",
")\n",
"\n",
"demo_ephemeral_chat_history.add_ai_message(response)\n",
"\n",
"input2 = \"What did I just ask you?\"\n",
"\n",
"demo_ephemeral_chat_history.add_user_message(input2)\n",
"\n",
"chain.invoke(\n",
" {\n",
" \"messages\": demo_ephemeral_chat_history.messages,\n",
" }\n",
"app.invoke(\n",
" {\"messages\": [HumanMessage(content=\"Translate to French: I love programming.\")]},\n",
" config={\"configurable\": {\"thread_id\": \"1\"}},\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Automatic history management\n",
"\n",
"The previous examples pass messages to the chain explicitly. This is a completely acceptable approach, but it does require external management of new messages. LangChain also includes an wrapper for LCEL chains that can handle this process automatically called `RunnableWithMessageHistory`.\n",
"\n",
"To show how it works, let's slightly modify the above prompt to take a final `input` variable that populates a `HumanMessage` template after the chat history. This means that we will expect a `chat_history` parameter that contains all messages BEFORE the current messages instead of all messages:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\n",
" \"system\",\n",
" \"You are a helpful assistant. Answer all questions to the best of your ability.\",\n",
" ),\n",
" (\"placeholder\", \"{chat_history}\"),\n",
" (\"human\", \"{input}\"),\n",
" ]\n",
")\n",
"\n",
"chain = prompt | chat"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" We'll pass the latest input to the conversation here and let the `RunnableWithMessageHistory` class wrap our chain and do the work of appending that `input` variable to the chat history.\n",
" \n",
" Next, let's declare our wrapped chain:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.runnables.history import RunnableWithMessageHistory\n",
"\n",
"demo_ephemeral_chat_history_for_chain = ChatMessageHistory()\n",
"\n",
"chain_with_message_history = RunnableWithMessageHistory(\n",
" chain,\n",
" lambda session_id: demo_ephemeral_chat_history_for_chain,\n",
" input_messages_key=\"input\",\n",
" history_messages_key=\"chat_history\",\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This class takes a few parameters in addition to the chain that we want to wrap:\n",
"\n",
"- A factory function that returns a message history for a given session id. This allows your chain to handle multiple users at once by loading different messages for different conversations.\n",
"- An `input_messages_key` that specifies which part of the input should be tracked and stored in the chat history. In this example, we want to track the string passed in as `input`.\n",
"- A `history_messages_key` that specifies what the previous messages should be injected into the prompt as. Our prompt has a `MessagesPlaceholder` named `chat_history`, so we specify this property to match.\n",
"- (For chains with multiple outputs) an `output_messages_key` which specifies which output to store as history. This is the inverse of `input_messages_key`.\n",
"\n",
"We can invoke this new chain as normal, with an additional `configurable` field that specifies the particular `session_id` to pass to the factory function. This is unused for the demo, but in real-world chains, you'll want to return a chat history corresponding to the passed session:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Parent run dc4e2f79-4bcd-4a36-9506-55ace9040588 not found for run 34b5773e-3ced-46a6-8daf-4d464c15c940. Treating as a root run.\n"
]
},
{
"data": {
"text/plain": [
"AIMessage(content='\"J\\'adore la programmation.\"', response_metadata={'token_usage': {'completion_tokens': 9, 'prompt_tokens': 39, 'total_tokens': 48}, 'model_name': 'gpt-4o-mini', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-648b0822-b0bb-47a2-8e7d-7d34744be8f2-0', usage_metadata={'input_tokens': 39, 'output_tokens': 9, 'total_tokens': 48})"
"{'messages': [HumanMessage(content='Translate this sentence from English to French: I love programming.', additional_kwargs={}, response_metadata={}, id='200f88bb-936a-4877-990c-8b4112d82cfe'),\n",
" AIMessage(content='The translation of \"I love programming\" in French is \"J\\'aime programmer.\"', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 16, 'prompt_tokens': 39, 'total_tokens': 55, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_1bb46167f9', 'finish_reason': 'stop', 'logprobs': None}, id='run-d4ebcdcf-9a60-4471-ad8d-96169f614ada-0', usage_metadata={'input_tokens': 39, 'output_tokens': 16, 'total_tokens': 55}),\n",
" HumanMessage(content='What did I just ask you?', additional_kwargs={}, response_metadata={}, id='df32f0a6-38fe-418a-98fe-7a5f17d0b812'),\n",
" AIMessage(content='You asked me to translate the sentence \"I love programming\" from English to French.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 17, 'prompt_tokens': 70, 'total_tokens': 87, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_1bb46167f9', 'finish_reason': 'stop', 'logprobs': None}, id='run-1ee8ad67-d7f0-4bb9-adff-e632be6e2825-0', usage_metadata={'input_tokens': 70, 'output_tokens': 17, 'total_tokens': 87})]}"
]
},
"execution_count": 8,
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain_with_message_history.invoke(\n",
" {\"input\": \"Translate this sentence from English to French: I love programming.\"},\n",
" {\"configurable\": {\"session_id\": \"unused\"}},\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Parent run cc14b9d8-c59e-40db-a523-d6ab3fc2fa4f not found for run 5b75e25c-131e-46ee-9982-68569db04330. Treating as a root run.\n"
]
},
{
"data": {
"text/plain": [
"AIMessage(content='You asked me to translate the sentence \"I love programming\" from English to French.', response_metadata={'token_usage': {'completion_tokens': 17, 'prompt_tokens': 63, 'total_tokens': 80}, 'model_name': 'gpt-4o-mini', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-5950435c-1dc2-43a6-836f-f989fd62c95e-0', usage_metadata={'input_tokens': 63, 'output_tokens': 17, 'total_tokens': 80})"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain_with_message_history.invoke(\n",
" {\"input\": \"What did I just ask you?\"}, {\"configurable\": {\"session_id\": \"unused\"}}\n",
"app.invoke(\n",
" {\"messages\": [HumanMessage(content=\"What did I just ask you?\")]},\n",
" config={\"configurable\": {\"thread_id\": \"1\"}},\n",
")"
]
},
@@ -366,80 +248,44 @@
"\n",
"### Trimming messages\n",
"\n",
"LLMs and chat models have limited context windows, and even if you're not directly hitting limits, you may want to limit the amount of distraction the model has to deal with. One solution is trim the historic messages before passing them to the model. Let's use an example history with some preloaded messages:"
"LLMs and chat models have limited context windows, and even if you're not directly hitting limits, you may want to limit the amount of distraction the model has to deal with. One solution is trim the history messages before passing them to the model. Let's use an example history with the `app` we declared above:"
]
},
{
"cell_type": "code",
"execution_count": 21,
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[HumanMessage(content=\"Hey there! I'm Nemo.\"),\n",
" AIMessage(content='Hello!'),\n",
" HumanMessage(content='How are you today?'),\n",
" AIMessage(content='Fine thanks!')]"
"{'messages': [HumanMessage(content=\"Hey there! I'm Nemo.\", additional_kwargs={}, response_metadata={}, id='99321048-3390-4da6-919b-4ad933c4913b'),\n",
" AIMessage(content='Hello!', additional_kwargs={}, response_metadata={}, id='1c3eaf4a-b698-4bc6-a7a6-549290c3fc7e'),\n",
" HumanMessage(content='How are you today?', additional_kwargs={}, response_metadata={}, id='6f96db9d-ac30-4b4a-9ebc-bc11ae87646b'),\n",
" AIMessage(content='Fine thanks!', additional_kwargs={}, response_metadata={}, id='e783fbb6-2892-42ea-9859-ae449e4cfdf6'),\n",
" HumanMessage(content=\"What's my name?\", additional_kwargs={}, response_metadata={}, id='854065c4-09a0-4c2a-9f2c-eb7182dcc9d5'),\n",
" AIMessage(content='Your name is Nemo.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 5, 'prompt_tokens': 63, 'total_tokens': 68, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_1bb46167f9', 'finish_reason': 'stop', 'logprobs': None}, id='run-eed15b83-b215-47a3-b374-404d6a05ab94-0', usage_metadata={'input_tokens': 63, 'output_tokens': 5, 'total_tokens': 68})]}"
]
},
"execution_count": 21,
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"demo_ephemeral_chat_history = ChatMessageHistory()\n",
"demo_ephemeral_chat_history = [\n",
" HumanMessage(content=\"Hey there! I'm Nemo.\"),\n",
" AIMessage(content=\"Hello!\"),\n",
" HumanMessage(content=\"How are you today?\"),\n",
" AIMessage(content=\"Fine thanks!\"),\n",
"]\n",
"\n",
"demo_ephemeral_chat_history.add_user_message(\"Hey there! I'm Nemo.\")\n",
"demo_ephemeral_chat_history.add_ai_message(\"Hello!\")\n",
"demo_ephemeral_chat_history.add_user_message(\"How are you today?\")\n",
"demo_ephemeral_chat_history.add_ai_message(\"Fine thanks!\")\n",
"\n",
"demo_ephemeral_chat_history.messages"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's use this message history with the `RunnableWithMessageHistory` chain we declared above:"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Parent run 7ff2d8ec-65e2-4f67-8961-e498e2c4a591 not found for run 3881e990-6596-4326-84f6-2b76949e0657. Treating as a root run.\n"
]
},
{
"data": {
"text/plain": [
"AIMessage(content='Your name is Nemo.', response_metadata={'token_usage': {'completion_tokens': 6, 'prompt_tokens': 66, 'total_tokens': 72}, 'model_name': 'gpt-4o-mini', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-f8aabef8-631a-4238-a39b-701e881fbe47-0', usage_metadata={'input_tokens': 66, 'output_tokens': 6, 'total_tokens': 72})"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain_with_message_history = RunnableWithMessageHistory(\n",
" chain,\n",
" lambda session_id: demo_ephemeral_chat_history,\n",
" input_messages_key=\"input\",\n",
" history_messages_key=\"chat_history\",\n",
")\n",
"\n",
"chain_with_message_history.invoke(\n",
" {\"input\": \"What's my name?\"},\n",
" {\"configurable\": {\"session_id\": \"unused\"}},\n",
"app.invoke(\n",
" {\n",
" \"messages\": demo_ephemeral_chat_history\n",
" + [HumanMessage(content=\"What's my name?\")]\n",
" },\n",
" config={\"configurable\": {\"thread_id\": \"2\"}},\n",
")"
]
},
@@ -447,35 +293,88 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see the chain remembers the preloaded name.\n",
"We can see the app remembers the preloaded name.\n",
"\n",
"But let's say we have a very small context window, and we want to trim the number of messages passed to the chain to only the 2 most recent ones. We can use the built in [trim_messages](/docs/how_to/trim_messages/) util to trim messages based on their token count before they reach our prompt. In this case we'll count each message as 1 \"token\" and keep only the last two messages:"
"But let's say we have a very small context window, and we want to trim the number of messages passed to the model to only the 2 most recent ones. We can use the built in [trim_messages](/docs/how_to/trim_messages/) util to trim messages based on their token count before they reach our prompt. In this case we'll count each message as 1 \"token\" and keep only the last two messages:"
]
},
{
"cell_type": "code",
"execution_count": 23,
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"from operator import itemgetter\n",
"\n",
"from langchain_core.messages import trim_messages\n",
"from langchain_core.runnables import RunnablePassthrough\n",
"from langgraph.checkpoint.memory import MemorySaver\n",
"from langgraph.graph import START, MessagesState, StateGraph\n",
"\n",
"# Define trimmer\n",
"# highlight-start\n",
"# count each message as 1 \"token\" (token_counter=len) and keep only the last two messages\n",
"trimmer = trim_messages(strategy=\"last\", max_tokens=2, token_counter=len)\n",
"# highlight-end\n",
"\n",
"chain_with_trimming = (\n",
" RunnablePassthrough.assign(chat_history=itemgetter(\"chat_history\") | trimmer)\n",
" | prompt\n",
" | chat\n",
")\n",
"workflow = StateGraph(state_schema=MessagesState)\n",
"\n",
"chain_with_trimmed_history = RunnableWithMessageHistory(\n",
" chain_with_trimming,\n",
" lambda session_id: demo_ephemeral_chat_history,\n",
" input_messages_key=\"input\",\n",
" history_messages_key=\"chat_history\",\n",
"\n",
"# Define the function that calls the model\n",
"def call_model(state: MessagesState):\n",
" # highlight-start\n",
" trimmed_messages = trimmer.invoke(state[\"messages\"])\n",
" system_prompt = (\n",
" \"You are a helpful assistant. \"\n",
" \"Answer all questions to the best of your ability.\"\n",
" )\n",
" messages = [SystemMessage(content=system_prompt)] + trimmed_messages\n",
" # highlight-end\n",
" response = model.invoke(messages)\n",
" return {\"messages\": response}\n",
"\n",
"\n",
"# Define the node and edge\n",
"workflow.add_node(\"model\", call_model)\n",
"workflow.add_edge(START, \"model\")\n",
"\n",
"# Add simple in-memory checkpointer\n",
"memory = MemorySaver()\n",
"app = workflow.compile(checkpointer=memory)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's call this new app and check the response"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'messages': [HumanMessage(content=\"Hey there! I'm Nemo.\", additional_kwargs={}, response_metadata={}, id='99321048-3390-4da6-919b-4ad933c4913b'),\n",
" AIMessage(content='Hello!', additional_kwargs={}, response_metadata={}, id='1c3eaf4a-b698-4bc6-a7a6-549290c3fc7e'),\n",
" HumanMessage(content='How are you today?', additional_kwargs={}, response_metadata={}, id='6f96db9d-ac30-4b4a-9ebc-bc11ae87646b'),\n",
" AIMessage(content='Fine thanks!', additional_kwargs={}, response_metadata={}, id='e783fbb6-2892-42ea-9859-ae449e4cfdf6'),\n",
" HumanMessage(content='What is my name?', additional_kwargs={}, response_metadata={}, id='c8ba5e90-89cb-4b34-ad4c-11c0478422d8'),\n",
" AIMessage(content=\"I'm sorry, but I don't know your name. How can I assist you today?\", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 17, 'prompt_tokens': 39, 'total_tokens': 56, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_1bb46167f9', 'finish_reason': 'stop', 'logprobs': None}, id='run-aa86d3f8-898e-4146-aa3c-2c424934b0f5-0', usage_metadata={'input_tokens': 39, 'output_tokens': 17, 'total_tokens': 56})]}"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"app.invoke(\n",
" {\n",
" \"messages\": demo_ephemeral_chat_history\n",
" + [HumanMessage(content=\"What is my name?\")]\n",
" },\n",
" config={\"configurable\": {\"thread_id\": \"3\"}},\n",
")"
]
},
@@ -483,101 +382,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's call this new chain and check the messages afterwards:"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Parent run 775cde65-8d22-4c44-80bb-f0b9811c32ca not found for run 5cf71d0e-4663-41cd-8dbe-e9752689cfac. Treating as a root run.\n"
]
},
{
"data": {
"text/plain": [
"AIMessage(content='P. Sherman is a fictional character from the animated movie \"Finding Nemo\" who lives at 42 Wallaby Way, Sydney.', response_metadata={'token_usage': {'completion_tokens': 27, 'prompt_tokens': 53, 'total_tokens': 80}, 'model_name': 'gpt-4o-mini', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-5642ef3a-fdbe-43cf-a575-d1785976a1b9-0', usage_metadata={'input_tokens': 53, 'output_tokens': 27, 'total_tokens': 80})"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain_with_trimmed_history.invoke(\n",
" {\"input\": \"Where does P. Sherman live?\"},\n",
" {\"configurable\": {\"session_id\": \"unused\"}},\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[HumanMessage(content=\"Hey there! I'm Nemo.\"),\n",
" AIMessage(content='Hello!'),\n",
" HumanMessage(content='How are you today?'),\n",
" AIMessage(content='Fine thanks!'),\n",
" HumanMessage(content=\"What's my name?\"),\n",
" AIMessage(content='Your name is Nemo.', response_metadata={'token_usage': {'completion_tokens': 6, 'prompt_tokens': 66, 'total_tokens': 72}, 'model_name': 'gpt-4o-mini', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-f8aabef8-631a-4238-a39b-701e881fbe47-0', usage_metadata={'input_tokens': 66, 'output_tokens': 6, 'total_tokens': 72}),\n",
" HumanMessage(content='Where does P. Sherman live?'),\n",
" AIMessage(content='P. Sherman is a fictional character from the animated movie \"Finding Nemo\" who lives at 42 Wallaby Way, Sydney.', response_metadata={'token_usage': {'completion_tokens': 27, 'prompt_tokens': 53, 'total_tokens': 80}, 'model_name': 'gpt-4o-mini', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-5642ef3a-fdbe-43cf-a575-d1785976a1b9-0', usage_metadata={'input_tokens': 53, 'output_tokens': 27, 'total_tokens': 80})]"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"demo_ephemeral_chat_history.messages"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And we can see that our history has removed the two oldest messages while still adding the most recent conversation at the end. The next time the chain is called, `trim_messages` will be called again, and only the two most recent messages will be passed to the model. In this case, this means that the model will forget the name we gave it the next time we invoke it:"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Parent run fde7123f-6fd3-421a-a3fc-2fb37dead119 not found for run 061a4563-2394-470d-a3ed-9bf1388ca431. Treating as a root run.\n"
]
},
{
"data": {
"text/plain": [
"AIMessage(content=\"I'm sorry, but I don't have access to your personal information, so I don't know your name. How else may I assist you today?\", response_metadata={'token_usage': {'completion_tokens': 31, 'prompt_tokens': 74, 'total_tokens': 105}, 'model_name': 'gpt-4o-mini', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-0ab03495-1f7c-4151-9070-56d2d1c565ff-0', usage_metadata={'input_tokens': 74, 'output_tokens': 31, 'total_tokens': 105})"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain_with_trimmed_history.invoke(\n",
" {\"input\": \"What is my name?\"},\n",
" {\"configurable\": {\"session_id\": \"unused\"}},\n",
")"
"We can see that `trim_messages` was called and only the two most recent messages will be passed to the model. In this case, this means that the model forgot the name we gave it."
]
},
{
@@ -593,114 +398,82 @@
"source": [
"### Summary memory\n",
"\n",
"We can use this same pattern in other ways too. For example, we could use an additional LLM call to generate a summary of the conversation before calling our chain. Let's recreate our chat history and chatbot chain:"
"We can use this same pattern in other ways too. For example, we could use an additional LLM call to generate a summary of the conversation before calling our app. Let's recreate our chat history:"
]
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[HumanMessage(content=\"Hey there! I'm Nemo.\"),\n",
" AIMessage(content='Hello!'),\n",
" HumanMessage(content='How are you today?'),\n",
" AIMessage(content='Fine thanks!')]"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"demo_ephemeral_chat_history = ChatMessageHistory()\n",
"\n",
"demo_ephemeral_chat_history.add_user_message(\"Hey there! I'm Nemo.\")\n",
"demo_ephemeral_chat_history.add_ai_message(\"Hello!\")\n",
"demo_ephemeral_chat_history.add_user_message(\"How are you today?\")\n",
"demo_ephemeral_chat_history.add_ai_message(\"Fine thanks!\")\n",
"\n",
"demo_ephemeral_chat_history.messages"
"demo_ephemeral_chat_history = [\n",
" HumanMessage(content=\"Hey there! I'm Nemo.\"),\n",
" AIMessage(content=\"Hello!\"),\n",
" HumanMessage(content=\"How are you today?\"),\n",
" AIMessage(content=\"Fine thanks!\"),\n",
"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We'll slightly modify the prompt to make the LLM aware that will receive a condensed summary instead of a chat history:"
"And now, let's update the model-calling function to distill previous interactions into a summary:"
]
},
{
"cell_type": "code",
"execution_count": 18,
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\n",
" \"system\",\n",
" \"You are a helpful assistant. Answer all questions to the best of your ability. The provided chat history includes facts about the user you are speaking with.\",\n",
" ),\n",
" (\"placeholder\", \"{chat_history}\"),\n",
" (\"user\", \"{input}\"),\n",
" ]\n",
")\n",
"from langchain_core.messages import HumanMessage, RemoveMessage\n",
"from langgraph.checkpoint.memory import MemorySaver\n",
"from langgraph.graph import START, MessagesState, StateGraph\n",
"\n",
"chain = prompt | chat\n",
"workflow = StateGraph(state_schema=MessagesState)\n",
"\n",
"chain_with_message_history = RunnableWithMessageHistory(\n",
" chain,\n",
" lambda session_id: demo_ephemeral_chat_history,\n",
" input_messages_key=\"input\",\n",
" history_messages_key=\"chat_history\",\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And now, let's create a function that will distill previous interactions into a summary. We can add this one to the front of the chain too:"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"def summarize_messages(chain_input):\n",
" stored_messages = demo_ephemeral_chat_history.messages\n",
" if len(stored_messages) == 0:\n",
" return False\n",
" summarization_prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\"placeholder\", \"{chat_history}\"),\n",
" (\n",
" \"user\",\n",
" \"Distill the above chat messages into a single summary message. Include as many specific details as you can.\",\n",
" ),\n",
" ]\n",
"\n",
"# Define the function that calls the model\n",
"def call_model(state: MessagesState):\n",
" system_prompt = (\n",
" \"You are a helpful assistant. \"\n",
" \"Answer all questions to the best of your ability. \"\n",
" \"The provided chat history includes a summary of the earlier conversation.\"\n",
" )\n",
" summarization_chain = summarization_prompt | chat\n",
" system_message = SystemMessage(content=system_prompt)\n",
" # Summarize the messages\n",
" if len(state[\"messages\"]) > 1:\n",
" *message_history, last_human_message = state[\"messages\"]\n",
" # Invoke the model to generate conversation summary\n",
" summary_prompt = (\n",
" \"Distill the above chat messages into a single summary message. \"\n",
" \"Include as many specific details as you can.\"\n",
" )\n",
" summary_message = model.invoke(\n",
" message_history + [HumanMessage(content=summary_prompt)]\n",
" )\n",
" # Delete messages that we no longer want to show up\n",
" delete_messages = [RemoveMessage(id=m.id) for m in state[\"messages\"]]\n",
" # Re-add user message\n",
" human_message = HumanMessage(content=last_human_message.content)\n",
" # Call the model with summary & response\n",
" response = model.invoke([system_message, summary_message, human_message])\n",
" message_updates = [summary_message, human_message, response] + delete_messages\n",
" else:\n",
" message_updates = model.invoke([system_message] + state[\"messages\"])\n",
"\n",
" summary_message = summarization_chain.invoke({\"chat_history\": stored_messages})\n",
"\n",
" demo_ephemeral_chat_history.clear()\n",
"\n",
" demo_ephemeral_chat_history.add_message(summary_message)\n",
"\n",
" return True\n",
" return {\"messages\": message_updates}\n",
"\n",
"\n",
"chain_with_summarization = (\n",
" RunnablePassthrough.assign(messages_summarized=summarize_messages)\n",
" | chain_with_message_history\n",
")"
"# Define the node and edge\n",
"workflow.add_node(\"model\", call_model)\n",
"workflow.add_edge(START, \"model\")\n",
"\n",
"# Add simple in-memory checkpointer\n",
"memory = MemorySaver()\n",
"app = workflow.compile(checkpointer=memory)"
]
},
{
@@ -712,54 +485,37 @@
},
{
"cell_type": "code",
"execution_count": 20,
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='You introduced yourself as Nemo. How can I assist you today, Nemo?')"
"{'messages': [AIMessage(content='Nemo greeted me, and I responded positively, indicating that I am doing fine.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 17, 'prompt_tokens': 60, 'total_tokens': 77, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_1bb46167f9', 'finish_reason': 'stop', 'logprobs': None}, id='run-94df0e9f-6b1c-4e68-858c-5b23058b16d8-0', usage_metadata={'input_tokens': 60, 'output_tokens': 17, 'total_tokens': 77}),\n",
" HumanMessage(content='What did I say my name was?', additional_kwargs={}, response_metadata={}, id='d3f57f56-dd1a-45f9-add2-146f54c1180c'),\n",
" AIMessage(content='You mentioned that your name is Nemo.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 8, 'prompt_tokens': 68, 'total_tokens': 76, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_1bb46167f9', 'finish_reason': 'stop', 'logprobs': None}, id='run-ea144209-5d37-4bb5-8529-be235626fc74-0', usage_metadata={'input_tokens': 68, 'output_tokens': 8, 'total_tokens': 76})]}"
]
},
"execution_count": 20,
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain_with_summarization.invoke(\n",
" {\"input\": \"What did I say my name was?\"},\n",
" {\"configurable\": {\"session_id\": \"unused\"}},\n",
"app.invoke(\n",
" {\n",
" \"messages\": demo_ephemeral_chat_history\n",
" + [HumanMessage(\"What did I say my name was?\")]\n",
" },\n",
" config={\"configurable\": {\"thread_id\": \"4\"}},\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[AIMessage(content='The conversation is between Nemo and an AI. Nemo introduces himself and the AI responds with a greeting. Nemo then asks the AI how it is doing, and the AI responds that it is fine.'),\n",
" HumanMessage(content='What did I say my name was?'),\n",
" AIMessage(content='You introduced yourself as Nemo. How can I assist you today, Nemo?')]"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"demo_ephemeral_chat_history.messages"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that invoking the chain again will generate another summary generated from the initial summary plus new messages and so on. You could also design a hybrid approach where a certain number of messages are retained in chat history while others are summarized."
"Note that invoking the app again will generate another summary generated from the initial summary plus new messages and so on. You could also design a hybrid approach where a certain number of messages are retained in chat history while others are summarized."
]
}
],
@@ -779,7 +535,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
"version": "3.12.3"
}
},
"nbformat": 4,

File diff suppressed because one or more lines are too long

View File

@@ -0,0 +1,31 @@
# How to upgrade to LangGraph persistence
As of the v0.3 release of LangChain, we recommend that LangChain users take advantage of [LangGraph persistence](https://langchain-ai.github.io/langgraph/concepts/persistence/) to incorporate `memory` into their LangChain application.
## Evolution of memory in LangChain
The concept of memory has evolved significantly in LangChain since its initial release.
In LangChain 0.0.x, memory was based on the [BaseMemory](https://api.python.langchain.com/en/latest/memory/langchain_core.memory.BaseMemory.html) interface and the [BaseChatMessageHistory](https://api.python.langchain.com/en/latest/history/langchain_core.runnables.history.BaseChatMessageHistory.html) interface.
There were a number of useful [memory implementations](https://python.langchain.com/api_reference/langchain/memory.html) based
on the `BaseMemory` interface (e.g. [ConversationBufferMemory](https://python.langchain.com/api_reference/langchain/memory/langchain.memory.buffer.ConversationBufferMemory.html), [ConversationBufferWindowMemory](https://python.langchain.com/api_reference/langchain/memory/langchain.memory.buffer_window.ConversationBufferWindowMemory.html)); however, these lacked built-in support for multi-user, multi-conversation scenarios, which are essential for practical conversational AI systems.
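For context, a minimal sketch of this legacy pattern might look like the following; the model name is illustrative, and these abstractions emit deprecation warnings on recent LangChain versions:

```python
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain_openai import ChatOpenAI

# A single in-process buffer: there is no built-in notion of separate
# users or conversations, which is the limitation described above.
memory = ConversationBufferMemory()
chain = ConversationChain(llm=ChatOpenAI(model="gpt-4o-mini"), memory=memory)

chain.predict(input="Hi, I'm Nemo.")
chain.predict(input="What's my name?")  # the buffer supplies the earlier turn
```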
:::note
If you are relying on any deprecated memory abstractions in LangChain 0.0.x, we recommend that you follow
the given steps to upgrade to the new LangGraph persistence feature in LangChain 0.3.x.
https://python.langchain.com/docs/versions/migrating_memory/
:::
As of LangChain v0.1, we started recommending that users rely primarily on [BaseChatMessageHistory](https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.history.RunnableWithMessageHistory.html#langchain_core.runnables.history.RunnableWithMessageHistory). `BaseChatMessageHistory` is a simple persistence layer for a chat history that can be used to store and retrieve messages in a conversation. At this time, the only option for orchestrating LangChain chains was via [LCEL](https://python.langchain.com/docs/how_to/#langchain-expression-language-lcel). When using `LCEL`, memory can be added using the [RunnableWithMessageHistory](https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.history.RunnableWithMessageHistory.html#langchain_core.runnables.history.RunnableWithMessageHistory) interface. While this option is sufficient for building a simple chat application, many users found the API to be unintuitive and difficult to work with.
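As a rough illustration (not taken from this page), adding memory to an LCEL chain with `RunnableWithMessageHistory` typically looks like the sketch below; the prompt, model name, and in-memory `store` dict are assumptions made for the example:

```python
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant."),
        MessagesPlaceholder(variable_name="history"),
        ("human", "{input}"),
    ]
)
chain = prompt | ChatOpenAI(model="gpt-4o-mini")

# Maps a session id to its chat history; a real app would use persistent storage.
store = {}


def get_session_history(session_id: str) -> InMemoryChatMessageHistory:
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]


chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)

chain_with_history.invoke(
    {"input": "Hi, I'm Nemo."},
    config={"configurable": {"session_id": "session-1"}},
)
```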
As of LangChain v0.3, we recommend that new code rely on LangGraph for both orchestration and persistence.
Specifically, instead of writing `LCEL` code for orchestration, users can define LangGraph [graphs](https://langchain-ai.github.io/langgraph/concepts/low_level/). This allows users to keep using `LCEL` within individual nodes when `LCEL` is needed, while
making it easy to define complex orchestration logic that is more readable and maintainable.
For persistence, users can use LangGraph's [persistence](https://langchain-ai.github.io/langgraph/concepts/persistence/) feature to store and retrieve the state of their application. LangGraph persistence is extremely flexible and can support a much wider range of use cases than the `RunnableWithMessageHistory` interface.
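A minimal sketch of the recommended pattern, mirroring the notebook above (the model name and system prompt are illustrative):

```python
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_openai import ChatOpenAI
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, MessagesState, StateGraph

model = ChatOpenAI(model="gpt-4o-mini")


def call_model(state: MessagesState):
    # Prepend a system prompt to the persisted message history.
    messages = [SystemMessage(content="You are a helpful assistant.")] + state["messages"]
    return {"messages": model.invoke(messages)}


workflow = StateGraph(state_schema=MessagesState)
workflow.add_node("model", call_model)
workflow.add_edge(START, "model")

# The checkpointer persists graph state (here, in memory) between invocations
# that share the same thread_id.
app = workflow.compile(checkpointer=MemorySaver())

app.invoke(
    {"messages": [HumanMessage(content="Hi, I'm Nemo.")]},
    config={"configurable": {"thread_id": "thread-1"}},
)
```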
:::important
If you have been using `RunnableWithMessageHistory` or `BaseChatMessageHistory`, you do not need to make any changes. We do not plan on deprecating either functionality in the near future. This functionality is sufficient for simple chat applications and any code that uses `RunnableWithMessageHistory` will continue to work as expected.
:::

View File

@@ -7,6 +7,15 @@
"source": [
"# How to add chat history\n",
"\n",
"\n",
":::{.callout-note}\n",
"\n",
"This tutorial previously built a chatbot using [RunnableWithMessageHistory](https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.history.RunnableWithMessageHistory.html). You can access this version of the tutorial in the [v0.2 docs](https://python.langchain.com/v0.2/docs/how_to/qa_chat_history_how_to/).\n",
"\n",
"The LangGraph implementation offers a number of advantages over `RunnableWithMessageHistory`, including the ability to persist arbitrary components of an application's state (instead of only messages).\n",
"\n",
":::\n",
"\n",
"In many Q&A applications we want to allow the user to have a back-and-forth conversation, meaning the application needs some sort of \"memory\" of past questions and answers, and some logic for incorporating those into its current thinking.\n",
"\n",
"In this guide we focus on **adding logic for incorporating historical messages.**\n",
@@ -29,7 +38,7 @@
"\n",
"### Dependencies\n",
"\n",
"We'll use OpenAI embeddings and a Chroma vector store in this walkthrough, but everything shown here works with any [Embeddings](/docs/concepts#embedding-models), and [VectorStore](/docs/concepts#vectorstores) or [Retriever](/docs/concepts#retrievers). \n",
"We'll use OpenAI embeddings and an InMemory vector store in this walkthrough, but everything shown here works with any [Embeddings](/docs/concepts#embedding-models), and [VectorStore](/docs/concepts#vectorstores) or [Retriever](/docs/concepts#retrievers). \n",
"\n",
"We'll use the following packages:"
]
@@ -42,7 +51,7 @@
"outputs": [],
"source": [
"%%capture --no-stderr\n",
"%pip install --upgrade --quiet langchain langchain-community langchain-chroma beautifulsoup4"
"%pip install --upgrade --quiet langchain langchain-community beautifulsoup4"
]
},
{
@@ -64,11 +73,7 @@
"import os\n",
"\n",
"if not os.environ.get(\"OPENAI_API_KEY\"):\n",
" os.environ[\"OPENAI_API_KEY\"] = getpass.getpass()\n",
"\n",
"# import dotenv\n",
"\n",
"# dotenv.load_dotenv()"
" os.environ[\"OPENAI_API_KEY\"] = getpass.getpass()"
]
},
{
@@ -155,7 +160,7 @@
"id": "15f8ad59-19de-42e3-85a8-3ba95ee0bd43",
"metadata": {},
"source": [
"For the retriever, we will use [WebBaseLoader](https://python.langchain.com/api_reference/community/document_loaders/langchain_community.document_loaders.web_base.WebBaseLoader.html) to load the content of a web page. Here we instantiate a `Chroma` vectorstore and then use its [.as_retriever](https://python.langchain.com/api_reference/core/vectorstores/langchain_core.vectorstores.VectorStore.html#langchain_core.vectorstores.VectorStore.as_retriever) method to build a retriever that can be incorporated into [LCEL](/docs/concepts/#langchain-expression-language) chains."
"For the retriever, we will use [WebBaseLoader](https://python.langchain.com/api_reference/community/document_loaders/langchain_community.document_loaders.web_base.WebBaseLoader.html) to load the content of a web page. Here we instantiate a `InMemoryVectorStore` vectorstore and then use its [.as_retriever](https://python.langchain.com/api_reference/core/vectorstores/langchain_core.vectorstores.VectorStore.html#langchain_core.vectorstores.VectorStore.as_retriever) method to build a retriever that can be incorporated into [LCEL](/docs/concepts/#langchain-expression-language) chains."
]
},
{
@@ -163,16 +168,24 @@
"execution_count": 5,
"id": "820244ae-74b4-4593-b392-822979dd91b8",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"USER_AGENT environment variable not set, consider setting it to identify your requests.\n"
]
}
],
"source": [
"import bs4\n",
"from langchain.chains import create_retrieval_chain\n",
"from langchain.chains.combine_documents import create_stuff_documents_chain\n",
"from langchain_chroma import Chroma\n",
"from langchain_community.document_loaders import WebBaseLoader\n",
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_core.runnables import RunnablePassthrough\n",
"from langchain_core.vectorstores import InMemoryVectorStore\n",
"from langchain_openai import OpenAIEmbeddings\n",
"from langchain_text_splitters import RecursiveCharacterTextSplitter\n",
"\n",
@@ -188,7 +201,8 @@
"\n",
"text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)\n",
"splits = text_splitter.split_documents(docs)\n",
"vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())\n",
"vectorstore = InMemoryVectorStore(embedding=OpenAIEmbeddings())\n",
"vectorstore.add_documents(splits)\n",
"retriever = vectorstore.as_retriever()"
]
},
@@ -288,8 +302,8 @@
" (\"human\", \"{input}\"),\n",
" ]\n",
")\n",
"question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)\n",
"\n",
"question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)\n",
"rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)"
]
},
@@ -298,20 +312,17 @@
"id": "53a662c2-f38b-45f9-95c4-66de15637614",
"metadata": {},
"source": [
"### Adding chat history\n",
"### Stateful Management of chat history\n",
"\n",
"To manage the chat history, we will need:\n",
"We have added application logic for incorporating chat history, but we are still manually plumbing it through our application. In production, the Q&A application we usually persist the chat history into a database, and be able to read and update it appropriately.\n",
"\n",
"1. An object for storing the chat history;\n",
"2. An object that wraps our chain and manages updates to the chat history.\n",
"[LangGraph](https://langchain-ai.github.io/langgraph/) implements a built-in [persistence layer](https://langchain-ai.github.io/langgraph/concepts/persistence/), making it ideal for chat applications that support multiple conversational turns.\n",
"\n",
"For these we will use [BaseChatMessageHistory](https://python.langchain.com/api_reference/core/chat_history/langchain_core.chat_history.BaseChatMessageHistory.html) and [RunnableWithMessageHistory](https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.history.RunnableWithMessageHistory.html). The latter is a wrapper for an LCEL chain and a `BaseChatMessageHistory` that handles injecting chat history into inputs and updating it after each invocation.\n",
"Wrapping our chat model in a minimal LangGraph application allows us to automatically persist the message history, simplifying the development of multi-turn applications.\n",
"\n",
"For a detailed walkthrough of how to use these classes together to create a stateful conversational chain, head to the [How to add message history (memory)](/docs/how_to/message_history/) LCEL how-to guide.\n",
"LangGraph comes with a simple [in-memory checkpointer](https://langchain-ai.github.io/langgraph/reference/checkpoints/#memorysaver), which we use below. See its documentation for more detail, including how to use different persistence backends (e.g., SQLite or Postgres).\n",
"\n",
"Below, we implement a simple example of the second option, in which chat histories are stored in a simple dict. LangChain manages memory integrations with [Redis](/docs/integrations/memory/redis_chat_message_history/) and other technologies to provide for more robust persistence.\n",
"\n",
"Instances of `RunnableWithMessageHistory` manage the chat history for you. They accept a config with a key (`\"session_id\"` by default) that specifies what conversation history to fetch and prepend to the input, and append the output to the same conversation history. Below is an example:"
"For a detailed walkthrough of how to manage message history, head to the How to add message history (memory) guide."
]
},
{
@@ -321,26 +332,48 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.chat_message_histories import ChatMessageHistory\n",
"from langchain_core.chat_history import BaseChatMessageHistory\n",
"from langchain_core.runnables.history import RunnableWithMessageHistory\n",
"from typing import Sequence\n",
"\n",
"store = {}\n",
"from langchain_core.messages import AIMessage, BaseMessage, HumanMessage\n",
"from langgraph.checkpoint.memory import MemorySaver\n",
"from langgraph.graph import START, StateGraph\n",
"from langgraph.graph.message import add_messages\n",
"from typing_extensions import Annotated, TypedDict\n",
"\n",
"\n",
"def get_session_history(session_id: str) -> BaseChatMessageHistory:\n",
" if session_id not in store:\n",
" store[session_id] = ChatMessageHistory()\n",
" return store[session_id]\n",
"# We define a dict representing the state of the application.\n",
"# This state has the same input and output keys as `rag_chain`.\n",
"class State(TypedDict):\n",
" input: str\n",
" chat_history: Annotated[Sequence[BaseMessage], add_messages]\n",
" context: str\n",
" answer: str\n",
"\n",
"\n",
"conversational_rag_chain = RunnableWithMessageHistory(\n",
" rag_chain,\n",
" get_session_history,\n",
" input_messages_key=\"input\",\n",
" history_messages_key=\"chat_history\",\n",
" output_messages_key=\"answer\",\n",
")"
"# We then define a simple node that runs the `rag_chain`.\n",
"# The `return` values of the node update the graph state, so here we just\n",
"# update the chat history with the input message and response.\n",
"def call_model(state: State):\n",
" response = rag_chain.invoke(state)\n",
" return {\n",
" \"chat_history\": [\n",
" HumanMessage(state[\"input\"]),\n",
" AIMessage(response[\"answer\"]),\n",
" ],\n",
" \"context\": response[\"context\"],\n",
" \"answer\": response[\"answer\"],\n",
" }\n",
"\n",
"\n",
"# Our graph consists only of one node:\n",
"workflow = StateGraph(state_schema=State)\n",
"workflow.add_edge(START, \"model\")\n",
"workflow.add_node(\"model\", call_model)\n",
"\n",
"# Finally, we compile the graph with a checkpointer object.\n",
"# This persists the state, in this case in memory.\n",
"memory = MemorySaver()\n",
"app = workflow.compile(checkpointer=memory)"
]
},
{
@@ -350,23 +383,21 @@
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Task decomposition involves breaking down a complex task into smaller and simpler steps to make it more manageable and easier to accomplish. This process can be done using techniques like Chain of Thought (CoT) or Tree of Thoughts to guide the model in breaking down tasks effectively. Task decomposition can be facilitated by providing simple prompts to a language model, task-specific instructions, or human inputs.'"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
"name": "stdout",
"output_type": "stream",
"text": [
"Task decomposition is a technique used to break down complex tasks into smaller and simpler steps. This process helps agents or models tackle difficult tasks by dividing them into more manageable subtasks. Task decomposition can be achieved through methods like Chain of Thought or Tree of Thoughts, which guide the model in thinking step by step or exploring multiple reasoning possibilities at each step.\n"
]
}
],
"source": [
"conversational_rag_chain.invoke(\n",
"config = {\"configurable\": {\"thread_id\": \"abc123\"}}\n",
"\n",
"result = app.invoke(\n",
" {\"input\": \"What is Task Decomposition?\"},\n",
" config={\n",
" \"configurable\": {\"session_id\": \"abc123\"}\n",
" }, # constructs a key \"abc123\" in `store`.\n",
")[\"answer\"]"
" config=config,\n",
")\n",
"print(result[\"answer\"])"
]
},
{
@@ -376,21 +407,19 @@
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Task decomposition can be achieved through various methods, including using techniques like Chain of Thought (CoT) or Tree of Thoughts to guide the model in breaking down tasks effectively. Common ways of task decomposition include providing simple prompts to a language model, task-specific instructions, or human inputs to break down complex tasks into smaller and more manageable steps. Additionally, task decomposition can involve utilizing resources like internet access for information gathering, long-term memory management, and GPT-3.5 powered agents for delegation of simple tasks.'"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
"name": "stdout",
"output_type": "stream",
"text": [
"One way of task decomposition is by using a Language Model (LLM) with simple prompting, such as providing instructions like \"Steps for XYZ\" or \"What are the subgoals for achieving XYZ?\" This method guides the LLM to break down the task into smaller components for easier processing and execution.\n"
]
}
],
"source": [
"conversational_rag_chain.invoke(\n",
" {\"input\": \"What are common ways of doing it?\"},\n",
" config={\"configurable\": {\"session_id\": \"abc123\"}},\n",
")[\"answer\"]"
"result = app.invoke(\n",
" {\"input\": \"What is one way of doing it?\"},\n",
" config=config,\n",
")\n",
"print(result[\"answer\"])"
]
},
{
@@ -398,7 +427,7 @@
"id": "3ab59258-84bc-4904-880e-2ebfebbca563",
"metadata": {},
"source": [
"The conversation history can be inspected in the `store` dict:"
"The conversation history can be inspected via the state of the application:"
]
},
{
@@ -411,27 +440,25 @@
"name": "stdout",
"output_type": "stream",
"text": [
"User: What is Task Decomposition?\n",
"================================\u001b[1m Human Message \u001b[0m=================================\n",
"\n",
"AI: Task decomposition involves breaking down a complex task into smaller and simpler steps to make it more manageable and easier to accomplish. This process can be done using techniques like Chain of Thought (CoT) or Tree of Thoughts to guide the model in breaking down tasks effectively. Task decomposition can be facilitated by providing simple prompts to a language model, task-specific instructions, or human inputs.\n",
"What is Task Decomposition?\n",
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
"\n",
"User: What are common ways of doing it?\n",
"Task decomposition is a technique used to break down complex tasks into smaller and simpler steps. This process helps agents or models tackle difficult tasks by dividing them into more manageable subtasks. Task decomposition can be achieved through methods like Chain of Thought or Tree of Thoughts, which guide the model in thinking step by step or exploring multiple reasoning possibilities at each step.\n",
"================================\u001b[1m Human Message \u001b[0m=================================\n",
"\n",
"AI: Task decomposition can be achieved through various methods, including using techniques like Chain of Thought (CoT) or Tree of Thoughts to guide the model in breaking down tasks effectively. Common ways of task decomposition include providing simple prompts to a language model, task-specific instructions, or human inputs to break down complex tasks into smaller and more manageable steps. Additionally, task decomposition can involve utilizing resources like internet access for information gathering, long-term memory management, and GPT-3.5 powered agents for delegation of simple tasks.\n",
"\n"
"What is one way of doing it?\n",
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
"\n",
"One way of task decomposition is by using a Language Model (LLM) with simple prompting, such as providing instructions like \"Steps for XYZ\" or \"What are the subgoals for achieving XYZ?\" This method guides the LLM to break down the task into smaller components for easier processing and execution.\n"
]
}
],
"source": [
"from langchain_core.messages import AIMessage\n",
"\n",
"for message in store[\"abc123\"].messages:\n",
" if isinstance(message, AIMessage):\n",
" prefix = \"AI\"\n",
" else:\n",
" prefix = \"User\"\n",
"\n",
" print(f\"{prefix}: {message.content}\\n\")"
"chat_history = app.get_state(config).values[\"chat_history\"]\n",
"for message in chat_history:\n",
" message.pretty_print()"
]
},
{
@@ -459,17 +486,24 @@
"metadata": {},
"outputs": [],
"source": [
"from typing import Sequence\n",
"\n",
"import bs4\n",
"from langchain.chains import create_history_aware_retriever, create_retrieval_chain\n",
"from langchain.chains.combine_documents import create_stuff_documents_chain\n",
"from langchain_chroma import Chroma\n",
"from langchain_community.chat_message_histories import ChatMessageHistory\n",
"from langchain_community.document_loaders import WebBaseLoader\n",
"from langchain_core.chat_history import BaseChatMessageHistory\n",
"from langchain_core.messages import AIMessage, BaseMessage, HumanMessage\n",
"from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n",
"from langchain_core.runnables.history import RunnableWithMessageHistory\n",
"from langchain_core.vectorstores import InMemoryVectorStore\n",
"from langchain_openai import ChatOpenAI, OpenAIEmbeddings\n",
"from langchain_text_splitters import RecursiveCharacterTextSplitter\n",
"from langgraph.checkpoint.memory import MemorySaver\n",
"from langgraph.graph import START, StateGraph\n",
"from langgraph.graph.message import add_messages\n",
"from typing_extensions import Annotated, TypedDict\n",
"\n",
"llm = ChatOpenAI(model=\"gpt-3.5-turbo\", temperature=0)\n",
"\n",
@@ -487,7 +521,9 @@
"\n",
"text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)\n",
"splits = text_splitter.split_documents(docs)\n",
"vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())\n",
"\n",
"vectorstore = InMemoryVectorStore(embedding=OpenAIEmbeddings())\n",
"vectorstore.add_documents(documents=splits)\n",
"retriever = vectorstore.as_retriever()\n",
"\n",
"\n",
@@ -534,22 +570,41 @@
"\n",
"\n",
"### Statefully manage chat history ###\n",
"store = {}\n",
"\n",
"\n",
"def get_session_history(session_id: str) -> BaseChatMessageHistory:\n",
" if session_id not in store:\n",
" store[session_id] = ChatMessageHistory()\n",
" return store[session_id]\n",
"# We define a dict representing the state of the application.\n",
"# This state has the same input and output keys as `rag_chain`.\n",
"class State(TypedDict):\n",
" input: str\n",
" chat_history: Annotated[Sequence[BaseMessage], add_messages]\n",
" context: str\n",
" answer: str\n",
"\n",
"\n",
"conversational_rag_chain = RunnableWithMessageHistory(\n",
" rag_chain,\n",
" get_session_history,\n",
" input_messages_key=\"input\",\n",
" history_messages_key=\"chat_history\",\n",
" output_messages_key=\"answer\",\n",
")"
"# We then define a simple node that runs the `rag_chain`.\n",
"# The `return` values of the node update the graph state, so here we just\n",
"# update the chat history with the input message and response.\n",
"def call_model(state: State):\n",
" response = rag_chain.invoke(state)\n",
" return {\n",
" \"chat_history\": [\n",
" HumanMessage(state[\"input\"]),\n",
" AIMessage(response[\"answer\"]),\n",
" ],\n",
" \"context\": response[\"context\"],\n",
" \"answer\": response[\"answer\"],\n",
" }\n",
"\n",
"\n",
"# Our graph consists only of one node:\n",
"workflow = StateGraph(state_schema=State)\n",
"workflow.add_edge(START, \"model\")\n",
"workflow.add_node(\"model\", call_model)\n",
"\n",
"# Finally, we compile the graph with a checkpointer object.\n",
"# This persists the state, in this case in memory.\n",
"memory = MemorySaver()\n",
"app = workflow.compile(checkpointer=memory)"
]
},
{
@@ -559,23 +614,21 @@
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Task decomposition involves breaking down a complex task into smaller and simpler steps to make it more manageable. Techniques like Chain of Thought (CoT) and Tree of Thoughts help in decomposing hard tasks into multiple manageable tasks by instructing models to think step by step and explore multiple reasoning possibilities at each step. Task decomposition can be achieved through various methods such as using prompting techniques, task-specific instructions, or human inputs.'"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
"name": "stdout",
"output_type": "stream",
"text": [
"Task decomposition is a technique used to break down complex tasks into smaller and simpler steps. This process helps agents or models tackle difficult tasks by dividing them into more manageable subtasks. Different methods like Chain of Thought and Tree of Thoughts are used to guide the decomposition process, enabling a step-by-step approach to problem-solving.\n"
]
}
],
"source": [
"conversational_rag_chain.invoke(\n",
"config = {\"configurable\": {\"thread_id\": \"abc123\"}}\n",
"\n",
"result = app.invoke(\n",
" {\"input\": \"What is Task Decomposition?\"},\n",
" config={\n",
" \"configurable\": {\"session_id\": \"abc123\"}\n",
" }, # constructs a key \"abc123\" in `store`.\n",
")[\"answer\"]"
" config=config,\n",
")\n",
"print(result[\"answer\"])"
]
},
{
@@ -585,21 +638,19 @@
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Task decomposition can be done in common ways such as using prompting techniques like Chain of Thought (CoT) or Tree of Thoughts, which instruct models to think step by step and explore multiple reasoning possibilities at each step. Another way is to provide task-specific instructions, such as asking to \"Write a story outline\" for writing a novel, to guide the decomposition process. Additionally, task decomposition can also involve human inputs to break down complex tasks into smaller and simpler steps.'"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
"name": "stdout",
"output_type": "stream",
"text": [
"One way of task decomposition is by using Large Language Models (LLMs) with simple prompting, such as providing instructions like \"Steps for XYZ\" or \"What are the subgoals for achieving XYZ?\" This method leverages the capabilities of LLMs to break down tasks into smaller components, making them easier to manage and solve.\n"
]
}
],
"source": [
"conversational_rag_chain.invoke(\n",
" {\"input\": \"What are common ways of doing it?\"},\n",
" config={\"configurable\": {\"session_id\": \"abc123\"}},\n",
")[\"answer\"]"
"result = app.invoke(\n",
" {\"input\": \"What is one way of doing it?\"},\n",
" config=config,\n",
")\n",
"print(result[\"answer\"])"
]
},
{
@@ -672,22 +723,11 @@
"id": "52ae46d9-43f7-481b-96d5-df750be3ad65",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Error in LangChainTracer.on_tool_end callback: TracerException(\"Found chain run at ID 5cd28d13-88dd-4eac-a465-3770ac27eff6, but expected {'tool'} run.\")\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'agent': {'messages': [AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_TbhPPPN05GKi36HLeaN4QM90', 'function': {'arguments': '{\"query\":\"Task Decomposition\"}', 'name': 'blog_post_retriever'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 19, 'prompt_tokens': 68, 'total_tokens': 87}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': None, 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-2e60d910-879a-4a2a-b1e9-6a6c5c7d7ebc-0', tool_calls=[{'name': 'blog_post_retriever', 'args': {'query': 'Task Decomposition'}, 'id': 'call_TbhPPPN05GKi36HLeaN4QM90'}])]}}\n",
"----\n",
"{'tools': {'messages': [ToolMessage(content='Fig. 1. Overview of a LLM-powered autonomous agent system.\\nComponent One: Planning#\\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\\nTask Decomposition#\\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the models thinking process.\\n\\nFig. 1. Overview of a LLM-powered autonomous agent system.\\nComponent One: Planning#\\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\\nTask Decomposition#\\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the models thinking process.\\n\\nTree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\\nTask decomposition can be done (1) by LLM with simple prompting like \"Steps for XYZ.\\\\n1.\", \"What are the subgoals for achieving XYZ?\", (2) by using task-specific instructions; e.g. \"Write a story outline.\" for writing a novel, or (3) with human inputs.\\n\\nTree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\\nTask decomposition can be done (1) by LLM with simple prompting like \"Steps for XYZ.\\\\n1.\", \"What are the subgoals for achieving XYZ?\", (2) by using task-specific instructions; e.g. \"Write a story outline.\" for writing a novel, or (3) with human inputs.', name='blog_post_retriever', tool_call_id='call_TbhPPPN05GKi36HLeaN4QM90')]}}\n",
"----\n",
"{'agent': {'messages': [AIMessage(content='Task decomposition is a technique used to break down complex tasks into smaller and simpler steps. This approach helps in transforming big tasks into multiple manageable tasks, making it easier for autonomous agents to handle and interpret the thinking process. One common method for task decomposition is the Chain of Thought (CoT) technique, where models are instructed to \"think step by step\" to decompose hard tasks. Another extension of CoT is the Tree of Thoughts, which explores multiple reasoning possibilities at each step by creating a tree structure of multiple thoughts per step. Task decomposition can be facilitated through various methods such as using simple prompts, task-specific instructions, or human inputs.', response_metadata={'token_usage': {'completion_tokens': 130, 'prompt_tokens': 636, 'total_tokens': 766}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-3ef17638-65df-4030-a7fe-795e6da91c69-0')]}}\n",
"{'agent': {'messages': [AIMessage(content='Task decomposition is a problem-solving strategy that involves breaking down a complex task or problem into smaller, more manageable subtasks. By decomposing a task, individuals can better understand the components of the task, allocate resources effectively, and solve the problem more efficiently. This approach allows for a systematic and organized way of approaching complex tasks by dividing them into smaller, more achievable steps.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 75, 'prompt_tokens': 68, 'total_tokens': 143, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-01d17f40-c853-4e16-96bd-1e231e2486b5-0', usage_metadata={'input_tokens': 68, 'output_tokens': 75, 'total_tokens': 143})]}}\n",
"----\n"
]
}
@@ -748,7 +788,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"{'agent': {'messages': [AIMessage(content='Hello Bob! How can I assist you today?', response_metadata={'token_usage': {'completion_tokens': 11, 'prompt_tokens': 67, 'total_tokens': 78}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-1cd17562-18aa-4839-b41b-403b17a0fc20-0')]}}\n",
"{'agent': {'messages': [AIMessage(content='Hello Bob! How can I assist you today?', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 11, 'prompt_tokens': 67, 'total_tokens': 78, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-e41bbdf4-da73-43e3-b980-f0d258c4713d-0', usage_metadata={'input_tokens': 67, 'output_tokens': 11, 'total_tokens': 78})]}}\n",
"----\n"
]
}
@@ -777,22 +817,15 @@
"id": "e2c570ae-dd91-402c-8693-ae746de63b16",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Error in LangChainTracer.on_tool_end callback: TracerException(\"Found chain run at ID c54381c0-c5d9-495a-91a0-aca4ae755663, but expected {'tool'} run.\")\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'agent': {'messages': [AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_rg7zKTE5e0ICxVSslJ1u9LMg', 'function': {'arguments': '{\"query\":\"Task Decomposition\"}', 'name': 'blog_post_retriever'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 19, 'prompt_tokens': 91, 'total_tokens': 110}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': None, 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-122bf097-7ff1-49aa-b430-e362b51354ad-0', tool_calls=[{'name': 'blog_post_retriever', 'args': {'query': 'Task Decomposition'}, 'id': 'call_rg7zKTE5e0ICxVSslJ1u9LMg'}])]}}\n",
"{'agent': {'messages': [AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_ygtIVKtuMQEsY95j31BvhzzN', 'function': {'arguments': '{\"query\":\"Task Decomposition\"}', 'name': 'blog_post_retriever'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 19, 'prompt_tokens': 91, 'total_tokens': 110, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-61b7e948-e450-4902-b21c-66db5df816fc-0', tool_calls=[{'name': 'blog_post_retriever', 'args': {'query': 'Task Decomposition'}, 'id': 'call_ygtIVKtuMQEsY95j31BvhzzN', 'type': 'tool_call'}], usage_metadata={'input_tokens': 91, 'output_tokens': 19, 'total_tokens': 110})]}}\n",
"----\n",
"{'tools': {'messages': [ToolMessage(content='Fig. 1. Overview of a LLM-powered autonomous agent system.\\nComponent One: Planning#\\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\\nTask Decomposition#\\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the models thinking process.\\n\\nFig. 1. Overview of a LLM-powered autonomous agent system.\\nComponent One: Planning#\\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\\nTask Decomposition#\\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the models thinking process.\\n\\nTree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\\nTask decomposition can be done (1) by LLM with simple prompting like \"Steps for XYZ.\\\\n1.\", \"What are the subgoals for achieving XYZ?\", (2) by using task-specific instructions; e.g. \"Write a story outline.\" for writing a novel, or (3) with human inputs.\\n\\nTree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\\nTask decomposition can be done (1) by LLM with simple prompting like \"Steps for XYZ.\\\\n1.\", \"What are the subgoals for achieving XYZ?\", (2) by using task-specific instructions; e.g. \"Write a story outline.\" for writing a novel, or (3) with human inputs.', name='blog_post_retriever', tool_call_id='call_rg7zKTE5e0ICxVSslJ1u9LMg')]}}\n",
"{'tools': {'messages': [ToolMessage(content='Fig. 1. Overview of a LLM-powered autonomous agent system.\\nComponent One: Planning#\\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\\nTask Decomposition#\\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the models thinking process.\\n\\nTree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\\nTask decomposition can be done (1) by LLM with simple prompting like \"Steps for XYZ.\\\\n1.\", \"What are the subgoals for achieving XYZ?\", (2) by using task-specific instructions; e.g. \"Write a story outline.\" for writing a novel, or (3) with human inputs.\\n\\n(3) Task execution: Expert models execute on the specific tasks and log results.\\nInstruction:\\n\\nWith the input and the inference results, the AI assistant needs to describe the process and results. The previous stages can be formed as - User Input: {{ User Input }}, Task Planning: {{ Tasks }}, Model Selection: {{ Model Assignment }}, Task Execution: {{ Predictions }}. You must first answer the user\\'s request in a straightforward manner. Then describe the task process and show your analysis and model inference results to the user in the first person. If inference results contain a file path, must tell the user the complete file path.\\n\\nFig. 11. Illustration of how HuggingGPT works. (Image source: Shen et al. 2023)\\nThe system comprises of 4 stages:\\n(1) Task planning: LLM works as the brain and parses the user requests into multiple tasks. There are four attributes associated with each task: task type, ID, dependencies, and arguments. They use few-shot examples to guide LLM to do task parsing and planning.\\nInstruction:', name='blog_post_retriever', tool_call_id='call_ygtIVKtuMQEsY95j31BvhzzN')]}}\n",
"----\n",
"{'agent': {'messages': [AIMessage(content='Task decomposition is a technique used to break down complex tasks into smaller and simpler steps. This approach helps in managing and solving intricate problems by dividing them into more manageable components. By decomposing tasks, agents or models can better understand the steps involved and plan their actions accordingly. Techniques like Chain of Thought (CoT) and Tree of Thoughts are examples of methods that enhance model performance on complex tasks by breaking them down into smaller steps.', response_metadata={'token_usage': {'completion_tokens': 87, 'prompt_tokens': 659, 'total_tokens': 746}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-b9166386-83e5-4b82-9a4b-590e5fa76671-0')]}}\n",
"{'agent': {'messages': [AIMessage(content='Task decomposition is a technique used to break down complex tasks into smaller and simpler steps. This approach helps autonomous agents or models to handle challenging tasks by dividing them into more manageable subtasks. One common method for task decomposition is the Chain of Thought (CoT) technique, where models are prompted to think step by step to decompose difficult tasks.\\n\\nAnother extension of CoT is the Tree of Thoughts, which explores multiple reasoning possibilities at each step by creating a tree structure of multiple thoughts per step. Task decomposition can be facilitated by providing simple prompts to language models, using task-specific instructions, or incorporating human inputs.\\n\\nOverall, task decomposition plays a crucial role in enabling autonomous agents to plan and execute complex tasks effectively by breaking them down into smaller, more manageable components.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 153, 'prompt_tokens': 611, 'total_tokens': 764, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-68aed524-fdf4-4d34-8546-dfb02f2a03cd-0', usage_metadata={'input_tokens': 611, 'output_tokens': 153, 'total_tokens': 764})]}}\n",
"----\n"
]
}
@@ -827,24 +860,11 @@
"name": "stdout",
"output_type": "stream",
"text": [
"{'agent': {'messages': [AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_6kbxTU5CDWLmF9mrvR7bWSkI', 'function': {'arguments': '{\"query\":\"Common ways of task decomposition\"}', 'name': 'blog_post_retriever'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 21, 'prompt_tokens': 769, 'total_tokens': 790}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': None, 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-2d2c8327-35cd-484a-b8fd-52436657c2d8-0', tool_calls=[{'name': 'blog_post_retriever', 'args': {'query': 'Common ways of task decomposition'}, 'id': 'call_6kbxTU5CDWLmF9mrvR7bWSkI'}])]}}\n",
"----\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Error in LangChainTracer.on_tool_end callback: TracerException(\"Found chain run at ID 29553415-e0f4-41a9-8921-ba489e377f68, but expected {'tool'} run.\")\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'tools': {'messages': [ToolMessage(content='Fig. 1. Overview of a LLM-powered autonomous agent system.\\nComponent One: Planning#\\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\\nTask Decomposition#\\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the models thinking process.\\n\\nFig. 1. Overview of a LLM-powered autonomous agent system.\\nComponent One: Planning#\\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\\nTask Decomposition#\\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the models thinking process.\\n\\nTree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\\nTask decomposition can be done (1) by LLM with simple prompting like \"Steps for XYZ.\\\\n1.\", \"What are the subgoals for achieving XYZ?\", (2) by using task-specific instructions; e.g. \"Write a story outline.\" for writing a novel, or (3) with human inputs.\\n\\nTree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\\nTask decomposition can be done (1) by LLM with simple prompting like \"Steps for XYZ.\\\\n1.\", \"What are the subgoals for achieving XYZ?\", (2) by using task-specific instructions; e.g. \"Write a story outline.\" for writing a novel, or (3) with human inputs.', name='blog_post_retriever', tool_call_id='call_6kbxTU5CDWLmF9mrvR7bWSkI')]}}\n",
"{'agent': {'messages': [AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_QOoWDqK4Bopi8P9HzGmnHAd5', 'function': {'arguments': '{\"query\":\"common ways of task decomposition\"}', 'name': 'blog_post_retriever'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 21, 'prompt_tokens': 787, 'total_tokens': 808, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-096ddff3-9505-4b2f-ae87-c5af6924dd00-0', tool_calls=[{'name': 'blog_post_retriever', 'args': {'query': 'common ways of task decomposition'}, 'id': 'call_QOoWDqK4Bopi8P9HzGmnHAd5', 'type': 'tool_call'}], usage_metadata={'input_tokens': 787, 'output_tokens': 21, 'total_tokens': 808})]}}\n",
"----\n",
"{'agent': {'messages': [AIMessage(content='Common ways of task decomposition include:\\n1. Using LLM with simple prompting like \"Steps for XYZ\" or \"What are the subgoals for achieving XYZ?\"\\n2. Using task-specific instructions, for example, \"Write a story outline\" for writing a novel.\\n3. Involving human inputs in the task decomposition process.', response_metadata={'token_usage': {'completion_tokens': 67, 'prompt_tokens': 1339, 'total_tokens': 1406}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-9ad14cde-ca75-4238-a868-f865e0fc50dd-0')]}}\n",
"{'tools': {'messages': [ToolMessage(content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\\nTask decomposition can be done (1) by LLM with simple prompting like \"Steps for XYZ.\\\\n1.\", \"What are the subgoals for achieving XYZ?\", (2) by using task-specific instructions; e.g. \"Write a story outline.\" for writing a novel, or (3) with human inputs.\\n\\nFig. 1. Overview of a LLM-powered autonomous agent system.\\nComponent One: Planning#\\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\\nTask Decomposition#\\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the models thinking process.\\n\\nResources:\\n1. Internet access for searches and information gathering.\\n2. Long Term memory management.\\n3. GPT-3.5 powered Agents for delegation of simple tasks.\\n4. File output.\\n\\nPerformance Evaluation:\\n1. Continuously review and analyze your actions to ensure you are performing to the best of your abilities.\\n2. Constructively self-criticize your big-picture behavior constantly.\\n3. Reflect on past decisions and strategies to refine your approach.\\n4. Every command has a cost, so be smart and efficient. Aim to complete tasks in the least number of steps.\\n\\n(3) Task execution: Expert models execute on the specific tasks and log results.\\nInstruction:\\n\\nWith the input and the inference results, the AI assistant needs to describe the process and results. The previous stages can be formed as - User Input: {{ User Input }}, Task Planning: {{ Tasks }}, Model Selection: {{ Model Assignment }}, Task Execution: {{ Predictions }}. You must first answer the user\\'s request in a straightforward manner. Then describe the task process and show your analysis and model inference results to the user in the first person. If inference results contain a file path, must tell the user the complete file path.', name='blog_post_retriever', tool_call_id='call_QOoWDqK4Bopi8P9HzGmnHAd5')]}}\n",
"----\n",
"{'agent': {'messages': [AIMessage(content='Common ways of task decomposition include:\\n\\n1. Using Language Models (LLM) with simple prompting: Language models can be prompted with instructions like \"Steps for XYZ\" or \"What are the subgoals for achieving XYZ?\" to break down tasks into smaller steps.\\n\\n2. Task-specific instructions: Providing specific instructions tailored to the task at hand, such as \"Write a story outline\" for writing a novel, can help in decomposing tasks effectively.\\n\\n3. Human inputs: Involving human inputs in the task decomposition process can also be a common approach to breaking down complex tasks into manageable subtasks.\\n\\nThese methods of task decomposition play a crucial role in enabling autonomous agents to effectively plan and execute complex tasks by breaking them down into smaller, more manageable components.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 152, 'prompt_tokens': 1332, 'total_tokens': 1484, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-41868dd4-a1d9-4323-b7b0-ac52c228a2ac-0', usage_metadata={'input_tokens': 1332, 'output_tokens': 152, 'total_tokens': 1484})]}}\n",
"----\n"
]
}
@@ -879,18 +899,27 @@
},
{
"cell_type": "code",
"execution_count": 23,
"execution_count": 1,
"id": "b1d2b4d4-e604-497d-873d-d345b808578e",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"USER_AGENT environment variable not set, consider setting it to identify your requests.\n"
]
}
],
"source": [
"import bs4\n",
"from langchain.tools.retriever import create_retriever_tool\n",
"from langchain_chroma import Chroma\n",
"from langchain_community.document_loaders import WebBaseLoader\n",
"from langchain_core.vectorstores import InMemoryVectorStore\n",
"from langchain_openai import ChatOpenAI, OpenAIEmbeddings\n",
"from langchain_text_splitters import RecursiveCharacterTextSplitter\n",
"from langgraph.checkpoint.memory import MemorySaver\n",
"from langgraph.prebuilt import create_react_agent\n",
"\n",
"memory = MemorySaver()\n",
"llm = ChatOpenAI(model=\"gpt-3.5-turbo\", temperature=0)\n",
@@ -909,7 +938,8 @@
"\n",
"text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)\n",
"splits = text_splitter.split_documents(docs)\n",
"vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())\n",
"vectorstore = InMemoryVectorStore(embedding=OpenAIEmbeddings())\n",
"vectorstore.add_documents(documents=splits)\n",
"retriever = vectorstore.as_retriever()\n",
"\n",
"\n",
@@ -961,7 +991,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.2"
"version": "3.11.4"
}
},
"nbformat": 4,

View File

@@ -2,7 +2,7 @@
"cells": [
{
"cell_type": "markdown",
"id": "b5ee5b75-6876-4d62-9ade-5a7a808ae5a2",
"id": "eaad9a82-0592-4315-9931-0621054bdd0e",
"metadata": {},
"source": [
"# How to trim messages\n",
@@ -22,33 +22,77 @@
"\n",
"All models have finite context windows, meaning there's a limit to how many tokens they can take as input. If you have very long messages or a chain/agent that accumulates a long message is history, you'll need to manage the length of the messages you're passing in to the model.\n",
"\n",
"The `trim_messages` util provides some basic strategies for trimming a list of messages to be of a certain token length.\n",
"[trim_messages](https://python.langchain.com/api_reference/core/messages/langchain_core.messages.utils.trim_messages.html) can be used to reduce the size of a chat history to a specified token count or specified message count.\n",
"\n",
"## Getting the last `max_tokens` tokens\n",
"\n",
"To get the last `max_tokens` in the list of Messages we can set `strategy=\"last\"`. Notice that for our `token_counter` we can pass in a function (more on that below) or a language model (since language models have a message token counting method). It makes sense to pass in a model when you're trimming your messages to fit into the context window of that specific model:"
"If passing the trimmed chat history back into a chat model directly, the trimmed chat history should satisfy the following properties:\n",
"\n",
"1. The resulting chat history should be **valid**. Most chat models expect that chat\n",
" history starts with either (1) a `HumanMessage` or (2) a [SystemMessage](/docs/concepts/#systemmessage) followed\n",
" by a `HumanMessage`. In addition, generally a `ToolMessage` can only appear after an `AIMessage`\n",
" that involved a tool call. This can be achieved by setting `start_on=\"human\"`.\n",
"2. It includes recent messages and drops old messages in the chat history.\n",
" This can be achieved by setting `strategy=\"last\"`.\n",
"4. Usually, the new chat history should include the `SystemMessage` if it\n",
" was present in the original chat history since the `SystemMessage` includes\n",
" special instructions to the chat model. The `SystemMessage` is almost always\n",
" the first message in the history if present. This can be achieved by setting\n",
" `include_system=True`."
]
},
{
"cell_type": "markdown",
"id": "e4bffc37-78c0-46c3-ad0c-b44de0ed3e90",
"metadata": {},
"source": [
"## Trimming based on token count\n",
"\n",
"Here, we'll trim the chat history based on token count. The trimmed chat history will produce a **valid** chat history that includes the `SystemMessage`.\n",
"\n",
"To keep the most recent messages, we set `strategy=\"last\"`. We'll also set `include_system=True` to include the `SystemMessage`, and `start_on=\"human\"` to make sure the resulting chat history is valid. \n",
"\n",
"This is a good default configuration when using `trim_messages` based on token count. Remember to adjust `token_counter` and `max_tokens` for your use case.\n",
"\n",
"Notice that for our `token_counter` we can pass in a function (more on that below) or a language model (since language models have a message token counting method). It makes sense to pass in a model when you're trimming your messages to fit into the context window of that specific model:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "c974633b-3bd0-4844-8a8f-85e3e25f13fe",
"id": "c91edeb2-9978-4665-9fdb-fc96cdb51caa",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"pip install -qU langchain-openai"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "40ea972c-d424-4bc4-9f2e-82f01c3d7598",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[AIMessage(content=\"Hmmm let me think.\\n\\nWhy, he's probably chasing after the last cup of coffee in the office!\"),\n",
" HumanMessage(content='what do you call a speechless parrot')]"
"[SystemMessage(content=\"you're a good assistant, you always respond with a joke.\", additional_kwargs={}, response_metadata={}),\n",
" HumanMessage(content='what do you call a speechless parrot', additional_kwargs={}, response_metadata={})]"
]
},
"execution_count": 1,
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# pip install -U langchain-openai\n",
"from langchain_core.messages import (\n",
" AIMessage,\n",
" HumanMessage,\n",
@@ -70,36 +114,66 @@
" HumanMessage(\"what do you call a speechless parrot\"),\n",
"]\n",
"\n",
"\n",
"trim_messages(\n",
" messages,\n",
" max_tokens=45,\n",
" # Keep the last <= n_count tokens of the messages.\n",
" strategy=\"last\",\n",
" # highlight-start\n",
" # Remember to adjust based on your model\n",
" # or else pass a custom token_encoder\n",
" token_counter=ChatOpenAI(model=\"gpt-4o\"),\n",
" # highlight-end\n",
" # Most chat models expect that chat history starts with either:\n",
" # (1) a HumanMessage or\n",
" # (2) a SystemMessage followed by a HumanMessage\n",
" # highlight-start\n",
" # Remember to adjust based on the desired conversation\n",
" # length\n",
" max_tokens=45,\n",
" # highlight-end\n",
" # Most chat models expect that chat history starts with either:\n",
" # (1) a HumanMessage or\n",
" # (2) a SystemMessage followed by a HumanMessage\n",
" # start_on=\"human\" makes sure we produce a valid chat history\n",
" start_on=\"human\",\n",
" # Usually, we want to keep the SystemMessage\n",
" # if it's present in the original history.\n",
" # The SystemMessage has special instructions for the model.\n",
" include_system=True,\n",
" allow_partial=False,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "d3f46654-c4b2-4136-b995-91c3febe5bf9",
"id": "28fcfc94-0d4a-415c-9506-8ae7634253a2",
"metadata": {},
"source": [
"If we want to always keep the initial system message we can specify `include_system=True`:"
"## Trimming based on message count\n",
"\n",
"Alternatively, we can trim the chat history based on **message count**, by setting `token_counter=len`. In this case, each message will count as a single token, and `max_tokens` will control\n",
"the maximum number of messages.\n",
"\n",
"This is a good default configuration when using `trim_messages` based on message count. Remember to adjust `max_tokens` for your use case."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "589b0223-3a73-44ec-8315-2dba3ee6117d",
"execution_count": 3,
"id": "c8fdedae-0e6b-4901-a222-81fc95e265c2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[SystemMessage(content=\"you're a good assistant, you always respond with a joke.\"),\n",
" HumanMessage(content='what do you call a speechless parrot')]"
"[SystemMessage(content=\"you're a good assistant, you always respond with a joke.\", additional_kwargs={}, response_metadata={}),\n",
" HumanMessage(content='and who is harrison chasing anyways', additional_kwargs={}, response_metadata={}),\n",
" AIMessage(content=\"Hmmm let me think.\\n\\nWhy, he's probably chasing after the last cup of coffee in the office!\", additional_kwargs={}, response_metadata={}),\n",
" HumanMessage(content='what do you call a speechless parrot', additional_kwargs={}, response_metadata={})]"
]
},
"execution_count": 2,
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
@@ -107,36 +181,56 @@
"source": [
"trim_messages(\n",
" messages,\n",
" max_tokens=45,\n",
" # Keep the last <= n_count tokens of the messages.\n",
" strategy=\"last\",\n",
" token_counter=ChatOpenAI(model=\"gpt-4o\"),\n",
" # highlight-next-line\n",
" token_counter=len,\n",
" # When token_counter=len, each message\n",
" # will be counted as a single token.\n",
" # highlight-start\n",
" # Remember to adjust for your use case\n",
" max_tokens=5,\n",
" # highlight-end\n",
" # Most chat models expect that chat history starts with either:\n",
" # (1) a HumanMessage or\n",
" # (2) a SystemMessage followed by a HumanMessage\n",
" # start_on=\"human\" makes sure we produce a valid chat history\n",
" start_on=\"human\",\n",
" # Usually, we want to keep the SystemMessage\n",
" # if it's present in the original history.\n",
" # The SystemMessage has special instructions for the model.\n",
" include_system=True,\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "8a8b542c-04d1-4515-8d82-b999ea4fac4f",
"id": "9367857f-7f9a-4d17-9f9c-6ffc5aae909c",
"metadata": {},
"source": [
"## Advanced Usage\n",
"\n",
"You can use `trim_message` as a building-block to create more complex processing logic.\n",
"\n",
"If we want to allow splitting up the contents of a message we can specify `allow_partial=True`:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "8c46a209-dddd-4d01-81f6-f6ae55d3225c",
"execution_count": 4,
"id": "8bcca1fe-674c-4713-bacc-8e8e6d6f56c3",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[SystemMessage(content=\"you're a good assistant, you always respond with a joke.\"),\n",
" AIMessage(content=\"\\nWhy, he's probably chasing after the last cup of coffee in the office!\"),\n",
" HumanMessage(content='what do you call a speechless parrot')]"
"[SystemMessage(content=\"you're a good assistant, you always respond with a joke.\", additional_kwargs={}, response_metadata={}),\n",
" AIMessage(content=\"\\nWhy, he's probably chasing after the last cup of coffee in the office!\", additional_kwargs={}, response_metadata={}),\n",
" HumanMessage(content='what do you call a speechless parrot', additional_kwargs={}, response_metadata={})]"
]
},
"execution_count": 3,
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
@@ -154,26 +248,26 @@
},
{
"cell_type": "markdown",
"id": "306adf9c-41cd-495c-b4dc-e4f43dd7f8f8",
"id": "245bee9b-e515-4e89-8f2a-84bda9a25de8",
"metadata": {},
"source": [
"If we need to make sure that our first message (excluding the system message) is always of a specific type, we can specify `start_on`:"
"By default, the `SystemMessage` will not be included, so you can drop it by either setting `include_system=False` or by dropping the `include_system` argument."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "878a730b-fe44-4e9d-ab65-7b8f7b069de8",
"execution_count": 5,
"id": "94351736-28a1-44a3-aac7-82356c81d171",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[SystemMessage(content=\"you're a good assistant, you always respond with a joke.\"),\n",
" HumanMessage(content='what do you call a speechless parrot')]"
"[AIMessage(content=\"Hmmm let me think.\\n\\nWhy, he's probably chasing after the last cup of coffee in the office!\", additional_kwargs={}, response_metadata={}),\n",
" HumanMessage(content='what do you call a speechless parrot', additional_kwargs={}, response_metadata={})]"
]
},
"execution_count": 4,
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
@@ -181,11 +275,9 @@
"source": [
"trim_messages(\n",
" messages,\n",
" max_tokens=60,\n",
" max_tokens=45,\n",
" strategy=\"last\",\n",
" token_counter=ChatOpenAI(model=\"gpt-4o\"),\n",
" include_system=True,\n",
" start_on=\"human\",\n",
")"
]
},
@@ -194,25 +286,23 @@
"id": "7f5d391d-235b-4091-b2de-c22866b478f3",
"metadata": {},
"source": [
"## Getting the first `max_tokens` tokens\n",
"\n",
"We can perform the flipped operation of getting the *first* `max_tokens` by specifying `strategy=\"first\"`:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 6,
"id": "5f56ae54-1a39-4019-9351-3b494c003d5b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[SystemMessage(content=\"you're a good assistant, you always respond with a joke.\"),\n",
" HumanMessage(content=\"i wonder why it's called langchain\")]"
"[SystemMessage(content=\"you're a good assistant, you always respond with a joke.\", additional_kwargs={}, response_metadata={}),\n",
" HumanMessage(content=\"i wonder why it's called langchain\", additional_kwargs={}, response_metadata={})]"
]
},
"execution_count": 5,
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
@@ -238,18 +328,36 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 60,
"id": "d930c089-e8e6-4980-9d39-11d41e794772",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"pip install -qU tiktoken"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "1c1c3b1e-2ece-49e7-a3b6-e69877c1633b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[AIMessage(content=\"Hmmm let me think.\\n\\nWhy, he's probably chasing after the last cup of coffee in the office!\"),\n",
" HumanMessage(content='what do you call a speechless parrot')]"
"[SystemMessage(content=\"you're a good assistant, you always respond with a joke.\", additional_kwargs={}, response_metadata={}),\n",
" HumanMessage(content='what do you call a speechless parrot', additional_kwargs={}, response_metadata={})]"
]
},
"execution_count": 6,
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
@@ -257,7 +365,6 @@
"source": [
"from typing import List\n",
"\n",
"# pip install tiktoken\n",
"import tiktoken\n",
"from langchain_core.messages import BaseMessage, ToolMessage\n",
"\n",
@@ -298,9 +405,25 @@
"\n",
"trim_messages(\n",
" messages,\n",
" max_tokens=45,\n",
" strategy=\"last\",\n",
" # highlight-next-line\n",
" token_counter=tiktoken_counter,\n",
" # Keep the last <= n_count tokens of the messages.\n",
" strategy=\"last\",\n",
" # When token_counter=len, each message\n",
" # will be counted as a single token.\n",
" # highlight-start\n",
" # Remember to adjust for your use case\n",
" max_tokens=45,\n",
" # highlight-end\n",
" # Most chat models expect that chat history starts with either:\n",
" # (1) a HumanMessage or\n",
" # (2) a SystemMessage followed by a HumanMessage\n",
" # start_on=\"human\" makes sure we produce a valid chat history\n",
" start_on=\"human\",\n",
" # Usually, we want to keep the SystemMessage\n",
" # if it's present in the original history.\n",
" # The SystemMessage has special instructions for the model.\n",
" include_system=True,\n",
")"
]
},
@@ -311,22 +434,22 @@
"source": [
"## Chaining\n",
"\n",
"`trim_messages` can be used in an imperatively (like above) or declaratively, making it easy to compose with other components in a chain"
"`trim_messages` can be used imperatively (like above) or declaratively, making it easy to compose with other components in a chain"
]
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 62,
"id": "96aa29b2-01e0-437c-a1ab-02fb0141cb57",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='A: A \"Polly-gone\"!', response_metadata={'token_usage': {'completion_tokens': 9, 'prompt_tokens': 32, 'total_tokens': 41}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_66b29dffce', 'finish_reason': 'stop', 'logprobs': None}, id='run-83e96ddf-bcaa-4f63-824c-98b0f8a0d474-0', usage_metadata={'input_tokens': 32, 'output_tokens': 9, 'total_tokens': 41})"
"AIMessage(content='A \"polygon!\"', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 4, 'prompt_tokens': 32, 'total_tokens': 36, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_3537616b13', 'finish_reason': 'stop', 'logprobs': None}, id='run-995342be-0443-4e33-9b54-153f5c8771d3-0', usage_metadata={'input_tokens': 32, 'output_tokens': 4, 'total_tokens': 36})"
]
},
"execution_count": 7,
"execution_count": 62,
"metadata": {},
"output_type": "execute_result"
}
@@ -340,7 +463,15 @@
" max_tokens=45,\n",
" strategy=\"last\",\n",
" token_counter=llm,\n",
" # Usually, we want to keep the SystemMessage\n",
" # if it's present in the original history.\n",
" # The SystemMessage has special instructions for the model.\n",
" include_system=True,\n",
" # Most chat models expect that chat history starts with either:\n",
" # (1) a HumanMessage or\n",
" # (2) a SystemMessage followed by a HumanMessage\n",
" # start_on=\"human\" makes sure we produce a valid chat history\n",
" start_on=\"human\",\n",
")\n",
"\n",
"chain = trimmer | llm\n",
@@ -359,18 +490,18 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 63,
"id": "1ff02d0a-353d-4fac-a77c-7c2c5262abd9",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[SystemMessage(content=\"you're a good assistant, you always respond with a joke.\"),\n",
" HumanMessage(content='what do you call a speechless parrot')]"
"[SystemMessage(content=\"you're a good assistant, you always respond with a joke.\", additional_kwargs={}, response_metadata={}),\n",
" HumanMessage(content='what do you call a speechless parrot', additional_kwargs={}, response_metadata={})]"
]
},
"execution_count": 8,
"execution_count": 63,
"metadata": {},
"output_type": "execute_result"
}
@@ -391,17 +522,17 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 6,
"id": "a9517858-fc2f-4dc3-898d-bf98a0e905a0",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='A \"polly-no-wanna-cracker\"!', response_metadata={'token_usage': {'completion_tokens': 10, 'prompt_tokens': 32, 'total_tokens': 42}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_5bf7397cd3', 'finish_reason': 'stop', 'logprobs': None}, id='run-054dd309-3497-4e7b-b22a-c1859f11d32e-0', usage_metadata={'input_tokens': 32, 'output_tokens': 10, 'total_tokens': 42})"
"AIMessage(content='A polygon! (Because it\\'s a \"poly-gone\" quiet!)', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 14, 'prompt_tokens': 32, 'total_tokens': 46, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_e375328146', 'finish_reason': 'stop', 'logprobs': None}, id='run-8569a119-ca02-4232-bee1-20caea61cd6d-0', usage_metadata={'input_tokens': 32, 'output_tokens': 14, 'total_tokens': 46})"
]
},
"execution_count": 9,
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
@@ -425,7 +556,15 @@
" max_tokens=45,\n",
" strategy=\"last\",\n",
" token_counter=llm,\n",
" # Usually, we want to keep the SystemMessage\n",
" # if it's present in the original history.\n",
" # The SystemMessage has special instructions for the model.\n",
" include_system=True,\n",
" # Most chat models expect that chat history starts with either:\n",
" # (1) a HumanMessage or\n",
" # (2) a SystemMessage followed by a HumanMessage\n",
" # start_on=\"human\" makes sure we produce a valid chat history\n",
" start_on=\"human\",\n",
")\n",
"\n",
"chain = trimmer | llm\n",
@@ -471,7 +610,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.4"
"version": "3.11.4"
}
},
"nbformat": 4,

File diff suppressed because it is too large

View File

@@ -9,13 +9,13 @@
"\n",
"[`ConversationChain`](https://python.langchain.com/api_reference/langchain/chains/langchain.chains.conversation.base.ConversationChain.html) incorporated a memory of previous messages to sustain a stateful conversation.\n",
"\n",
"Some advantages of switching to the LCEL implementation are:\n",
"Some advantages of switching to the Langgraph implementation are:\n",
"\n",
"- Innate support for threads/separate sessions. To make this work with `ConversationChain`, you'd need to instantiate a separate memory class outside the chain.\n",
"- More explicit parameters. `ConversationChain` contains a hidden default prompt, which can cause confusion.\n",
"- Streaming support. `ConversationChain` only supports streaming via callbacks.\n",
"\n",
"`RunnableWithMessageHistory` implements sessions via configuration parameters. It should be instantiated with a callable that returns a [chat message history](https://python.langchain.com/api_reference/core/chat_history/langchain_core.chat_history.BaseChatMessageHistory.html). By default, it expects this function to take a single argument `session_id`."
"Langgraph's [checkpointing](https://langchain-ai.github.io/langgraph/how-tos/persistence/) system supports multiple threads or sessions, which can be specified via the `\"thread_id\"` key in its configuration parameters."
]
},
{
@@ -61,9 +61,9 @@
{
"data": {
"text/plain": [
"{'input': 'how are you?',\n",
"{'input': \"I'm Bob, how are you?\",\n",
" 'history': '',\n",
" 'response': \"Arr matey, I be doin' well on the high seas, plunderin' and pillagin' as usual. How be ye?\"}"
" 'response': \"Arrr matey, I be a pirate sailin' the high seas. What be yer business with me?\"}"
]
},
"execution_count": 2,
@@ -93,7 +93,30 @@
" prompt=prompt,\n",
")\n",
"\n",
"chain({\"input\": \"how are you?\"})"
"chain({\"input\": \"I'm Bob, how are you?\"})"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "53f2c723-178f-470a-8147-54e7cb982211",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'input': 'What is my name?',\n",
" 'history': \"Human: I'm Bob, how are you?\\nAI: Arrr matey, I be a pirate sailin' the high seas. What be yer business with me?\",\n",
" 'response': 'Your name be Bob, matey.'}"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain({\"input\": \"What is my name?\"})"
]
},
{
@@ -103,111 +126,110 @@
"source": [
"</details>\n",
"\n",
"## LCEL\n",
"## Langgraph\n",
"\n",
"<details open>"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "666c92a0-b555-4418-a465-6490c1b92570",
"execution_count": 4,
"id": "a59b910c-0d02-41aa-bc99-441f11989cf8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"Arr, me matey! I be doin' well, sailin' the high seas and searchin' for treasure. How be ye?\""
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"from langchain_core.chat_history import InMemoryChatMessageHistory\n",
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_core.runnables.history import RunnableWithMessageHistory\n",
"import uuid\n",
"\n",
"from langchain_openai import ChatOpenAI\n",
"from langgraph.checkpoint.memory import MemorySaver\n",
"from langgraph.graph import START, MessagesState, StateGraph\n",
"\n",
"prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\"system\", \"You are a pirate. Answer the following questions as best you can.\"),\n",
" (\"placeholder\", \"{chat_history}\"),\n",
" (\"human\", \"{input}\"),\n",
" ]\n",
")\n",
"model = ChatOpenAI(model=\"gpt-4o-mini\")\n",
"\n",
"history = InMemoryChatMessageHistory()\n",
"# Define a new graph\n",
"workflow = StateGraph(state_schema=MessagesState)\n",
"\n",
"\n",
"def get_history():\n",
" return history\n",
"# Define the function that calls the model\n",
"def call_model(state: MessagesState):\n",
" response = model.invoke(state[\"messages\"])\n",
" return {\"messages\": response}\n",
"\n",
"\n",
"chain = prompt | ChatOpenAI() | StrOutputParser()\n",
"# Define the two nodes we will cycle between\n",
"workflow.add_edge(START, \"model\")\n",
"workflow.add_node(\"model\", call_model)\n",
"\n",
"wrapped_chain = RunnableWithMessageHistory(\n",
" chain,\n",
" get_history,\n",
" history_messages_key=\"chat_history\",\n",
")\n",
"# Add memory\n",
"memory = MemorySaver()\n",
"app = workflow.compile(checkpointer=memory)\n",
"\n",
"wrapped_chain.invoke({\"input\": \"how are you?\"})"
]
},
{
"cell_type": "markdown",
"id": "6b386ce6-895e-442c-88f3-7bec0ab9f401",
"metadata": {},
"source": [
"The above example uses the same `history` for all sessions. The example below shows how to use a different chat history for each session."
"\n",
"# The thread id is a unique key that identifies\n",
"# this particular conversation.\n",
"# We'll just generate a random uuid here.\n",
"thread_id = uuid.uuid4()\n",
"config = {\"configurable\": {\"thread_id\": thread_id}}"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "96152263-98d7-4e06-8c73-d0c0abf3e8e9",
"execution_count": 5,
"id": "3a9df4bb-e804-4373-9a15-a29dc0371595",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Ahoy there, me hearty! What can this old pirate do for ye today?'"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
"name": "stdout",
"output_type": "stream",
"text": [
"================================\u001b[1m Human Message \u001b[0m=================================\n",
"\n",
"I'm Bob, how are you?\n",
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
"\n",
"Ahoy, Bob! I be feelin' as lively as a ship in full sail! How be ye on this fine day?\n"
]
}
],
"source": [
"from langchain_core.chat_history import BaseChatMessageHistory\n",
"from langchain_core.runnables.history import RunnableWithMessageHistory\n",
"query = \"I'm Bob, how are you?\"\n",
"\n",
"store = {}\n",
"input_messages = [\n",
" {\n",
" \"role\": \"system\",\n",
" \"content\": \"You are a pirate. Answer the following questions as best you can.\",\n",
" },\n",
" {\"role\": \"user\", \"content\": query},\n",
"]\n",
"for event in app.stream({\"messages\": input_messages}, config, stream_mode=\"values\"):\n",
" event[\"messages\"][-1].pretty_print()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "d3f77e69-fa3d-496c-968c-86371e1e8cf1",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"================================\u001b[1m Human Message \u001b[0m=================================\n",
"\n",
"What is my name?\n",
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
"\n",
"Ye be callin' yerself Bob, I reckon! A fine name for a swashbuckler like yerself!\n"
]
}
],
"source": [
"query = \"What is my name?\"\n",
"\n",
"\n",
"def get_session_history(session_id: str) -> BaseChatMessageHistory:\n",
" if session_id not in store:\n",
" store[session_id] = InMemoryChatMessageHistory()\n",
" return store[session_id]\n",
"\n",
"\n",
"chain = prompt | ChatOpenAI() | StrOutputParser()\n",
"\n",
"wrapped_chain = RunnableWithMessageHistory(\n",
" chain,\n",
" get_session_history,\n",
" history_messages_key=\"chat_history\",\n",
")\n",
"\n",
"wrapped_chain.invoke(\n",
" {\"input\": \"Hello!\"},\n",
" config={\"configurable\": {\"session_id\": \"abc123\"}},\n",
")"
"input_messages = [{\"role\": \"user\", \"content\": query}]\n",
"for event in app.stream({\"messages\": input_messages}, config, stream_mode=\"values\"):\n",
" event[\"messages\"][-1].pretty_print()"
]
},
{