docs: Upgrade examples with RunnableWithMessageHistory to langgraph memory (#26855)

This PR updates the documentation examples that previously used
RunnableWithMessageHistory to show how to achieve the same
functionality with langgraph memory.
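
For reference, the langgraph memory pattern these guides converge on looks
roughly like the following (a minimal sketch, assuming langgraph's in-memory
checkpointer and an OpenAI chat model; the model name and thread id here are
illustrative, not taken from this PR):

.. code-block:: python

    from langchain_core.messages import HumanMessage
    from langchain_openai import ChatOpenAI
    from langgraph.checkpoint.memory import MemorySaver
    from langgraph.graph import START, MessagesState, StateGraph

    model = ChatOpenAI(model="gpt-4o-mini")

    def call_model(state: MessagesState):
        # The checkpointer persists state["messages"] between invocations,
        # replacing the external chat-history store that
        # RunnableWithMessageHistory required.
        return {"messages": [model.invoke(state["messages"])]}

    workflow = StateGraph(state_schema=MessagesState)
    workflow.add_node("model", call_model)
    workflow.add_edge(START, "model")

    # In-memory checkpointer; each thread_id keys a separate chat history.
    app = workflow.compile(checkpointer=MemorySaver())

    config = {"configurable": {"thread_id": "abc123"}}
    app.invoke({"messages": [HumanMessage("hi, I'm Bob")]}, config)
    app.invoke({"messages": [HumanMessage("what's my name?")]}, config)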

Some of the underlying PRs (not all of them):

- docs[patch]: update chatbot tutorial and migration guide (#26780)
- docs[patch]: update chatbot memory how-to (#26790)
- docs[patch]: update chatbot tools how-to (#26816)
- docs: update chat history in rag how-to (#26821)
- docs: update trim messages notebook (#26793)
- docs: clean up imports in how to guide for rag qa with chat history
(#26825)
- docs[patch]: update conversational rag tutorial (#26814)

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
Co-authored-by: Vadym Barda <vadym@langchain.dev>
Co-authored-by: mercyspirit <ziying.qiu@gmail.com>
Co-authored-by: aqiu7 <aqiu7@gatech.edu>
Co-authored-by: John <43506685+Coniferish@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: William FH <13333726+hinthornw@users.noreply.github.com>
Co-authored-by: Subhrajyoty Roy <subhrajyotyroy@gmail.com>
Co-authored-by: Rajendra Kadam <raj.725@outlook.com>
Co-authored-by: Christophe Bornet <cbornet@hotmail.com>
Co-authored-by: Devin Gaffney <itsme@devingaffney.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Author: Eugene Yurtsev
Date: 2024-09-27 16:04:30 -04:00
Committed by: GitHub
Parent: 44eddd39d6
Commit: de0b48c41a
42 changed files with 2326 additions and 2418 deletions


@@ -581,12 +581,38 @@ def trim_messages(
) -> list[BaseMessage]:
"""Trim messages to be below a token count.
trim_messages can be used to reduce the size of a chat history to a specified token
count or a specified message count.

In either case, if the trimmed chat history is passed directly back into a chat
model, the resulting chat history should usually satisfy the following properties:

1. The resulting chat history should be valid. Most chat models expect that chat
history starts with either (1) a `HumanMessage` or (2) a `SystemMessage` followed
by a `HumanMessage`. To achieve this, set `start_on="human"`.
In addition, a `ToolMessage` can generally only appear after an `AIMessage`
that involved a tool call.
Please see the following link for more information about messages:
https://python.langchain.com/docs/concepts/#messages
2. It includes recent messages and drops old messages in the chat history.
To achieve this, set `strategy="last"`.
3. Usually, the new chat history should include the `SystemMessage` if it
was present in the original chat history, since the `SystemMessage` contains
special instructions for the chat model. If present, the `SystemMessage` is
almost always the first message in the history. To achieve this, set
`include_system=True`.

**Note** The examples below show how to configure `trim_messages` to achieve
behavior consistent with the above properties.

Args:
messages: Sequence of Message-like objects to trim.
max_tokens: Max token count of trimmed messages.
token_counter: Function or LLM for counting tokens in a BaseMessage or a list of
BaseMessage. If a BaseLanguageModel is passed in then
BaseLanguageModel.get_num_tokens_from_messages() will be used.
Set to `len` to count the number of **messages** in the chat history.
strategy: Strategy for trimming.
- "first": Keep the first <= max_tokens tokens of the messages.
- "last": Keep the last <= max_tokens tokens of the messages.
@@ -633,11 +659,97 @@ def trim_messages(
``strategy`` is specified.
Example:
Trim chat history based on token count, keeping the SystemMessage if
present, and ensuring that the chat history starts with a HumanMessage
(or a SystemMessage followed by a HumanMessage).
.. code-block:: python
from typing import List
from langchain_core.messages import (
AIMessage,
BaseMessage,
HumanMessage,
SystemMessage,
trim_messages,
)
from langchain_openai import ChatOpenAI
messages = [
SystemMessage("you're a good assistant, you always respond with a joke."),
HumanMessage("i wonder why it's called langchain"),
AIMessage(
'Well, I guess they thought "WordRope" and "SentenceString" just didn\'t have the same ring to it!'
),
HumanMessage("and who is harrison chasing anyways"),
AIMessage(
"Hmmm let me think.\n\nWhy, he's probably chasing after the last cup of coffee in the office!"
),
HumanMessage("what do you call a speechless parrot"),
]
trim_messages(
messages,
max_tokens=45,
strategy="last",
token_counter=ChatOpenAI(model="gpt-4o"),
# Most chat models expect that chat history starts with either:
# (1) a HumanMessage or
# (2) a SystemMessage followed by a HumanMessage
start_on="human",
# Usually, we want to keep the SystemMessage
# if it's present in the original history.
# The SystemMessage has special instructions for the model.
include_system=True,
allow_partial=False,
)
.. code-block:: python
[
SystemMessage(content="you're a good assistant, you always respond with a joke."),
HumanMessage(content='what do you call a speechless parrot'),
]
Trim chat history based on message count, keeping the SystemMessage if
present, and ensuring that the chat history starts with a HumanMessage
(or a SystemMessage followed by a HumanMessage).
.. code-block:: python
trim_messages(
messages,
# When `len` is passed in as the token counter function,
# max_tokens will count the number of messages in the chat history.
max_tokens=4,
strategy="last",
token_counter=len,
# Most chat models expect that chat history starts with either:
# (1) a HumanMessage or
# (2) a SystemMessage followed by a HumanMessage
start_on="human",
# Usually, we want to keep the SystemMessage
# if it's present in the original history.
# The SystemMessage has special instructions for the model.
include_system=True,
allow_partial=False,
)
.. code-block:: python
[
SystemMessage(content="you're a good assistant, you always respond with a joke."),
HumanMessage(content='and who is harrison chasing anyways'),
AIMessage(content="Hmmm let me think.\n\nWhy, he's probably chasing after the last cup of coffee in the office!"),
HumanMessage(content='what do you call a speechless parrot'),
]
Trim chat history using a custom token counter function that counts the
number of tokens in each message.
.. code-block:: python
messages = [
SystemMessage("This is a 4 token text. The full message is 10 tokens."),
@@ -670,18 +782,6 @@ def trim_messages(
count += default_msg_prefix_len + len(msg.content) * default_content_len + default_msg_suffix_len
return count
First 30 tokens, not allowing partial messages:
.. code-block:: python
trim_messages(messages, max_tokens=30, token_counter=dummy_token_counter, strategy="first")
.. code-block:: python
[
SystemMessage("This is a 4 token text. The full message is 10 tokens."),
HumanMessage("This is a 4 token text. The full message is 10 tokens.", id="first"),
]
First 30 tokens, allowing partial messages:
.. code-block:: python
@@ -700,108 +800,6 @@ def trim_messages(
HumanMessage("This is a 4 token text. The full message is 10 tokens.", id="first"),
AIMessage([{"type": "text", "text": "This is the FIRST 4 token block."}], id="second"),
]
First 30 tokens, allowing partial messages, ending on a HumanMessage:
.. code-block:: python
trim_messages(
messages,
max_tokens=30,
token_counter=dummy_token_counter,
strategy="first"
allow_partial=True,
end_on="human",
)
.. code-block:: python
[
SystemMessage("This is a 4 token text. The full message is 10 tokens."),
HumanMessage("This is a 4 token text. The full message is 10 tokens.", id="first"),
]
Last 30 tokens, including system message, not allowing partial messages:
.. code-block:: python
trim_messages(messages, max_tokens=30, include_system=True, token_counter=dummy_token_counter, strategy="last")
.. code-block:: python
[
SystemMessage("This is a 4 token text. The full message is 10 tokens."),
HumanMessage("This is a 4 token text. The full message is 10 tokens.", id="third"),
AIMessage("This is a 4 token text. The full message is 10 tokens.", id="fourth"),
]
Last 40 tokens, including system message, allowing partial messages:
.. code-block:: python
trim_messages(
messages,
max_tokens=40,
token_counter=dummy_token_counter,
strategy="last",
allow_partial=True,
include_system=True
)
.. code-block:: python
[
SystemMessage("This is a 4 token text. The full message is 10 tokens."),
AIMessage(
[{"type": "text", "text": "This is the FIRST 4 token block."},],
id="second",
),
HumanMessage("This is a 4 token text. The full message is 10 tokens.", id="third"),
AIMessage("This is a 4 token text. The full message is 10 tokens.", id="fourth"),
]
Last 30 tokens, including system message, allowing partial messages, ending on a HumanMessage:
.. code-block:: python
trim_messages(
messages,
max_tokens=30,
token_counter=dummy_token_counter,
strategy="last",
end_on="human",
include_system=True,
allow_partial=True,
)
.. code-block:: python
[
SystemMessage("This is a 4 token text. The full message is 10 tokens."),
AIMessage(
[{"type": "text", "text": "This is the FIRST 4 token block."},],
id="second",
),
HumanMessage("This is a 4 token text. The full message is 10 tokens.", id="third"),
]
Last 40 tokens, including system message, allowing partial messages, starting on a HumanMessage:
.. code-block:: python
trim_messages(
messages,
max_tokens=40,
token_counter=dummy_token_counter,
strategy="last",
include_system=True,
allow_partial=True,
start_on="human"
)
.. code-block:: python
[
SystemMessage("This is a 4 token text. The full message is 10 tokens."),
HumanMessage("This is a 4 token text. The full message is 10 tokens.", id="third"),
AIMessage("This is a 4 token text. The full message is 10 tokens.", id="fourth"),
]
""" # noqa: E501
if start_on and strategy == "first":