docs: update trim messages notebook (#26793)

Update trim messages notebook to include common use cases and explain
what the desired behavior is
Eugene Yurtsev 2024-09-24 14:09:56 -04:00 committed by GitHub
parent 15d49d3df2
commit 54d5b74b00

@ -2,7 +2,7 @@
"cells": [
{
"cell_type": "markdown",
"id": "b5ee5b75-6876-4d62-9ade-5a7a808ae5a2",
"id": "eaad9a82-0592-4315-9931-0621054bdd0e",
"metadata": {},
"source": [
"# How to trim messages\n",
@ -22,33 +22,77 @@
"\n",
"All models have finite context windows, meaning there's a limit to how many tokens they can take as input. If you have very long messages or a chain/agent that accumulates a long message is history, you'll need to manage the length of the messages you're passing in to the model.\n",
"\n",
"The `trim_messages` util provides some basic strategies for trimming a list of messages to be of a certain token length.\n",
"[trim_messages](https://python.langchain.com/api_reference/core/messages/langchain_core.messages.utils.trim_messages.html) can be used to reduce the size of a chat history to a specified token count or specified message count.\n",
"\n",
"## Getting the last `max_tokens` tokens\n",
"\n",
"To get the last `max_tokens` in the list of Messages we can set `strategy=\"last\"`. Notice that for our `token_counter` we can pass in a function (more on that below) or a language model (since language models have a message token counting method). It makes sense to pass in a model when you're trimming your messages to fit into the context window of that specific model:"
"If passing the trimmed chat history back into a chat model directly, the trimmed chat history should satisfy the following properties:\n",
"\n",
"1. The resulting chat history should be **valid**. Most chat models expect that chat\n",
" history starts with either (1) a `HumanMessage` or (2) a [SystemMessage](/docs/concepts/#systemmessage) followed\n",
" by a `HumanMessage`. In addition, generally a `ToolMessage` can only appear after an `AIMessage`\n",
" that involved a tool call. This can be achieved by setting `start_on=\"human\"`.\n",
"2. It includes recent messages and drops old messages in the chat history.\n",
" This can be achieved by setting `strategy=\"last\"`.\n",
"4. Usually, the new chat history should include the `SystemMessage` if it\n",
" was present in the original chat history since the `SystemMessage` includes\n",
" special instructions to the chat model. The `SystemMessage` is almost always\n",
" the first message in the history if present. This can be achieved by setting\n",
" `include_system=True`."
]
},
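{
"cell_type": "markdown",
"id": "3bb7b8e9-3b1a-4db0-8f27-6a552cfd5e3b",
"metadata": {},
"source": [
"For illustration, here's a minimal sketch of what these properties mean in practice (the message contents are hypothetical; only the *shape* of the history matters):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a4c5d9e1-7f2b-4c63-9d1e-8b0a3f6c2d47",
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.messages import AIMessage, HumanMessage, SystemMessage\n",
"\n",
"# Invalid: starts with an AIMessage, which most chat models do not accept.\n",
"invalid_history = [\n",
"    AIMessage(\"Polly wants a cracker!\"),\n",
"    HumanMessage(\"what do you call a speechless parrot\"),\n",
"]\n",
"\n",
"# Valid: an optional SystemMessage followed by a HumanMessage.\n",
"valid_history = [\n",
"    SystemMessage(\"you're a good assistant, you always respond with a joke.\"),\n",
"    HumanMessage(\"what do you call a speechless parrot\"),\n",
"]"
]
},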
{
"cell_type": "markdown",
"id": "e4bffc37-78c0-46c3-ad0c-b44de0ed3e90",
"metadata": {},
"source": [
"## Trimming based on token count\n",
"\n",
"Here, we'll trim the chat history based on token count. The trimmed chat history will produce a **valid** chat history that includes the `SystemMessage`.\n",
"\n",
"To keep the most recent messages, we set `strategy=\"last\"`. We'll also set `include_system=True` to include the `SystemMessage`, and `start_on=\"human\"` to make sure the resulting chat history is valid. \n",
"\n",
"This is a good default configuration when using `trim_messages` based on token count. Remember to adjust `token_counter` and `max_tokens` for your use case.\n",
"\n",
"Notice that for our `token_counter` we can pass in a function (more on that below) or a language model (since language models have a message token counting method). It makes sense to pass in a model when you're trimming your messages to fit into the context window of that specific model:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "c974633b-3bd0-4844-8a8f-85e3e25f13fe",
"id": "c91edeb2-9978-4665-9fdb-fc96cdb51caa",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"pip install -qU langchain-openai"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "40ea972c-d424-4bc4-9f2e-82f01c3d7598",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[AIMessage(content=\"Hmmm let me think.\\n\\nWhy, he's probably chasing after the last cup of coffee in the office!\"),\n",
" HumanMessage(content='what do you call a speechless parrot')]"
"[SystemMessage(content=\"you're a good assistant, you always respond with a joke.\", additional_kwargs={}, response_metadata={}),\n",
" HumanMessage(content='what do you call a speechless parrot', additional_kwargs={}, response_metadata={})]"
]
},
"execution_count": 1,
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# pip install -U langchain-openai\n",
"from langchain_core.messages import (\n",
" AIMessage,\n",
" HumanMessage,\n",
@ -70,36 +114,66 @@
" HumanMessage(\"what do you call a speechless parrot\"),\n",
"]\n",
"\n",
"\n",
"trim_messages(\n",
" messages,\n",
" max_tokens=45,\n",
" # Keep the last <= n_count tokens of the messages.\n",
" strategy=\"last\",\n",
" # highlight-start\n",
" # Remember to adjust based on your model\n",
" # or else pass a custom token_encoder\n",
" token_counter=ChatOpenAI(model=\"gpt-4o\"),\n",
" # highlight-end\n",
" # Most chat models expect that chat history starts with either:\n",
" # (1) a HumanMessage or\n",
" # (2) a SystemMessage followed by a HumanMessage\n",
" # highlight-start\n",
" # Remember to adjust based on the desired conversation\n",
" # length\n",
" max_tokens=45,\n",
" # highlight-end\n",
" # Most chat models expect that chat history starts with either:\n",
" # (1) a HumanMessage or\n",
" # (2) a SystemMessage followed by a HumanMessage\n",
" # start_on=\"human\" makes sure we produce a valid chat history\n",
" start_on=\"human\",\n",
" # Usually, we want to keep the SystemMessage\n",
" # if it's present in the original history.\n",
" # The SystemMessage has special instructions for the model.\n",
" include_system=True,\n",
" allow_partial=False,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "d3f46654-c4b2-4136-b995-91c3febe5bf9",
"id": "28fcfc94-0d4a-415c-9506-8ae7634253a2",
"metadata": {},
"source": [
"If we want to always keep the initial system message we can specify `include_system=True`:"
"## Trimming based on message count\n",
"\n",
"Alternatively, we can trim the chat history based on **message count**, by setting `token_counter=len`. In this case, each message will count as a single token, and `max_tokens` will control\n",
"the maximum number of messages.\n",
"\n",
"This is a good default configuration when using `trim_messages` based on message count. Remember to adjust `max_tokens` for your use case."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "589b0223-3a73-44ec-8315-2dba3ee6117d",
"execution_count": 3,
"id": "c8fdedae-0e6b-4901-a222-81fc95e265c2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[SystemMessage(content=\"you're a good assistant, you always respond with a joke.\"),\n",
" HumanMessage(content='what do you call a speechless parrot')]"
"[SystemMessage(content=\"you're a good assistant, you always respond with a joke.\", additional_kwargs={}, response_metadata={}),\n",
" HumanMessage(content='and who is harrison chasing anyways', additional_kwargs={}, response_metadata={}),\n",
" AIMessage(content=\"Hmmm let me think.\\n\\nWhy, he's probably chasing after the last cup of coffee in the office!\", additional_kwargs={}, response_metadata={}),\n",
" HumanMessage(content='what do you call a speechless parrot', additional_kwargs={}, response_metadata={})]"
]
},
"execution_count": 2,
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
@ -107,36 +181,56 @@
"source": [
"trim_messages(\n",
" messages,\n",
" max_tokens=45,\n",
" # Keep the last <= n_count tokens of the messages.\n",
" strategy=\"last\",\n",
" token_counter=ChatOpenAI(model=\"gpt-4o\"),\n",
" # highlight-next-line\n",
" token_counter=len,\n",
" # When token_counter=len, each message\n",
" # will be counted as a single token.\n",
" # highlight-start\n",
" # Remember to adjust for your use case\n",
" max_tokens=5,\n",
" # highlight-end\n",
" # Most chat models expect that chat history starts with either:\n",
" # (1) a HumanMessage or\n",
" # (2) a SystemMessage followed by a HumanMessage\n",
" # start_on=\"human\" makes sure we produce a valid chat history\n",
" start_on=\"human\",\n",
" # Usually, we want to keep the SystemMessage\n",
" # if it's present in the original history.\n",
" # The SystemMessage has special instructions for the model.\n",
" include_system=True,\n",
")"
]
},
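{
"cell_type": "markdown",
"id": "5e2d8c41-91f3-4f07-b3a8-6c0d2e9f7a15",
"metadata": {},
"source": [
"As a quick sanity check (a sketch, reusing the `messages` list from above): with `token_counter=len`, the trimmed history can never contain more than `max_tokens` messages:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9f6b3a27-4d8e-4b51-a0c2-1e7d5f8c3b92",
"metadata": {},
"outputs": [],
"source": [
"selected = trim_messages(\n",
"    messages,\n",
"    strategy=\"last\",\n",
"    # Each message counts as one \"token\".\n",
"    token_counter=len,\n",
"    max_tokens=5,\n",
"    start_on=\"human\",\n",
"    include_system=True,\n",
")\n",
"\n",
"# At most max_tokens messages survive the trim.\n",
"assert len(selected) <= 5"
]
},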
{
"attachments": {},
"cell_type": "markdown",
"id": "8a8b542c-04d1-4515-8d82-b999ea4fac4f",
"id": "9367857f-7f9a-4d17-9f9c-6ffc5aae909c",
"metadata": {},
"source": [
"## Advanced Usage\n",
"\n",
"You can use `trim_message` as a building-block to create more complex processing logic.\n",
"\n",
"If we want to allow splitting up the contents of a message we can specify `allow_partial=True`:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "8c46a209-dddd-4d01-81f6-f6ae55d3225c",
"execution_count": 4,
"id": "8bcca1fe-674c-4713-bacc-8e8e6d6f56c3",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[SystemMessage(content=\"you're a good assistant, you always respond with a joke.\"),\n",
" AIMessage(content=\"\\nWhy, he's probably chasing after the last cup of coffee in the office!\"),\n",
" HumanMessage(content='what do you call a speechless parrot')]"
"[SystemMessage(content=\"you're a good assistant, you always respond with a joke.\", additional_kwargs={}, response_metadata={}),\n",
" AIMessage(content=\"\\nWhy, he's probably chasing after the last cup of coffee in the office!\", additional_kwargs={}, response_metadata={}),\n",
" HumanMessage(content='what do you call a speechless parrot', additional_kwargs={}, response_metadata={})]"
]
},
"execution_count": 3,
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
@ -154,26 +248,26 @@
},
{
"cell_type": "markdown",
"id": "306adf9c-41cd-495c-b4dc-e4f43dd7f8f8",
"id": "245bee9b-e515-4e89-8f2a-84bda9a25de8",
"metadata": {},
"source": [
"If we need to make sure that our first message (excluding the system message) is always of a specific type, we can specify `start_on`:"
"By default, the `SystemMessage` will not be included, so you can drop it by either setting `include_system=False` or by dropping the `include_system` argument."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "878a730b-fe44-4e9d-ab65-7b8f7b069de8",
"execution_count": 5,
"id": "94351736-28a1-44a3-aac7-82356c81d171",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[SystemMessage(content=\"you're a good assistant, you always respond with a joke.\"),\n",
" HumanMessage(content='what do you call a speechless parrot')]"
"[AIMessage(content=\"Hmmm let me think.\\n\\nWhy, he's probably chasing after the last cup of coffee in the office!\", additional_kwargs={}, response_metadata={}),\n",
" HumanMessage(content='what do you call a speechless parrot', additional_kwargs={}, response_metadata={})]"
]
},
"execution_count": 4,
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
@ -181,11 +275,9 @@
"source": [
"trim_messages(\n",
" messages,\n",
" max_tokens=60,\n",
" max_tokens=45,\n",
" strategy=\"last\",\n",
" token_counter=ChatOpenAI(model=\"gpt-4o\"),\n",
" include_system=True,\n",
" start_on=\"human\",\n",
")"
]
},
@ -194,25 +286,23 @@
"id": "7f5d391d-235b-4091-b2de-c22866b478f3",
"metadata": {},
"source": [
"## Getting the first `max_tokens` tokens\n",
"\n",
"We can perform the flipped operation of getting the *first* `max_tokens` by specifying `strategy=\"first\"`:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 6,
"id": "5f56ae54-1a39-4019-9351-3b494c003d5b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[SystemMessage(content=\"you're a good assistant, you always respond with a joke.\"),\n",
" HumanMessage(content=\"i wonder why it's called langchain\")]"
"[SystemMessage(content=\"you're a good assistant, you always respond with a joke.\", additional_kwargs={}, response_metadata={}),\n",
" HumanMessage(content=\"i wonder why it's called langchain\", additional_kwargs={}, response_metadata={})]"
]
},
"execution_count": 5,
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
@ -238,18 +328,36 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 60,
"id": "d930c089-e8e6-4980-9d39-11d41e794772",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"pip install -qU tiktoken"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "1c1c3b1e-2ece-49e7-a3b6-e69877c1633b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[AIMessage(content=\"Hmmm let me think.\\n\\nWhy, he's probably chasing after the last cup of coffee in the office!\"),\n",
" HumanMessage(content='what do you call a speechless parrot')]"
"[SystemMessage(content=\"you're a good assistant, you always respond with a joke.\", additional_kwargs={}, response_metadata={}),\n",
" HumanMessage(content='what do you call a speechless parrot', additional_kwargs={}, response_metadata={})]"
]
},
"execution_count": 6,
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
@ -257,7 +365,6 @@
"source": [
"from typing import List\n",
"\n",
"# pip install tiktoken\n",
"import tiktoken\n",
"from langchain_core.messages import BaseMessage, ToolMessage\n",
"\n",
@ -298,9 +405,25 @@
"\n",
"trim_messages(\n",
" messages,\n",
" max_tokens=45,\n",
" strategy=\"last\",\n",
" # highlight-next-line\n",
" token_counter=tiktoken_counter,\n",
" # Keep the last <= n_count tokens of the messages.\n",
" strategy=\"last\",\n",
" # When token_counter=len, each message\n",
" # will be counted as a single token.\n",
" # highlight-start\n",
" # Remember to adjust for your use case\n",
" max_tokens=45,\n",
" # highlight-end\n",
" # Most chat models expect that chat history starts with either:\n",
" # (1) a HumanMessage or\n",
" # (2) a SystemMessage followed by a HumanMessage\n",
" # start_on=\"human\" makes sure we produce a valid chat history\n",
" start_on=\"human\",\n",
" # Usually, we want to keep the SystemMessage\n",
" # if it's present in the original history.\n",
" # The SystemMessage has special instructions for the model.\n",
" include_system=True,\n",
")"
]
},
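{
"cell_type": "markdown",
"id": "7c4e1f52-8a3d-4e96-b1d7-0f2c6a9e8d34",
"metadata": {},
"source": [
"Any callable that accepts a list of messages and returns an `int` works as a `token_counter`. As a simpler (hypothetical) sketch, here's a rough word-count-based counter; it only approximates length and is not a real tokenizer:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2d9a6b83-5c1f-4e72-9f40-3b8d7e1c5a69",
"metadata": {},
"outputs": [],
"source": [
"from typing import List\n",
"\n",
"from langchain_core.messages import BaseMessage\n",
"\n",
"\n",
"def word_counter(messages: List[BaseMessage]) -> int:\n",
"    # Rough approximation: total whitespace-separated words\n",
"    # across all message contents.\n",
"    return sum(len(str(m.content).split()) for m in messages)\n",
"\n",
"\n",
"trim_messages(\n",
"    messages,\n",
"    max_tokens=30,\n",
"    strategy=\"last\",\n",
"    token_counter=word_counter,\n",
"    start_on=\"human\",\n",
"    include_system=True,\n",
")"
]
},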
@ -311,22 +434,22 @@
"source": [
"## Chaining\n",
"\n",
"`trim_messages` can be used in an imperatively (like above) or declaratively, making it easy to compose with other components in a chain"
"`trim_messages` can be used imperatively (like above) or declaratively, making it easy to compose with other components in a chain"
]
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 62,
"id": "96aa29b2-01e0-437c-a1ab-02fb0141cb57",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='A: A \"Polly-gone\"!', response_metadata={'token_usage': {'completion_tokens': 9, 'prompt_tokens': 32, 'total_tokens': 41}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_66b29dffce', 'finish_reason': 'stop', 'logprobs': None}, id='run-83e96ddf-bcaa-4f63-824c-98b0f8a0d474-0', usage_metadata={'input_tokens': 32, 'output_tokens': 9, 'total_tokens': 41})"
"AIMessage(content='A \"polygon!\"', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 4, 'prompt_tokens': 32, 'total_tokens': 36, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_3537616b13', 'finish_reason': 'stop', 'logprobs': None}, id='run-995342be-0443-4e33-9b54-153f5c8771d3-0', usage_metadata={'input_tokens': 32, 'output_tokens': 4, 'total_tokens': 36})"
]
},
"execution_count": 7,
"execution_count": 62,
"metadata": {},
"output_type": "execute_result"
}
@ -340,7 +463,15 @@
" max_tokens=45,\n",
" strategy=\"last\",\n",
" token_counter=llm,\n",
" # Usually, we want to keep the SystemMessage\n",
" # if it's present in the original history.\n",
" # The SystemMessage has special instructions for the model.\n",
" include_system=True,\n",
" # Most chat models expect that chat history starts with either:\n",
" # (1) a HumanMessage or\n",
" # (2) a SystemMessage followed by a HumanMessage\n",
" # start_on=\"human\" makes sure we produce a valid chat history\n",
" start_on=\"human\",\n",
")\n",
"\n",
"chain = trimmer | llm\n",
@ -359,18 +490,18 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 63,
"id": "1ff02d0a-353d-4fac-a77c-7c2c5262abd9",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[SystemMessage(content=\"you're a good assistant, you always respond with a joke.\"),\n",
" HumanMessage(content='what do you call a speechless parrot')]"
"[SystemMessage(content=\"you're a good assistant, you always respond with a joke.\", additional_kwargs={}, response_metadata={}),\n",
" HumanMessage(content='what do you call a speechless parrot', additional_kwargs={}, response_metadata={})]"
]
},
"execution_count": 8,
"execution_count": 63,
"metadata": {},
"output_type": "execute_result"
}
@ -391,17 +522,17 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 6,
"id": "a9517858-fc2f-4dc3-898d-bf98a0e905a0",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='A \"polly-no-wanna-cracker\"!', response_metadata={'token_usage': {'completion_tokens': 10, 'prompt_tokens': 32, 'total_tokens': 42}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_5bf7397cd3', 'finish_reason': 'stop', 'logprobs': None}, id='run-054dd309-3497-4e7b-b22a-c1859f11d32e-0', usage_metadata={'input_tokens': 32, 'output_tokens': 10, 'total_tokens': 42})"
"AIMessage(content='A polygon! (Because it\\'s a \"poly-gone\" quiet!)', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 14, 'prompt_tokens': 32, 'total_tokens': 46, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_e375328146', 'finish_reason': 'stop', 'logprobs': None}, id='run-8569a119-ca02-4232-bee1-20caea61cd6d-0', usage_metadata={'input_tokens': 32, 'output_tokens': 14, 'total_tokens': 46})"
]
},
"execution_count": 9,
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
@ -425,7 +556,15 @@
" max_tokens=45,\n",
" strategy=\"last\",\n",
" token_counter=llm,\n",
" # Usually, we want to keep the SystemMessage\n",
" # if it's present in the original history.\n",
" # The SystemMessage has special instructions for the model.\n",
" include_system=True,\n",
" # Most chat models expect that chat history starts with either:\n",
" # (1) a HumanMessage or\n",
" # (2) a SystemMessage followed by a HumanMessage\n",
" # start_on=\"human\" makes sure we produce a valid chat history\n",
" start_on=\"human\",\n",
")\n",
"\n",
"chain = trimmer | llm\n",
@ -471,7 +610,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.4"
"version": "3.11.4"
}
},
"nbformat": 4,