From 3781144710e75c852aad9abd2731857e31997970 Mon Sep 17 00:00:00 2001 From: ccurme Date: Wed, 26 Mar 2025 16:13:45 -0400 Subject: [PATCH] docs: update doc on token usage tracking (#30505) --- .../how_to/chat_token_usage_tracking.ipynb | 362 ++++++------------ 1 file changed, 118 insertions(+), 244 deletions(-) diff --git a/docs/docs/how_to/chat_token_usage_tracking.ipynb b/docs/docs/how_to/chat_token_usage_tracking.ipynb index 95742c9371b..4ee3bb7b30b 100644 --- a/docs/docs/how_to/chat_token_usage_tracking.ipynb +++ b/docs/docs/how_to/chat_token_usage_tracking.ipynb @@ -16,7 +16,7 @@ "\n", "Tracking [token](/docs/concepts/tokens/) usage to calculate cost is an important part of putting your app in production. This guide goes over how to obtain this information from your LangChain model calls.\n", "\n", - "This guide requires `langchain-anthropic` and `langchain-openai >= 0.1.9`." + "This guide requires `langchain-anthropic` and `langchain-openai >= 0.3.11`." ] }, { @@ -38,19 +38,9 @@ "\n", "OpenAI's Chat Completions API does not stream token usage statistics by default (see API reference\n", "[here](https://platform.openai.com/docs/api-reference/completions/create#completions-create-stream_options)).\n", - "To recover token counts when streaming with `ChatOpenAI`, set `stream_usage=True` as\n", + "To recover token counts when streaming with `ChatOpenAI` or `AzureChatOpenAI`, set `stream_usage=True` as\n", "demonstrated in this guide.\n", "\n", - "For `AzureChatOpenAI`, set `stream_options={\"include_usage\": True}` when calling\n", - "`.(a)stream`, or initialize with:\n", - "\n", - "```python\n", - "AzureChatOpenAI(\n", - " ...,\n", - " model_kwargs={\"stream_options\": {\"include_usage\": True}},\n", - ")\n", - "```\n", - "\n", ":::" ] }, @@ -67,7 +57,7 @@ "\n", "A number of model providers return token usage information as part of the chat generation response. 
When available, this information will be included on the `AIMessage` objects produced by the corresponding model.\n", "\n", - "LangChain `AIMessage` objects include a [usage_metadata](https://python.langchain.com/api_reference/core/messages/langchain_core.messages.ai.AIMessage.html#langchain_core.messages.ai.AIMessage.usage_metadata) attribute. When populated, this attribute will be a [UsageMetadata](https://python.langchain.com/api_reference/core/messages/langchain_core.messages.ai.UsageMetadata.html) dictionary with standard keys (e.g., `\"input_tokens\"` and `\"output_tokens\"`).\n", + "LangChain `AIMessage` objects include a [usage_metadata](https://python.langchain.com/api_reference/core/messages/langchain_core.messages.ai.AIMessage.html#langchain_core.messages.ai.AIMessage.usage_metadata) attribute. When populated, this attribute will be a [UsageMetadata](https://python.langchain.com/api_reference/core/messages/langchain_core.messages.ai.UsageMetadata.html) dictionary with standard keys (e.g., `\"input_tokens\"` and `\"output_tokens\"`). Where applicable, these dictionaries also include information on cached token usage and on tokens from multi-modal data.\n", "\n", "Examples:\n", "\n", @@ -92,9 +82,9 @@ } ], "source": [ - "from langchain_openai import ChatOpenAI\n", + "from langchain.chat_models import init_chat_model\n", "\n", - "llm = ChatOpenAI(model=\"gpt-4o-mini\")\n", + "llm = init_chat_model(model=\"gpt-4o-mini\")\n", "openai_response = llm.invoke(\"hello\")\n", "openai_response.usage_metadata" ] }, @@ -132,37 +122,6 @@ "anthropic_response.usage_metadata" ] }, - { - "cell_type": "markdown", - "id": "6d4efc15-ba9f-4b3d-9278-8e01f99f263f", - "metadata": {}, - "source": [ - "### Using AIMessage.response_metadata\n", - "\n", - "Metadata from the model response is also included in the AIMessage [response_metadata](https://python.langchain.com/api_reference/core/messages/langchain_core.messages.ai.AIMessage.html#langchain_core.messages.ai.AIMessage.response_metadata) attribute. 
These data are typically not standardized. Note that different providers adopt different conventions for representing token counts:" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "id": "f156f9da-21f2-4c81-a714-54cbf9ad393e", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "OpenAI: {'completion_tokens': 9, 'prompt_tokens': 8, 'total_tokens': 17}\n", - "\n", - "Anthropic: {'input_tokens': 8, 'output_tokens': 12}\n" - ] - } - ], - "source": [ - "print(f'OpenAI: {openai_response.response_metadata[\"token_usage\"]}\\n')\n", - "print(f'Anthropic: {anthropic_response.response_metadata[\"usage\"]}')" - ] - }, { "cell_type": "markdown", "id": "b4ef2c43-0ff6-49eb-9782-e4070c9da8d7", @@ -207,7 +166,7 @@ } ], "source": [ - "llm = ChatOpenAI(model=\"gpt-4o-mini\")\n", + "llm = init_chat_model(model=\"gpt-4o-mini\")\n", "\n", "aggregate = None\n", "for chunk in llm.stream(\"hello\", stream_usage=True):\n", @@ -318,7 +277,7 @@ " punchline: str = Field(description=\"answer to resolve the joke\")\n", "\n", "\n", - "llm = ChatOpenAI(\n", + "llm = init_chat_model(\n", " model=\"gpt-4o-mini\",\n", " stream_usage=True,\n", ")\n", @@ -326,10 +285,10 @@ "# chat model and appends a parser.\n", "structured_llm = llm.with_structured_output(Joke)\n", "\n", - "async for event in structured_llm.astream_events(\"Tell me a joke\", version=\"v2\"):\n", + "async for event in structured_llm.astream_events(\"Tell me a joke\"):\n", " if event[\"event\"] == \"on_chat_model_end\":\n", " print(f'Token usage: {event[\"data\"][\"output\"].usage_metadata}\\n')\n", - " elif event[\"event\"] == \"on_chain_end\":\n", + " elif event[\"event\"] == \"on_chain_end\" and event[\"name\"] == \"RunnableSequence\":\n", " print(event[\"data\"][\"output\"])\n", " else:\n", " pass" @@ -350,17 +309,18 @@ "source": [ "## Using callbacks\n", "\n", - "There are also some API-specific callback context managers that allow you to track token usage across multiple 
calls. They are currently only implemented for the OpenAI API and Bedrock Anthropic API, and are available in `langchain-community`:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "64e52d21", - "metadata": {}, - "outputs": [], - "source": [ - "%pip install -qU langchain-community" + ":::info Requires ``langchain-core>=0.3.49``\n", + "\n", + ":::\n", + "\n", + "LangChain implements a callback handler and context manager that will track token usage across calls of any chat model that returns `usage_metadata`.\n", + "\n", + "There are also some API-specific callback context managers that maintain pricing for different models, allowing for cost estimation in real time. They are currently only implemented for the OpenAI API and Bedrock Anthropic API, and are available in `langchain-community`:\n", + "\n", + "- [get_openai_callback](https://python.langchain.com/api_reference/community/callbacks/langchain_community.callbacks.manager.get_openai_callback.html)\n", + "- [get_bedrock_anthropic_callback](https://python.langchain.com/api_reference/community/callbacks/langchain_community.callbacks.manager.get_bedrock_anthropic_callback.html)\n", + "\n", + "Below, we demonstrate the general-purpose usage metadata callback manager. We can track token usage through configuration or as a context manager." ] }, { @@ -368,41 +328,84 @@ "id": "6f043cb9", "metadata": {}, "source": [ - "### OpenAI\n", + "### Tracking token usage through configuration\n", "\n", - "Let's first look at an extremely simple example of tracking token usage for a single Chat model call." 
+ "To track token usage through configuration, instantiate a `UsageMetadataCallbackHandler` and pass it into the config:" ] }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 17, "id": "b04a4486-72fd-48ce-8f9e-5d281b441195", "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'gpt-4o-mini-2024-07-18': {'input_tokens': 8,\n", + " 'output_tokens': 10,\n", + " 'total_tokens': 18,\n", + " 'input_token_details': {'audio': 0, 'cache_read': 0},\n", + " 'output_token_details': {'audio': 0, 'reasoning': 0}},\n", + " 'claude-3-5-haiku-20241022': {'input_tokens': 8,\n", + " 'output_tokens': 21,\n", + " 'total_tokens': 29,\n", + " 'input_token_details': {'cache_read': 0, 'cache_creation': 0}}}" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from langchain.chat_models import init_chat_model\n", + "from langchain_core.callbacks import UsageMetadataCallbackHandler\n", + "\n", + "llm_1 = init_chat_model(model=\"openai:gpt-4o-mini\")\n", + "llm_2 = init_chat_model(model=\"anthropic:claude-3-5-haiku-latest\")\n", + "\n", + "callback = UsageMetadataCallbackHandler()\n", + "result_1 = llm_1.invoke(\"Hello\", config={\"callbacks\": [callback]})\n", + "result_2 = llm_2.invoke(\"Hello\", config={\"callbacks\": [callback]})\n", + "callback.usage_metadata" + ] + }, + { + "cell_type": "markdown", + "id": "7a290085-e541-4233-afe4-637ec5032bfd", + "metadata": {}, + "source": [ + "### Tracking token usage using a context manager\n", + "\n", + "You can also use `get_usage_metadata_callback` to create a context manager and aggregate usage metadata there:" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "4728f55a-24e1-48cd-a195-09d037821b1e", + "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Tokens Used: 27\n", - "\tPrompt Tokens: 11\n", - "\tCompletion Tokens: 16\n", - "Successful Requests: 1\n", - "Total Cost (USD): $2.95e-05\n" + 
"{'gpt-4o-mini-2024-07-18': {'input_tokens': 8, 'output_tokens': 10, 'total_tokens': 18, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}, 'claude-3-5-haiku-20241022': {'input_tokens': 8, 'output_tokens': 21, 'total_tokens': 29, 'input_token_details': {'cache_read': 0, 'cache_creation': 0}}}\n" ] } ], "source": [ - "from langchain_community.callbacks.manager import get_openai_callback\n", + "from langchain.chat_models import init_chat_model\n", + "from langchain_core.callbacks import get_usage_metadata_callback\n", "\n", - "llm = ChatOpenAI(\n", - " model=\"gpt-4o-mini\",\n", - " temperature=0,\n", - " stream_usage=True,\n", - ")\n", + "llm_1 = init_chat_model(model=\"openai:gpt-4o-mini\")\n", + "llm_2 = init_chat_model(model=\"anthropic:claude-3-5-haiku-latest\")\n", "\n", - "with get_openai_callback() as cb:\n", - " result = llm.invoke(\"Tell me a joke\")\n", - " print(cb)" + "with get_usage_metadata_callback() as cb:\n", + " llm_1.invoke(\"Hello\")\n", + " llm_2.invoke(\"Hello\")\n", + " print(cb.usage_metadata)" ] }, { @@ -410,61 +413,7 @@ "id": "c0ab6d27", "metadata": {}, "source": [ - "Anything inside the context manager will get tracked. Here's an example of using it to track multiple calls in sequence." 
- ] - }, - { - "cell_type": "code", - "execution_count": 10, - "id": "05f22a1d-b021-490f-8840-f628a07459f2", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "54\n" - ] - } - ], - "source": [ - "with get_openai_callback() as cb:\n", - " result = llm.invoke(\"Tell me a joke\")\n", - " result2 = llm.invoke(\"Tell me a joke\")\n", - " print(cb.total_tokens)" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "id": "c00c9158-7bb4-4279-88e6-ea70f46e6ac2", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Tokens Used: 27\n", - "\tPrompt Tokens: 11\n", - "\tCompletion Tokens: 16\n", - "Successful Requests: 1\n", - "Total Cost (USD): $2.95e-05\n" - ] - } - ], - "source": [ - "with get_openai_callback() as cb:\n", - " for chunk in llm.stream(\"Tell me a joke\"):\n", - " pass\n", - " print(cb)" - ] - }, - { - "cell_type": "markdown", - "id": "d8186e7b", - "metadata": {}, - "source": [ - "If a chain or agent with multiple steps in it is used, it will track all those steps." + "Either of these methods will aggregate token usage across multiple calls to each model. 
For example, you can use it in an [agent](https://python.langchain.com/docs/concepts/agents/) to track token usage across repeated calls to one model:" ] }, { @@ -474,138 +423,63 @@ "metadata": {}, "outputs": [], "source": [ - "%pip install -qU langchain langchain-aws wikipedia" + "%pip install -qU langgraph" ] }, { "cell_type": "code", - "execution_count": 12, - "id": "5d1125c6", - "metadata": {}, - "outputs": [], - "source": [ - "from langchain.agents import AgentExecutor, create_tool_calling_agent, load_tools\n", - "from langchain_core.prompts import ChatPromptTemplate\n", - "\n", - "prompt = ChatPromptTemplate.from_messages(\n", - " [\n", - " (\"system\", \"You're a helpful assistant\"),\n", - " (\"human\", \"{input}\"),\n", - " (\"placeholder\", \"{agent_scratchpad}\"),\n", - " ]\n", - ")\n", - "tools = load_tools([\"wikipedia\"])\n", - "agent = create_tool_calling_agent(llm, tools, prompt)\n", - "agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "id": "3950d88b-8bfb-4294-b75b-e6fd421e633c", + "execution_count": 20, + "id": "fe945078-ee2d-43ba-8cdf-afb2f2f4ecef", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ + "================================\u001b[1m Human Message \u001b[0m=================================\n", "\n", + "What's the weather in Boston?\n", + "==================================\u001b[1m Ai Message \u001b[0m==================================\n", + "Tool Calls:\n", + " get_weather (call_izMdhUYpp9Vhx7DTNAiybzGa)\n", + " Call ID: call_izMdhUYpp9Vhx7DTNAiybzGa\n", + " Args:\n", + " location: Boston\n", + "=================================\u001b[1m Tool Message \u001b[0m=================================\n", + "Name: get_weather\n", "\n", - "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n", - "\u001b[32;1m\u001b[1;3m\n", - "Invoking: `wikipedia` with `{'query': 'hummingbird scientific name'}`\n", + "It's sunny.\n", + 
"==================================\u001b[1m Ai Message \u001b[0m==================================\n", "\n", + "The weather in Boston is sunny.\n", "\n", - "\u001b[0m\u001b[36;1m\u001b[1;3mPage: Hummingbird\n", - "Summary: Hummingbirds are birds native to the Americas and comprise the biological family Trochilidae. With approximately 366 species and 113 genera, they occur from Alaska to Tierra del Fuego, but most species are found in Central and South America. As of 2024, 21 hummingbird species are listed as endangered or critically endangered, with numerous species declining in population.\n", - "Hummingbirds have varied specialized characteristics to enable rapid, maneuverable flight: exceptional metabolic capacity, adaptations to high altitude, sensitive visual and communication abilities, and long-distance migration in some species. Among all birds, male hummingbirds have the widest diversity of plumage color, particularly in blues, greens, and purples. Hummingbirds are the smallest mature birds, measuring 7.5–13 cm (3–5 in) in length. The smallest is the 5 cm (2.0 in) bee hummingbird, which weighs less than 2.0 g (0.07 oz), and the largest is the 23 cm (9 in) giant hummingbird, weighing 18–24 grams (0.63–0.85 oz). Noted for long beaks, hummingbirds are specialized for feeding on flower nectar, but all species also consume small insects.\n", - "They are known as hummingbirds because of the humming sound created by their beating wings, which flap at high frequencies audible to other birds and humans. They hover at rapid wing-flapping rates, which vary from around 12 beats per second in the largest species to 80 per second in small hummingbirds.\n", - "Hummingbirds have the highest mass-specific metabolic rate of any homeothermic animal. To conserve energy when food is scarce and at night when not foraging, they can enter torpor, a state similar to hibernation, and slow their metabolic rate to 1⁄15 of its normal rate. 
While most hummingbirds do not migrate, the rufous hummingbird has one of the longest migrations among birds, traveling twice per year between Alaska and Mexico, a distance of about 3,900 miles (6,300 km).\n", - "Hummingbirds split from their sister group, the swifts and treeswifts, around 42 million years ago. The oldest known fossil hummingbird is Eurotrochilus, from the Rupelian Stage of Early Oligocene Europe.\n", - "\n", - "Page: Rufous hummingbird\n", - "Summary: The rufous hummingbird (Selasphorus rufus) is a small hummingbird, about 8 cm (3.1 in) long with a long, straight and slender bill. These birds are known for their extraordinary flight skills, flying 2,000 mi (3,200 km) during their migratory transits. It is one of nine species in the genus Selasphorus.\n", - "\n", - "\n", - "\n", - "Page: Allen's hummingbird\n", - "Summary: Allen's hummingbird (Selasphorus sasin) is a species of hummingbird that breeds in the western United States. It is one of seven species in the genus Selasphorus.\u001b[0m\u001b[32;1m\u001b[1;3m\n", - "Invoking: `wikipedia` with `{'query': 'fastest bird species'}`\n", - "\n", - "\n", - "\u001b[0m\u001b[36;1m\u001b[1;3mPage: List of birds by flight speed\n", - "Summary: This is a list of the fastest flying birds in the world. A bird's velocity is necessarily variable; a hunting bird will reach much greater speeds while diving to catch prey than when flying horizontally. The bird that can achieve the greatest airspeed is the peregrine falcon (Falco peregrinus), able to exceed 320 km/h (200 mph) in its dives. A close relative of the common swift, the white-throated needletail (Hirundapus caudacutus), is commonly reported as the fastest bird in level flight with a reported top speed of 169 km/h (105 mph). This record remains unconfirmed as the measurement methods have never been published or verified. 
The record for the fastest confirmed level flight by a bird is 111.5 km/h (69.3 mph) held by the common swift.\n", - "\n", - "Page: Fastest animals\n", - "Summary: This is a list of the fastest animals in the world, by types of animal.\n", - "\n", - "Page: Falcon\n", - "Summary: Falcons () are birds of prey in the genus Falco, which includes about 40 species. Falcons are widely distributed on all continents of the world except Antarctica, though closely related raptors did occur there in the Eocene.\n", - "Adult falcons have thin, tapered wings, which enable them to fly at high speed and change direction rapidly. Fledgling falcons, in their first year of flying, have longer flight feathers, which make their configuration more like that of a general-purpose bird such as a broad wing. This makes flying easier while learning the exceptional skills required to be effective hunters as adults.\n", - "The falcons are the largest genus in the Falconinae subfamily of Falconidae, which itself also includes another subfamily comprising caracaras and a few other species. All these birds kill with their beaks, using a tomial \"tooth\" on the side of their beaks—unlike the hawks, eagles, and other birds of prey in the Accipitridae, which use their feet.\n", - "The largest falcon is the gyrfalcon at up to 65 cm in length. The smallest falcon species is the pygmy falcon, which measures just 20 cm. As with hawks and owls, falcons exhibit sexual dimorphism, with the females typically larger than the males, thus allowing a wider range of prey species.\n", - "Some small falcons with long, narrow wings are called \"hobbies\" and some which hover while hunting are called \"kestrels\".\n", - "As is the case with many birds of prey, falcons have exceptional powers of vision; the visual acuity of one species has been measured at 2.6 times that of a normal human. 
Peregrine falcons have been recorded diving at speeds of 320 km/h (200 mph), making them the fastest-moving creatures on Earth; the fastest recorded dive attained a vertical speed of 390 km/h (240 mph).\u001b[0m\u001b[32;1m\u001b[1;3mThe scientific name for a hummingbird is Trochilidae. The fastest bird species in level flight is the common swift, which holds the record for the fastest confirmed level flight by a bird at 111.5 km/h (69.3 mph). The peregrine falcon is known to exceed speeds of 320 km/h (200 mph) in its dives, making it the fastest bird in terms of diving speed.\u001b[0m\n", - "\n", - "\u001b[1m> Finished chain.\u001b[0m\n", - "Total Tokens: 1675\n", - "Prompt Tokens: 1538\n", - "Completion Tokens: 137\n", - "Total Cost (USD): $0.0009745000000000001\n" + "Total usage: {'gpt-4o-mini-2024-07-18': {'input_token_details': {'audio': 0, 'cache_read': 0}, 'input_tokens': 125, 'total_tokens': 149, 'output_tokens': 24, 'output_token_details': {'audio': 0, 'reasoning': 0}}}\n" ] } ], "source": [ - "with get_openai_callback() as cb:\n", - " response = agent_executor.invoke(\n", - " {\n", - " \"input\": \"What's a hummingbird's scientific name and what's the fastest bird species?\"\n", - " }\n", - " )\n", - " print(f\"Total Tokens: {cb.total_tokens}\")\n", - " print(f\"Prompt Tokens: {cb.prompt_tokens}\")\n", - " print(f\"Completion Tokens: {cb.completion_tokens}\")\n", - " print(f\"Total Cost (USD): ${cb.total_cost}\")" - ] - }, - { - "cell_type": "markdown", - "id": "ebc9122b-050b-4006-b763-264b0b26d9df", - "metadata": {}, - "source": [ - "### Bedrock Anthropic\n", + "from langgraph.prebuilt import create_react_agent\n", "\n", - "The `get_bedrock_anthropic_callback` works very similarly:" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "id": "1837c807-136a-49d8-9c33-060e58dc16d2", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Tokens Used: 96\n", - "\tPrompt Tokens: 26\n", - "\tCompletion 
Tokens: 70\n", - "Successful Requests: 2\n", - "Total Cost (USD): $0.001888\n" - ] - } - ], - "source": [ - "from langchain_aws import ChatBedrock\n", - "from langchain_community.callbacks.manager import get_bedrock_anthropic_callback\n", "\n", - "llm = ChatBedrock(model_id=\"anthropic.claude-v2\")\n", + "# Create a tool\n", + "def get_weather(location: str) -> str:\n", + " \"\"\"Get the weather at a location.\"\"\"\n", + " return \"It's sunny.\"\n", "\n", - "with get_bedrock_anthropic_callback() as cb:\n", - " result = llm.invoke(\"Tell me a joke\")\n", - " result2 = llm.invoke(\"Tell me a joke\")\n", - " print(cb)" + "\n", + "callback = UsageMetadataCallbackHandler()\n", + "\n", + "tools = [get_weather]\n", + "agent = create_react_agent(\"openai:gpt-4o-mini\", tools)\n", + "for step in agent.stream(\n", + " {\"messages\": [{\"role\": \"user\", \"content\": \"What's the weather in Boston?\"}]},\n", + " stream_mode=\"values\",\n", + " config={\"callbacks\": [callback]},\n", + "):\n", + " step[\"messages\"][-1].pretty_print()\n", + "\n", + "\n", + "print(f\"\\nTotal usage: {callback.usage_metadata}\")" ] }, {
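
Reviewer note: the per-model aggregation that the new `UsageMetadataCallbackHandler` and `get_usage_metadata_callback` examples print can be sketched in plain Python. This is a simplified, hypothetical illustration of the merging behavior (summing the standard integer keys per model name), not the actual `langchain-core` implementation, which also merges the nested `input_token_details`/`output_token_details` dictionaries:

```python
# Hypothetical sketch (not the actual langchain-core implementation) of how
# a usage-metadata callback folds each call's usage_metadata into a running
# per-model total.
def merge_usage(aggregate: dict, model_name: str, usage: dict) -> dict:
    """Add one call's token counts to the running total for model_name."""
    totals = aggregate.setdefault(
        model_name, {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
    )
    for key in ("input_tokens", "output_tokens", "total_tokens"):
        totals[key] += usage.get(key, 0)
    return aggregate


usage_by_model: dict = {}
# Two calls to the same model accumulate into a single entry.
merge_usage(
    usage_by_model,
    "gpt-4o-mini",
    {"input_tokens": 8, "output_tokens": 10, "total_tokens": 18},
)
merge_usage(
    usage_by_model,
    "gpt-4o-mini",
    {"input_tokens": 12, "output_tokens": 6, "total_tokens": 18},
)
print(usage_by_model)
# {'gpt-4o-mini': {'input_tokens': 20, 'output_tokens': 16, 'total_tokens': 36}}
```

Calls to a second model would simply create a second key in `usage_by_model`, mirroring the two-model output shown in the notebook cells above.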