anthropic: support built-in tools, improve docs (#30274)

- Support features from recent update:
https://www.anthropic.com/news/token-saving-updates (mostly adding
support for built-in tools in `bind_tools`)
- Add documentation around prompt caching, token-efficient tool use, and
built-in tools.
Author: ccurme
Date: 2025-03-14 12:18:50 -04:00
Committed by: GitHub
Parent: f27e2d7ce7
Commit: 226f29bc96
7 changed files with 629 additions and 30 deletions

@@ -102,6 +102,16 @@
"%pip install -qU langchain-anthropic"
]
},
{
"cell_type": "markdown",
"id": "fe4993ad-4a9b-4021-8ebd-f0fbbc739f49",
"metadata": {},
"source": [
":::info This guide requires ``langchain-anthropic>=0.3.10``\n",
"\n",
":::"
]
},
{
"cell_type": "markdown",
"id": "a38cde65-254d-4219-a441-068766c0d4b5",
@@ -245,7 +255,7 @@
"source": [
"## Content blocks\n",
"\n",
"One key difference to note between Anthropic models and most others is that the contents of a single Anthropic AI message can either be a single string or a **list of content blocks**. For example when an Anthropic model invokes a tool, the tool invocation is part of the message content (as well as being exposed in the standardized `AIMessage.tool_calls`):"
"Content from a single Anthropic AI message can either be a single string or a **list of content blocks**. For example when an Anthropic model invokes a tool, the tool invocation is part of the message content (as well as being exposed in the standardized `AIMessage.tool_calls`):"
]
},
{
@@ -368,6 +378,377 @@
"print(json.dumps(response.content, indent=2))"
]
},
{
"cell_type": "markdown",
"id": "34349dfe-5d81-4887-a4f4-cd01e9587cdc",
"metadata": {},
"source": [
"## Prompt caching\n",
"\n",
"Anthropic supports [caching](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching) of [elements of your prompts](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching#what-can-be-cached), including messages, tool definitions, tool results, images and documents. This allows you to re-use large documents, instructions, [few-shot documents](/docs/concepts/few_shot_prompting/), and other data to reduce latency and costs.\n",
"\n",
"To enable caching on an element of a prompt, mark its associated content block using the `cache_control` key. See examples below:\n",
"\n",
"### Messages"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "babb44a5-33f7-4200-9dfc-be867cf2c217",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"First invocation:\n",
"{'cache_read': 0, 'cache_creation': 1458}\n",
"\n",
"Second:\n",
"{'cache_read': 1458, 'cache_creation': 0}\n"
]
}
],
"source": [
"import requests\n",
"from langchain_anthropic import ChatAnthropic\n",
"\n",
"llm = ChatAnthropic(model=\"claude-3-7-sonnet-20250219\")\n",
"\n",
"# Pull LangChain readme\n",
"get_response = requests.get(\n",
" \"https://raw.githubusercontent.com/langchain-ai/langchain/master/README.md\"\n",
")\n",
"readme = get_response.text\n",
"\n",
"messages = [\n",
" {\n",
" \"role\": \"system\",\n",
" \"content\": [\n",
" {\n",
" \"type\": \"text\",\n",
" \"text\": \"You are a technology expert.\",\n",
" },\n",
" {\n",
" \"type\": \"text\",\n",
" \"text\": f\"{readme}\",\n",
" # highlight-next-line\n",
" \"cache_control\": {\"type\": \"ephemeral\"},\n",
" },\n",
" ],\n",
" },\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": \"What's LangChain, according to its README?\",\n",
" },\n",
"]\n",
"\n",
"response_1 = llm.invoke(messages)\n",
"response_2 = llm.invoke(messages)\n",
"\n",
"usage_1 = response_1.usage_metadata[\"input_token_details\"]\n",
"usage_2 = response_2.usage_metadata[\"input_token_details\"]\n",
"\n",
"print(f\"First invocation:\\n{usage_1}\")\n",
"print(f\"\\nSecond:\\n{usage_2}\")"
]
},
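{
"cell_type": "markdown",
"metadata": {},
"source": [
"On the first invocation the prompt prefix is written to the cache; on the second it is read back. Note that Anthropic's ephemeral cache has a short time-to-live (about five minutes at the time of writing), refreshed each time the cached content is used."
]
},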
{
"cell_type": "markdown",
"id": "141ce9c5-012d-4502-9d61-4a413b5d959a",
"metadata": {},
"source": [
"### Tools"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "1de82015-810f-4ed4-a08b-9866ea8746ce",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"First invocation:\n",
"{'cache_read': 0, 'cache_creation': 1809}\n",
"\n",
"Second:\n",
"{'cache_read': 1809, 'cache_creation': 0}\n"
]
}
],
"source": [
"from langchain_anthropic import convert_to_anthropic_tool\n",
"from langchain_core.tools import tool\n",
"\n",
"# For demonstration purposes, we artificially expand the\n",
"# tool description.\n",
"description = (\n",
" f\"Get the weather at a location. By the way, check out this readme: {readme}\"\n",
")\n",
"\n",
"\n",
"@tool(description=description)\n",
"def get_weather(location: str) -> str:\n",
" return \"It's sunny.\"\n",
"\n",
"\n",
"# Enable caching on the tool\n",
"# highlight-start\n",
"weather_tool = convert_to_anthropic_tool(get_weather)\n",
"weather_tool[\"cache_control\"] = {\"type\": \"ephemeral\"}\n",
"# highlight-end\n",
"\n",
"llm = ChatAnthropic(model=\"claude-3-7-sonnet-20250219\")\n",
"llm_with_tools = llm.bind_tools([weather_tool])\n",
"query = \"What's the weather in San Francisco?\"\n",
"\n",
"response_1 = llm_with_tools.invoke(query)\n",
"response_2 = llm_with_tools.invoke(query)\n",
"\n",
"usage_1 = response_1.usage_metadata[\"input_token_details\"]\n",
"usage_2 = response_2.usage_metadata[\"input_token_details\"]\n",
"\n",
"print(f\"First invocation:\\n{usage_1}\")\n",
"print(f\"\\nSecond:\\n{usage_2}\")"
]
},
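{
"cell_type": "markdown",
"metadata": {},
"source": [
"Tool results, images and documents can also be cached. As a minimal sketch (assuming the `llm` and `readme` objects from the examples above, and Anthropic's plain-text document block format), a document content block can be marked for caching like this:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: caching a plain-text document content block.\n",
"# Assumes `llm` and `readme` from the examples above.\n",
"messages = [\n",
"    {\n",
"        \"role\": \"user\",\n",
"        \"content\": [\n",
"            {\n",
"                \"type\": \"document\",\n",
"                \"source\": {\n",
"                    \"type\": \"text\",\n",
"                    \"media_type\": \"text/plain\",\n",
"                    \"data\": readme,\n",
"                },\n",
"                # highlight-next-line\n",
"                \"cache_control\": {\"type\": \"ephemeral\"},\n",
"            },\n",
"            {\"type\": \"text\", \"text\": \"Summarize this document.\"},\n",
"        ],\n",
"    }\n",
"]\n",
"\n",
"response = llm.invoke(messages)\n",
"print(response.usage_metadata[\"input_token_details\"])"
]
},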
{
"cell_type": "markdown",
"id": "a763830d-82cb-448a-ab30-f561522791b9",
"metadata": {},
"source": [
"### Incremental caching in conversational applications\n",
"\n",
"Prompt caching can be used in [multi-turn conversations](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching#continuing-a-multi-turn-conversation) to maintain context from earlier messages without redundant processing.\n",
"\n",
"We can enable incremental caching by marking the final message with `cache_control`. Claude will automatically use the longest previously-cacched prefix for follow-up messages.\n",
"\n",
"Below, we implement a simple chatbot that incorporates this feature. We follow the LangChain [chatbot tutorial](/docs/tutorials/chatbot/), but add a custom [reducer](https://langchain-ai.github.io/langgraph/concepts/low_level/#reducers) that automatically marks the last content block in each user message with `cache_control`. See below:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "07fde4db-344c-49bc-a5b4-99e2d20fb394",
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"from langchain_anthropic import ChatAnthropic\n",
"from langgraph.checkpoint.memory import MemorySaver\n",
"from langgraph.graph import START, StateGraph, add_messages\n",
"from typing_extensions import Annotated, TypedDict\n",
"\n",
"llm = ChatAnthropic(model=\"claude-3-7-sonnet-20250219\")\n",
"\n",
"# Pull LangChain readme\n",
"get_response = requests.get(\n",
" \"https://raw.githubusercontent.com/langchain-ai/langchain/master/README.md\"\n",
")\n",
"readme = get_response.text\n",
"\n",
"\n",
"def messages_reducer(left: list, right: list) -> list:\n",
" # Update last user message\n",
" for i in range(len(right) - 1, -1, -1):\n",
" if right[i].type == \"human\":\n",
" right[i].content[-1][\"cache_control\"] = {\"type\": \"ephemeral\"}\n",
" break\n",
"\n",
" return add_messages(left, right)\n",
"\n",
"\n",
"class State(TypedDict):\n",
" messages: Annotated[list, messages_reducer]\n",
"\n",
"\n",
"workflow = StateGraph(state_schema=State)\n",
"\n",
"\n",
"# Define the function that calls the model\n",
"def call_model(state: State):\n",
" response = llm.invoke(state[\"messages\"])\n",
" return {\"messages\": [response]}\n",
"\n",
"\n",
"# Define the (single) node in the graph\n",
"workflow.add_edge(START, \"model\")\n",
"workflow.add_node(\"model\", call_model)\n",
"\n",
"# Add memory\n",
"memory = MemorySaver()\n",
"app = workflow.compile(checkpointer=memory)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "40013035-eb22-4327-8aaf-1ee974d9ff46",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
"\n",
"Hello, Bob! It's nice to meet you. How are you doing today? Is there something I can help you with?\n",
"\n",
"{'cache_read': 0, 'cache_creation': 0}\n"
]
}
],
"source": [
"from langchain_core.messages import HumanMessage\n",
"\n",
"config = {\"configurable\": {\"thread_id\": \"abc123\"}}\n",
"\n",
"query = \"Hi! I'm Bob.\"\n",
"\n",
"input_message = HumanMessage([{\"type\": \"text\", \"text\": query}])\n",
"output = app.invoke({\"messages\": [input_message]}, config)\n",
"output[\"messages\"][-1].pretty_print()\n",
"print(f'\\n{output[\"messages\"][-1].usage_metadata[\"input_token_details\"]}')"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "22371f68-7913-4c4f-ab4a-2b4265095469",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
"\n",
"I can see you've shared the README from the LangChain GitHub repository. This is the documentation for LangChain, which is a popular framework for building applications powered by Large Language Models (LLMs). Here's a summary of what the README contains:\n",
"\n",
"LangChain is:\n",
"- A framework for developing LLM-powered applications\n",
"- Helps chain together components and integrations to simplify AI application development\n",
"- Provides a standard interface for models, embeddings, vector stores, etc.\n",
"\n",
"Key features/benefits:\n",
"- Real-time data augmentation (connect LLMs to diverse data sources)\n",
"- Model interoperability (swap models easily as needed)\n",
"- Large ecosystem of integrations\n",
"\n",
"The LangChain ecosystem includes:\n",
"- LangSmith - For evaluations and observability\n",
"- LangGraph - For building complex agents with customizable architecture\n",
"- LangGraph Platform - For deployment and scaling of agents\n",
"\n",
"The README also mentions installation instructions (`pip install -U langchain`) and links to various resources including tutorials, how-to guides, conceptual guides, and API references.\n",
"\n",
"Is there anything specific about LangChain you'd like to know more about, Bob?\n",
"\n",
"{'cache_read': 0, 'cache_creation': 1498}\n"
]
}
],
"source": [
"query = f\"Check out this readme: {readme}\"\n",
"\n",
"input_message = HumanMessage([{\"type\": \"text\", \"text\": query}])\n",
"output = app.invoke({\"messages\": [input_message]}, config)\n",
"output[\"messages\"][-1].pretty_print()\n",
"print(f'\\n{output[\"messages\"][-1].usage_metadata[\"input_token_details\"]}')"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "0e6798fc-8a80-4324-b4e3-f18706256c61",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
"\n",
"Your name is Bob. You introduced yourself at the beginning of our conversation.\n",
"\n",
"{'cache_read': 1498, 'cache_creation': 269}\n"
]
}
],
"source": [
"query = \"What was my name again?\"\n",
"\n",
"input_message = HumanMessage([{\"type\": \"text\", \"text\": query}])\n",
"output = app.invoke({\"messages\": [input_message]}, config)\n",
"output[\"messages\"][-1].pretty_print()\n",
"print(f'\\n{output[\"messages\"][-1].usage_metadata[\"input_token_details\"]}')"
]
},
{
"cell_type": "markdown",
"id": "aa4b3647-c672-4782-a88c-a55fd3bf969f",
"metadata": {},
"source": [
"In the [LangSmith trace](https://smith.langchain.com/public/4d0584d8-5f9e-4b91-8704-93ba2ccf416a/r), toggling \"raw output\" will show exactly what messages are sent to the chat model, including `cache_control` keys."
]
},
{
"cell_type": "markdown",
"id": "029009f2-2795-418b-b5fc-fb996c6fe99e",
"metadata": {},
"source": [
"## Token-efficient tool use\n",
"\n",
"Anthropic supports a (beta) [token-efficient tool use](https://docs.anthropic.com/en/docs/build-with-claude/tool-use/token-efficient-tool-use) feature. To use it, specify the relevant beta-headers when instantiating the model."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "206cff65-33b8-4a88-9b1a-050b4d57772a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[{'name': 'get_weather', 'args': {'location': 'San Francisco'}, 'id': 'toolu_01EoeE1qYaePcmNbUvMsWtmA', 'type': 'tool_call'}]\n",
"\n",
"Total tokens: 408\n"
]
}
],
"source": [
"from langchain_anthropic import ChatAnthropic\n",
"from langchain_core.tools import tool\n",
"\n",
"llm = ChatAnthropic(\n",
" model=\"claude-3-7-sonnet-20250219\",\n",
" temperature=0,\n",
" # highlight-start\n",
" model_kwargs={\n",
" \"extra_headers\": {\"anthropic-beta\": \"token-efficient-tools-2025-02-19\"}\n",
" },\n",
" # highlight-end\n",
")\n",
"\n",
"\n",
"@tool\n",
"def get_weather(location: str) -> str:\n",
" \"\"\"Get the weather at a location.\"\"\"\n",
" return \"It's sunny.\"\n",
"\n",
"\n",
"llm_with_tools = llm.bind_tools([get_weather])\n",
"response = llm_with_tools.invoke(\"What's the weather in San Francisco?\")\n",
"print(response.tool_calls)\n",
"print(f'\\nTotal tokens: {response.usage_metadata[\"total_tokens\"]}')"
]
},
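{
"cell_type": "markdown",
"metadata": {},
"source": [
"For a rough comparison, the sketch below runs the same request without the beta header; it will typically consume more tokens (exact counts vary between runs):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: the same request without the token-efficient tools beta header.\n",
"llm_plain = ChatAnthropic(model=\"claude-3-7-sonnet-20250219\", temperature=0)\n",
"\n",
"response = llm_plain.bind_tools([get_weather]).invoke(\n",
"    \"What's the weather in San Francisco?\"\n",
")\n",
"print(f'Total tokens: {response.usage_metadata[\"total_tokens\"]}')"
]
},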
{
"cell_type": "markdown",
"id": "301d372f-4dec-43e6-b58c-eee25633e1a6",
@@ -525,6 +906,58 @@
"response.content"
]
},
{
"cell_type": "markdown",
"id": "cbfec7a9-d9df-4d12-844e-d922456dd9bf",
"metadata": {},
"source": [
"## Built-in tools\n",
"\n",
"Anthropic supports a variety of [built-in tools](https://docs.anthropic.com/en/docs/build-with-claude/tool-use/text-editor-tool), which can be bound to the model in the [usual way](/docs/how_to/tool_calling/). Claude will generate tool calls adhering to its internal schema for the tool:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "30a0af36-2327-4b1d-9ba5-e47cb72db0be",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"I'd be happy to help you fix the syntax error in your primes.py file. First, let's look at the current content of the file to identify the error.\n"
]
},
{
"data": {
"text/plain": [
"[{'name': 'str_replace_editor',\n",
" 'args': {'command': 'view', 'path': '/repo/primes.py'},\n",
" 'id': 'toolu_01VdNgt1YV7kGfj9LFLm6HyQ',\n",
" 'type': 'tool_call'}]"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_anthropic import ChatAnthropic\n",
"\n",
"llm = ChatAnthropic(model=\"claude-3-7-sonnet-20250219\")\n",
"\n",
"tool = {\"type\": \"text_editor_20250124\", \"name\": \"str_replace_editor\"}\n",
"llm_with_tools = llm.bind_tools([tool])\n",
"\n",
"response = llm_with_tools.invoke(\n",
" \"There's a syntax error in my primes.py file. Can you help me fix it?\"\n",
")\n",
"print(response.text())\n",
"response.tool_calls"
]
},
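{
"cell_type": "markdown",
"metadata": {},
"source": [
"From here, an application would execute the editor command itself (here, viewing `primes.py`) and pass the result back as a tool message. A minimal sketch of the follow-up turn, using hypothetical file contents:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.messages import HumanMessage, ToolMessage\n",
"\n",
"# Hypothetical file contents (note the missing colon on the first line)\n",
"file_contents = \"def is_prime(n)\\n    return n > 1\"\n",
"\n",
"messages = [\n",
"    HumanMessage(\n",
"        \"There's a syntax error in my primes.py file. Can you help me fix it?\"\n",
"    ),\n",
"    response,  # AIMessage containing the tool call\n",
"    ToolMessage(content=file_contents, tool_call_id=response.tool_calls[0][\"id\"]),\n",
"]\n",
"follow_up = llm_with_tools.invoke(messages)\n",
"print(follow_up.text())"
]
},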
{
"cell_type": "markdown",
"id": "3a5bb5ca-c3ae-4a58-be67-2cd18574b9a3",