Mirror of https://github.com/hwchase17/langchain.git, synced 2025-06-22 14:49:29 +00:00
docs: add how-to on multi-modal tool calling (#21667)
Can move this to a dedicated multi-modal section if desired.
parent: 5c64c004cc
commit: 12b599c47f
@@ -172,6 +172,7 @@ LangChain Tools contain a description of the tool (to pass to the language model
 - [How to: add a human in the loop to tool usage](/docs/how_to/tools_human)
 - [How to: do parallel tool use](/docs/how_to/tools_parallel)
 - [How to: handle errors when calling tools](/docs/how_to/tools_error)
+- [How to: call tools using multi-modal data](/docs/how_to/tool_calls_multi_modal)

 ### Agents
docs/docs/how_to/tool_calls_multi_modal.ipynb · 160 lines · new file
@@ -0,0 +1,160 @@
# How to call tools with multi-modal data

Here we demonstrate how to call tools with multi-modal data, such as images.

Some multi-modal models, such as those that can reason over images or audio, support [tool calling](/docs/concepts/#functiontool-calling) features as well.

To call tools using such models, simply bind tools to them in the [usual way](/docs/how_to/tool_calling), and invoke the model using content blocks of the desired type (e.g., blocks containing image data).

Below, we demonstrate examples using [OpenAI](/docs/integrations/platforms/openai) and [Anthropic](/docs/integrations/platforms/anthropic). We will use the same image and tool in all cases. Let's first select an image, and build a placeholder tool that expects as input one of the strings "sunny", "cloudy", or "rainy". We will ask the models to describe the weather in the image.

```python
from typing import Literal

from langchain_core.tools import tool

image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"


# Placeholder tool: the model fills in its weather classification;
# the body itself is a no-op.
@tool
def weather_tool(weather: Literal["sunny", "cloudy", "rainy"]) -> None:
    """Describe the weather"""
    pass
```
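As a quick, illustrative aside (our addition, not part of the original notebook), you can inspect what binding will expose to the model; `name`, `description`, and `args` are standard LangChain tool properties:

```python
# Sanity check: inspect the schema the model will see for this tool.
print(weather_tool.name)  # -> "weather_tool"
print(weather_tool.description)  # -> "Describe the weather"
print(weather_tool.args)  # arg schema with the "sunny"/"cloudy"/"rainy" enum
```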
## OpenAI

For OpenAI, we can feed the image URL directly in a content block of type "image_url":

```python
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o").bind_tools([weather_tool])

message = HumanMessage(
    content=[
        {"type": "text", "text": "describe the weather in this image"},
        {"type": "image_url", "image_url": {"url": image_url}},
    ],
)
response = model.invoke([message])
print(response.tool_calls)
```

```
[{'name': 'weather_tool', 'args': {'weather': 'sunny'}, 'id': 'call_mRYL50MtHdeNuNIjSCm5UPmB'}]
```

Note that we recover tool calls with parsed arguments in LangChain's [standard format](/docs/how_to/tool_calling) in the model response.
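Because the arguments arrive already parsed into a dict, the tool call can be executed directly. A minimal sketch (our addition, not in the original notebook), assuming `response` is the result from the cell above:

```python
# Execute each parsed tool call against our placeholder tool.
for tool_call in response.tool_calls:
    result = weather_tool.invoke(tool_call["args"])  # e.g. {"weather": "sunny"}
    print(tool_call["name"], "->", result)
```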
## Anthropic

For Anthropic, we can format a base64-encoded image into a content block of type "image", as below:

```python
import base64

import httpx
from langchain_anthropic import ChatAnthropic

# Download the image and base64-encode it for the Anthropic content block.
image_data = base64.b64encode(httpx.get(image_url).content).decode("utf-8")

model = ChatAnthropic(model="claude-3-sonnet-20240229").bind_tools([weather_tool])

message = HumanMessage(
    content=[
        {"type": "text", "text": "describe the weather in this image"},
        {
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": "image/jpeg",
                "data": image_data,
            },
        },
    ],
)
response = model.invoke([message])
print(response.tool_calls)
```

```
[{'name': 'weather_tool', 'args': {'weather': 'sunny'}, 'id': 'toolu_016m9KfknJqx5fVRYk4tkF6s'}]
```
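As an aside (our addition, not in the original notebook): OpenAI's chat API also accepts base64 image data when wrapped in a data URL inside an "image_url" block, so the same `image_data` can be reused there. A sketch, assuming `image_data`, `weather_tool`, `HumanMessage`, and `ChatOpenAI` from the cells above:

```python
# Reuse the base64 payload with OpenAI via a data URL.
openai_message = HumanMessage(
    content=[
        {"type": "text", "text": "describe the weather in this image"},
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
        },
    ],
)
response = ChatOpenAI(model="gpt-4o").bind_tools([weather_tool]).invoke(
    [openai_message]
)
print(response.tool_calls)
```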