docs: update multimodal PDF and image usage for gpt-4.1 (#31595)

**Description:** This update revises the LangChain documentation to support the new GPT-4.1 multimodal API format. It fixes the previously broken example for PDF uploads (which returned a 400 error: "Missing required parameter: 'messages[0].content[1].file'") and adds clear instructions on how to include base64-encoded images for OpenAI models.

**Issue:** Error reported in the forums when loading PDFs through the API:

> [Albaeld](https://github.com/Albaeld) in [langchain discussion #27702](https://github.com/langchain-ai/langchain/discussions/27702#discussioncomment-13369460):
> "This simply does not work with openai:gpt-4.1. I get:
> Error code: 400 - {'error': {'message': "Missing required parameter: 'messages[0].content[1].file'.", 'type': 'invalid_request_error', 'param': 'messages[0].content[1].file', 'code': 'missing_required_parameter'}}"

**Dependencies:** None

**Twitter handle:** N/A

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
This commit is contained in:
parent cecfec5efa
commit 52e57cdc20
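For readers hitting the 400 error quoted above, a minimal sketch of the corrected PDF call (assumes `langchain-openai` is installed and a local `my-file.pdf` exists; the model name and prompt are illustrative):

```python
import base64

from langchain_openai import ChatOpenAI

# Illustrative local file; any PDF path works.
with open("my-file.pdf", "rb") as f:
    pdf_b64 = base64.b64encode(f.read()).decode()

llm = ChatOpenAI(model="gpt-4.1")

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Summarize this document."},
        {
            "type": "file",
            "source_type": "base64",
            "data": pdf_b64,
            "mime_type": "application/pdf",
            # Omitting `filename` is what produced the 400
            # "missing_required_parameter" error quoted above.
            "filename": "my-file.pdf",
        },
    ],
}
print(llm.invoke([message]).content)
```

LangChain translates the `source_type: "base64"` block into OpenAI's Chat Completions `file` parameter, carrying the `filename` and data URL through.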
@@ -212,6 +212,10 @@
     "[Anthropic](/docs/integrations/chat/anthropic/), and\n",
     "[Google Gemini](/docs/integrations/chat/google_generative_ai/)) will accept PDF documents.\n",
     "\n",
+    ":::note\n",
+    "OpenAI requires file-names be specified for PDF inputs. When using LangChain's format, include the `filename` key. See [example below](#example-openai-file-names).\n",
+    ":::\n",
+    "\n",
     "### Documents from base64 data\n",
     "\n",
     "To pass documents in-line, format them as content blocks of the following form:\n",
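The `:::note` added above is the crux of the linked discussion: without `filename`, LangChain's document block cannot be mapped onto OpenAI's required `file` parameter. For reference, the in-line document block this guide goes on to show takes the following shape (`base64_string` is assumed to hold base64-encoded PDF bytes):

```python
content_block = {
    "type": "file",
    "source_type": "base64",
    "data": base64_string,  # assumed: base64-encoded PDF bytes
    "mime_type": "application/pdf",
    "filename": "my-file.pdf",  # required for OpenAI PDF inputs
}
```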
@@ -1463,74 +1463,133 @@
    "id": "5d5d9793",
    "metadata": {},
    "source": [
-    "## Multimodal Inputs\n",
+    "## Multimodal Inputs (images, PDFs, audio)\n",
     "\n",
-    "OpenAI has models that support multimodal inputs. You can pass in images or audio to these models. For more information on how to do this in LangChain, head to the [multimodal inputs](/docs/how_to/multimodal_inputs) docs.\n",
+    "OpenAI has models that support multimodal inputs. You can pass in images, PDFs, or audio to these models. For more information on how to do this in LangChain, head to the [multimodal inputs](/docs/how_to/multimodal_inputs) docs.\n",
     "\n",
     "You can see the list of models that support different modalities in [OpenAI's documentation](https://platform.openai.com/docs/models).\n",
     "\n",
-    "At the time of this doc's writing, the main OpenAI models you would use would be:\n",
+    "For all modalities, LangChain supports both its [cross-provider standard](/docs/concepts/multimodality/#multimodality-in-chat-models) as well as OpenAI's native content-block format.\n",
     "\n",
-    "- Image inputs: `gpt-4o`, `gpt-4o-mini`\n",
-    "- Audio inputs: `gpt-4o-audio-preview`\n",
+    "To pass multimodal data into `ChatOpenAI`, create a [content block](/docs/concepts/messages/) containing the data and incorporate it into a message, e.g., as below:\n",
+    "```python\n",
+    "message = {\n",
+    "    \"role\": \"user\",\n",
+    "    \"content\": [\n",
+    "        {\n",
+    "            \"type\": \"text\",\n",
+    "            # Update prompt as desired\n",
+    "            \"text\": \"Describe the (image / PDF / audio...)\",\n",
+    "        },\n",
+    "        # highlight-next-line\n",
+    "        content_block,\n",
+    "    ],\n",
+    "}\n",
+    "```\n",
+    "See below for examples of content blocks.\n",
     "\n",
-    "For an example of passing in image inputs, see the [multimodal inputs how-to guide](/docs/how_to/multimodal_inputs).\n",
-    "\n",
-    "Below is an example of passing audio inputs to `gpt-4o-audio-preview`:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 8,
-   "id": "39d08780",
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "\"I'm sorry, but I can't create audio content that involves yelling. Is there anything else I can help you with?\""
-      ]
-     },
-     "execution_count": 8,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "import base64\n",
-    "\n",
-    "from langchain_openai import ChatOpenAI\n",
-    "\n",
-    "llm = ChatOpenAI(\n",
-    "    model=\"gpt-4o-audio-preview\",\n",
-    "    temperature=0,\n",
-    ")\n",
-    "\n",
-    "with open(\n",
-    "    \"../../../../libs/partners/openai/tests/integration_tests/chat_models/audio_input.wav\",\n",
-    "    \"rb\",\n",
-    ") as f:\n",
-    "    # b64 encode it\n",
-    "    audio = f.read()\n",
-    "    audio_b64 = base64.b64encode(audio).decode()\n",
-    "\n",
-    "\n",
-    "output_message = llm.invoke(\n",
-    "    [\n",
-    "        (\n",
-    "            \"human\",\n",
-    "            [\n",
-    "                {\"type\": \"text\", \"text\": \"Transcribe the following:\"},\n",
-    "                # the audio clip says \"I'm sorry, but I can't create...\"\n",
-    "                {\n",
-    "                    \"type\": \"input_audio\",\n",
-    "                    \"input_audio\": {\"data\": audio_b64, \"format\": \"wav\"},\n",
-    "                },\n",
-    "            ],\n",
-    "        ),\n",
-    "    ]\n",
-    ")\n",
-    "output_message.content"
+    "<details>\n",
+    "<summary>Images</summary>\n",
+    "\n",
+    "Refer to examples in the how-to guide [here](/docs/how_to/multimodal_inputs/#images).\n",
+    "\n",
+    "URLs:\n",
+    "```python\n",
+    "# LangChain format\n",
+    "content_block = {\n",
+    "    \"type\": \"image\",\n",
+    "    \"source_type\": \"url\",\n",
+    "    \"url\": url_string,\n",
+    "}\n",
+    "\n",
+    "# OpenAI Chat Completions format\n",
+    "content_block = {\n",
+    "    \"type\": \"image_url\",\n",
+    "    \"image_url\": {\"url\": url_string},\n",
+    "}\n",
+    "```\n",
+    "\n",
+    "In-line base64 data:\n",
+    "```python\n",
+    "# LangChain format\n",
+    "content_block = {\n",
+    "    \"type\": \"image\",\n",
+    "    \"source_type\": \"base64\",\n",
+    "    \"data\": base64_string,\n",
+    "    \"mime_type\": \"image/jpeg\",\n",
+    "}\n",
+    "\n",
+    "# OpenAI Chat Completions format\n",
+    "content_block = {\n",
+    "    \"type\": \"image_url\",\n",
+    "    \"image_url\": {\n",
+    "        \"url\": f\"data:image/jpeg;base64,{base64_string}\",\n",
+    "    },\n",
+    "}\n",
+    "```\n",
+    "\n",
+    "</details>\n",
+    "\n",
+    "\n",
+    "<details>\n",
+    "<summary>PDFs</summary>\n",
+    "\n",
+    "Note: OpenAI requires file-names be specified for PDF inputs. When using LangChain's format, include the `filename` key.\n",
+    "\n",
+    "Read more [here](/docs/how_to/multimodal_inputs/#example-openai-file-names).\n",
+    "\n",
+    "Refer to examples in the how-to guide [here](/docs/how_to/multimodal_inputs/#documents-pdf).\n",
+    "\n",
+    "In-line base64 data:\n",
+    "```python\n",
+    "# LangChain format\n",
+    "content_block = {\n",
+    "    \"type\": \"file\",\n",
+    "    \"source_type\": \"base64\",\n",
+    "    \"data\": base64_string,\n",
+    "    \"mime_type\": \"application/pdf\",\n",
+    "    # highlight-next-line\n",
+    "    \"filename\": \"my-file.pdf\",\n",
+    "}\n",
+    "\n",
+    "# OpenAI Chat Completions format\n",
+    "content_block = {\n",
+    "    \"type\": \"file\",\n",
+    "    \"file\": {\n",
+    "        \"filename\": \"my-file.pdf\",\n",
+    "        \"file_data\": f\"data:application/pdf;base64,{base64_string}\",\n",
+    "    }\n",
+    "}\n",
+    "```\n",
+    "\n",
+    "</details>\n",
+    "\n",
+    "\n",
+    "<details>\n",
+    "<summary>Audio</summary>\n",
+    "\n",
+    "See [supported models](https://platform.openai.com/docs/models), e.g., `\"gpt-4o-audio-preview\"`.\n",
+    "\n",
+    "Refer to examples in the how-to guide [here](/docs/how_to/multimodal_inputs/#audio).\n",
+    "\n",
+    "In-line base64 data:\n",
+    "```python\n",
+    "# LangChain format\n",
+    "content_block = {\n",
+    "    \"type\": \"audio\",\n",
+    "    \"source_type\": \"base64\",\n",
+    "    \"mime_type\": \"audio/wav\",  # or appropriate mime-type\n",
+    "    \"data\": base64_string,\n",
+    "}\n",
+    "\n",
+    "# OpenAI Chat Completions format\n",
+    "content_block = {\n",
+    "    \"type\": \"input_audio\",\n",
+    "    \"input_audio\": {\"data\": base64_string, \"format\": \"wav\"},\n",
+    "}\n",
+    "```\n",
+    "\n",
+    "</details>"
    ]
   },
   {
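To exercise the new image blocks end to end, a minimal sketch using the URL form from the hunk above (the model name and URL are placeholders):

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4.1")

# LangChain's cross-provider format; the OpenAI-native
# `image_url` block shown in the hunk works the same way.
content_block = {
    "type": "image",
    "source_type": "url",
    "url": "https://example.com/photo.jpeg",  # placeholder URL
}

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image."},
        content_block,
    ],
}
print(llm.invoke([message]).content)
```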
@@ -1751,7 +1810,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.11.4"
+   "version": "3.10.4"
   }
  },
  "nbformat": 4,
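The deleted audio example cell has a close equivalent under the new format; a sketch combining the LangChain-format audio block with an invocation (assumes a local `audio_input.wav` speech clip):

```python
import base64

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-audio-preview")

# Assumed local speech clip; any WAV file works.
with open("audio_input.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode()

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Transcribe the following:"},
        {
            "type": "audio",
            "source_type": "base64",
            "mime_type": "audio/wav",
            "data": audio_b64,
        },
    ],
}
print(llm.invoke([message]).content)
```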