docs: update multimodal PDF and image usage for gpt-4.1 (#31595)

**Description:**
This update revises the LangChain documentation to reflect the multimodal API
format used by GPT-4.1. It fixes the previously broken example for PDF uploads
(which returned a 400 error: "Missing required parameter:
'messages[0].content[1].file'") and adds clear instructions for passing
base64-encoded images to OpenAI models.
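
For reference, a minimal sketch of the corrected PDF usage this change documents (the model name, file path, and prompt are placeholders; the content block uses LangChain's cross-provider format with the `filename` key OpenAI requires):

```python
import base64

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4.1")

# Read and base64-encode a local PDF (path is illustrative)
with open("my-file.pdf", "rb") as f:
    pdf_b64 = base64.b64encode(f.read()).decode()

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Summarize this document."},
        {
            "type": "file",
            "source_type": "base64",
            "data": pdf_b64,
            "mime_type": "application/pdf",
            # OpenAI requires a filename for PDF inputs
            "filename": "my-file.pdf",
        },
    ],
}

response = llm.invoke([message])
print(response.content)
```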

**Issue:**
Error reported in the discussions forum when loading PDFs through the API
([Albaeld](https://github.com/Albaeld) in
[discussion #27702](https://github.com/langchain-ai/langchain/discussions/27702#discussioncomment-13369460)):

```
This simply does not work with openai:gpt-4.1. I get:
Error code: 400 - {'error': {'message': "Missing required parameter:
'messages[0].content[1].file'.", 'type': 'invalid_request_error',
'param': 'messages[0].content[1].file', 'code':
'missing_required_parameter'}}
```
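
The 400 error above indicates that the second content item lacked the `file` payload OpenAI expects. In OpenAI's native Chat Completions format (as documented in the changes below), a working block looks roughly like this, with `base64_string` and the filename as placeholders:

```python
content_block = {
    "type": "file",
    "file": {
        "filename": "my-file.pdf",
        "file_data": f"data:application/pdf;base64,{base64_string}",
    },
}
```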

**Dependencies:**
None

**Twitter handle:**
N/A

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
dayvidborges 2025-06-14 18:52:01 -03:00 committed by GitHub
parent cecfec5efa
commit 52e57cdc20
2 changed files with 120 additions and 57 deletions

View File

@@ -212,6 +212,10 @@
"[Anthropic](/docs/integrations/chat/anthropic/), and\n",
"[Google Gemini](/docs/integrations/chat/google_generative_ai/)) will accept PDF documents.\n",
"\n",
":::note\n",
"OpenAI requires file-names be specified for PDF inputs. When using LangChain's format, include the `filename` key. See [example below](#example-openai-file-names).\n",
":::\n",
"\n",
"### Documents from base64 data\n",
"\n",
"To pass documents in-line, format them as content blocks of the following form:\n",

View File

@@ -1463,74 +1463,133 @@
"id": "5d5d9793",
"metadata": {},
"source": [
"## Multimodal Inputs\n",
"## Multimodal Inputs (images, PDFs, audio)\n",
"\n",
"OpenAI has models that support multimodal inputs. You can pass in images or audio to these models. For more information on how to do this in LangChain, head to the [multimodal inputs](/docs/how_to/multimodal_inputs) docs.\n",
"OpenAI has models that support multimodal inputs. You can pass in images, PDFs, or audio to these models. For more information on how to do this in LangChain, head to the [multimodal inputs](/docs/how_to/multimodal_inputs) docs.\n",
"\n",
"You can see the list of models that support different modalities in [OpenAI's documentation](https://platform.openai.com/docs/models).\n",
"\n",
"At the time of this doc's writing, the main OpenAI models you would use would be:\n",
"For all modalities, LangChain supports both its [cross-provider standard](/docs/concepts/multimodality/#multimodality-in-chat-models) as well as OpenAI's native content-block format.\n",
"\n",
"- Image inputs: `gpt-4o`, `gpt-4o-mini`\n",
"- Audio inputs: `gpt-4o-audio-preview`\n",
"To pass multimodal data into `ChatOpenAI`, create a [content block](/docs/concepts/messages/) containing the data and incorporate it into a message, e.g., as below:\n",
"```python\n",
"message = {\n",
" \"role\": \"user\",\n",
" \"content\": [\n",
" {\n",
" \"type\": \"text\",\n",
" # Update prompt as desired\n",
" \"text\": \"Describe the (image / PDF / audio...)\",\n",
" },\n",
" # highlight-next-line\n",
" content_block,\n",
" ],\n",
"}\n",
"```\n",
"See below for examples of content blocks.\n",
"\n",
"For an example of passing in image inputs, see the [multimodal inputs how-to guide](/docs/how_to/multimodal_inputs).\n",
"<details>\n",
"<summary>Images</summary>\n",
"\n",
"Below is an example of passing audio inputs to `gpt-4o-audio-preview`:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "39d08780",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"I'm sorry, but I can't create audio content that involves yelling. Is there anything else I can help you with?\""
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import base64\n",
"Refer to examples in the how-to guide [here](/docs/how_to/multimodal_inputs/#images).\n",
"\n",
"from langchain_openai import ChatOpenAI\n",
"URLs:\n",
"```python\n",
"# LangChain format\n",
"content_block = {\n",
" \"type\": \"image\",\n",
" \"source_type\": \"url\",\n",
" \"url\": url_string,\n",
"}\n",
"\n",
"llm = ChatOpenAI(\n",
" model=\"gpt-4o-audio-preview\",\n",
" temperature=0,\n",
")\n",
"# OpenAI Chat Completions format\n",
"content_block = {\n",
" \"type\": \"image_url\",\n",
" \"image_url\": {\"url\": url_string},\n",
"}\n",
"```\n",
"\n",
"with open(\n",
" \"../../../../libs/partners/openai/tests/integration_tests/chat_models/audio_input.wav\",\n",
" \"rb\",\n",
") as f:\n",
" # b64 encode it\n",
" audio = f.read()\n",
" audio_b64 = base64.b64encode(audio).decode()\n",
"In-line base64 data:\n",
"```python\n",
"# LangChain format\n",
"content_block = {\n",
" \"type\": \"image\",\n",
" \"source_type\": \"base64\",\n",
" \"data\": base64_string,\n",
" \"mime_type\": \"image/jpeg\",\n",
"}\n",
"\n",
"# OpenAI Chat Completions format\n",
"content_block = {\n",
" \"type\": \"image_url\",\n",
" \"image_url\": {\n",
" \"url\": f\"data:image/jpeg;base64,{base64_string}\",\n",
" },\n",
"}\n",
"```\n",
"\n",
"</details>\n",
"\n",
"\n",
"output_message = llm.invoke(\n",
" [\n",
" (\n",
" \"human\",\n",
" [\n",
" {\"type\": \"text\", \"text\": \"Transcribe the following:\"},\n",
" # the audio clip says \"I'm sorry, but I can't create...\"\n",
" {\n",
" \"type\": \"input_audio\",\n",
" \"input_audio\": {\"data\": audio_b64, \"format\": \"wav\"},\n",
" },\n",
" ],\n",
" ),\n",
" ]\n",
")\n",
"output_message.content"
"<details>\n",
"<summary>PDFs</summary>\n",
"\n",
"Note: OpenAI requires file-names be specified for PDF inputs. When using LangChain's format, include the `filename` key.\n",
"\n",
"Read more [here](/docs/how_to/multimodal_inputs/#example-openai-file-names).\n",
"\n",
"Refer to examples in the how-to guide [here](/docs/how_to/multimodal_inputs/#documents-pdf).\n",
"\n",
"In-line base64 data:\n",
"```python\n",
"# LangChain format\n",
"content_block = {\n",
" \"type\": \"file\",\n",
" \"source_type\": \"base64\",\n",
" \"data\": base64_string,\n",
" \"mime_type\": \"application/pdf\",\n",
" # highlight-next-line\n",
" \"filename\": \"my-file.pdf\",\n",
"}\n",
"\n",
"# OpenAI Chat Completions format\n",
"content_block = {\n",
" \"type\": \"file\",\n",
" \"file\": {\n",
" \"filename\": \"my-file.pdf\",\n",
" \"file_data\": f\"data:application/pdf;base64,{base64_string}\",\n",
" }\n",
"}\n",
"```\n",
"\n",
"</details>\n",
"\n",
"\n",
"<details>\n",
"<summary>Audio</summary>\n",
"\n",
"See [supported models](https://platform.openai.com/docs/models), e.g., `\"gpt-4o-audio-preview\"`.\n",
"\n",
"Refer to examples in the how-to guide [here](/docs/how_to/multimodal_inputs/#audio).\n",
"\n",
"In-line base64 data:\n",
"```python\n",
"# LangChain format\n",
"content_block = {\n",
" \"type\": \"audio\",\n",
" \"source_type\": \"base64\",\n",
" \"mime_type\": \"audio/wav\", # or appropriate mime-type\n",
" \"data\": base64_string,\n",
"}\n",
"\n",
"# OpenAI Chat Completions format\n",
"content_block = {\n",
" \"type\": \"input_audio\",\n",
" \"input_audio\": {\"data\": base64_string, \"format\": \"wav\"},\n",
"}\n",
"```\n",
"\n",
"</details>"
]
},
{
@@ -1751,7 +1810,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
"version": "3.10.4"
}
},
"nbformat": 4,