From 52e57cdc20a0c704842a34aef53fbb2256bba71e Mon Sep 17 00:00:00 2001
From: dayvidborges <50924879+dayvidborges@users.noreply.github.com>
Date: Sat, 14 Jun 2025 18:52:01 -0300
Subject: [PATCH] docs: update multimodal PDF and image usage for gpt-4.1
 (#31595)

**Description:**
This update revises the LangChain documentation to support the new GPT-4.1 multimodal API format. It fixes the previously broken example for PDF uploads (which returned a 400 error: "Missing required parameter: 'messages[0].content[1].file'") and adds clear instructions on how to include base64-encoded images for OpenAI models.
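For reference, a minimal sketch of the corrected PDF content block in LangChain's format (the filename value is illustrative, and `base64_string` is assumed to hold the base64-encoded PDF bytes):

```python
content_block = {
    "type": "file",
    "source_type": "base64",
    "data": base64_string,
    "mime_type": "application/pdf",
    # OpenAI requires a filename for PDF inputs
    "filename": "my-file.pdf",
}
```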
**Issue:**
Error reported in forum discussions for PDF loading via the API, e.g. [Albaeld in discussion #27702](https://github.com/langchain-ai/langchain/discussions/27702#discussioncomment-13369460):

'''
This simply does not work with openai:gpt-4.1. I get:
Error code: 400 - {'error': {'message': "Missing required parameter: 'messages[0].content[1].file'.", 'type': 'invalid_request_error', 'param': 'messages[0].content[1].file', 'code': 'missing_required_parameter'}}
'''

**Dependencies:** None

**Twitter handle:** N/A

---------

Co-authored-by: Chester Curme
---
 docs/docs/how_to/multimodal_inputs.ipynb | 4 +
 docs/docs/integrations/chat/openai.ipynb | 173 +++++++++++++++--------
 2 files changed, 120 insertions(+), 57 deletions(-)

diff --git a/docs/docs/how_to/multimodal_inputs.ipynb b/docs/docs/how_to/multimodal_inputs.ipynb
index 6662faf9fba..96da487ed35 100644
--- a/docs/docs/how_to/multimodal_inputs.ipynb
+++ b/docs/docs/how_to/multimodal_inputs.ipynb
@@ -212,6 +212,10 @@
 "[Anthropic](/docs/integrations/chat/anthropic/), and\n",
 "[Google Gemini](/docs/integrations/chat/google_generative_ai/)) will accept PDF documents.\n",
 "\n",
+ ":::note\n",
+ "OpenAI requires that filenames be specified for PDF inputs. When using LangChain's format, include the `filename` key. See [example below](#example-openai-file-names).\n",
+ ":::\n",
+ "\n",
 "### Documents from base64 data\n",
 "\n",
 "To pass documents in-line, format them as content blocks of the following form:\n",
diff --git a/docs/docs/integrations/chat/openai.ipynb b/docs/docs/integrations/chat/openai.ipynb
index e400cb47fda..aa905bcca1b 100644
--- a/docs/docs/integrations/chat/openai.ipynb
+++ b/docs/docs/integrations/chat/openai.ipynb
@@ -1463,74 +1463,133 @@
 "id": "5d5d9793",
 "metadata": {},
 "source": [
- "## Multimodal Inputs\n",
+ "## Multimodal Inputs (images, PDFs, audio)\n",
 "\n",
- "OpenAI has models that support multimodal inputs. You can pass in images or audio to these models. For more information on how to do this in LangChain, head to the [multimodal inputs](/docs/how_to/multimodal_inputs) docs.\n",
+ "OpenAI has models that support multimodal inputs. You can pass in images, PDFs, or audio to these models. \n",
+ "For more information on how to do this in LangChain, head to the [multimodal inputs](/docs/how_to/multimodal_inputs) docs.\n",
 "\n",
 "You can see the list of models that support different modalities in [OpenAI's documentation](https://platform.openai.com/docs/models).\n",
 "\n",
- "At the time of this doc's writing, the main OpenAI models you would use would be:\n",
+ "For all modalities, LangChain supports both its [cross-provider standard](/docs/concepts/multimodality/#multimodality-in-chat-models) and OpenAI's native content-block format.\n",
 "\n",
- "- Image inputs: `gpt-4o`, `gpt-4o-mini`\n",
- "- Audio inputs: `gpt-4o-audio-preview`\n",
+ "To pass multimodal data into `ChatOpenAI`, create a [content block](/docs/concepts/messages/) containing the data and incorporate it into a message, e.g., as below:\n",
 "```python\n",
 "message = {\n",
 " \"role\": \"user\",\n",
 " \"content\": [\n",
 " {\n",
 " \"type\": \"text\",\n",
 " # Update prompt as desired\n",
 " \"text\": \"Describe the (image / PDF / audio...)\",\n",
 " },\n",
 " # highlight-next-line\n",
 " content_block,\n",
 " ],\n",
 "}\n",
 "```\n",
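+ "\n",
+ "Once assembled, the message can be passed to the model as usual. A minimal sketch (the model name here is illustrative):\n",
+ "```python\n",
+ "from langchain_openai import ChatOpenAI\n",
+ "\n",
+ "llm = ChatOpenAI(model=\"gpt-4.1\")\n",
+ "response = llm.invoke([message])\n",
+ "print(response.content)\n",
+ "```\n",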
\n", + "Images\n", "\n", - "Below is an example of passing audio inputs to `gpt-4o-audio-preview`:" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "id": "39d08780", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "\"I'm sorry, but I can't create audio content that involves yelling. Is there anything else I can help you with?\"" - ] - }, - "execution_count": 8, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "import base64\n", + "Refer to examples in the how-to guide [here](/docs/how_to/multimodal_inputs/#images).\n", "\n", - "from langchain_openai import ChatOpenAI\n", + "URLs:\n", + "```python\n", + "# LangChain format\n", + "content_block = {\n", + " \"type\": \"image\",\n", + " \"source_type\": \"url\",\n", + " \"url\": url_string,\n", + "}\n", "\n", - "llm = ChatOpenAI(\n", - " model=\"gpt-4o-audio-preview\",\n", - " temperature=0,\n", - ")\n", + "# OpenAI Chat Completions format\n", + "content_block = {\n", + " \"type\": \"image_url\",\n", + " \"image_url\": {\"url\": url_string},\n", + "}\n", + "```\n", "\n", - "with open(\n", - " \"../../../../libs/partners/openai/tests/integration_tests/chat_models/audio_input.wav\",\n", - " \"rb\",\n", - ") as f:\n", - " # b64 encode it\n", - " audio = f.read()\n", - " audio_b64 = base64.b64encode(audio).decode()\n", + "In-line base64 data:\n", + "```python\n", + "# LangChain format\n", + "content_block = {\n", + " \"type\": \"image\",\n", + " \"source_type\": \"base64\",\n", + " \"data\": base64_string,\n", + " \"mime_type\": \"image/jpeg\",\n", + "}\n", + "\n", + "# OpenAI Chat Completions format\n", + "content_block = {\n", + " \"type\": \"image_url\",\n", + " \"image_url\": {\n", + " \"url\": f\"data:image/jpeg;base64,{base64_string}\",\n", + " },\n", + "}\n", + "```\n", + "\n", + "
\n", "\n", "\n", - "output_message = llm.invoke(\n", - " [\n", - " (\n", - " \"human\",\n", - " [\n", - " {\"type\": \"text\", \"text\": \"Transcribe the following:\"},\n", - " # the audio clip says \"I'm sorry, but I can't create...\"\n", - " {\n", - " \"type\": \"input_audio\",\n", - " \"input_audio\": {\"data\": audio_b64, \"format\": \"wav\"},\n", - " },\n", - " ],\n", - " ),\n", - " ]\n", - ")\n", - "output_message.content" + "
\n", + "PDFs\n", + "\n", + "Note: OpenAI requires file-names be specified for PDF inputs. When using LangChain's format, include the `filename` key.\n", + "\n", + "Read more [here](/docs/how_to/multimodal_inputs/#example-openai-file-names).\n", + "\n", + "Refer to examples in the how-to guide [here](/docs/how_to/multimodal_inputs/#documents-pdf).\n", + "\n", + "In-line base64 data:\n", + "```python\n", + "# LangChain format\n", + "content_block = {\n", + " \"type\": \"file\",\n", + " \"source_type\": \"base64\",\n", + " \"data\": base64_string,\n", + " \"mime_type\": \"application/pdf\",\n", + " # highlight-next-line\n", + " \"filename\": \"my-file.pdf\",\n", + "}\n", + "\n", + "# OpenAI Chat Completions format\n", + "content_block = {\n", + " \"type\": \"file\",\n", + " \"file\": {\n", + " \"filename\": \"my-file.pdf\",\n", + " \"file_data\": f\"data:application/pdf;base64,{base64_string}\",\n", + " }\n", + "}\n", + "```\n", + "\n", + "
\n", + "\n", + "\n", + "
\n", + "Audio\n", + "\n", + "See [supported models](https://platform.openai.com/docs/models), e.g., `\"gpt-4o-audio-preview\"`.\n", + "\n", + "Refer to examples in the how-to guide [here](/docs/how_to/multimodal_inputs/#audio).\n", + "\n", + "In-line base64 data:\n", + "```python\n", + "# LangChain format\n", + "content_block = {\n", + " \"type\": \"audio\",\n", + " \"source_type\": \"base64\",\n", + " \"mime_type\": \"audio/wav\", # or appropriate mime-type\n", + " \"data\": base64_string,\n", + "}\n", + "\n", + "# OpenAI Chat Completions format\n", + "content_block = {\n", + " \"type\": \"input_audio\",\n", + " \"input_audio\": {\"data\": base64_string, \"format\": \"wav\"},\n", + "}\n", + "```\n", + "\n", + "
" ] }, { @@ -1751,7 +1810,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.4" + "version": "3.10.4" } }, "nbformat": 4,