diff --git a/docs/docs/how_to/multimodal_inputs.ipynb b/docs/docs/how_to/multimodal_inputs.ipynb
index 6662faf9fba..96da487ed35 100644
--- a/docs/docs/how_to/multimodal_inputs.ipynb
+++ b/docs/docs/how_to/multimodal_inputs.ipynb
@@ -212,6 +212,10 @@
"[Anthropic](/docs/integrations/chat/anthropic/), and\n",
"[Google Gemini](/docs/integrations/chat/google_generative_ai/)) will accept PDF documents.\n",
"\n",
+ ":::note\n",
+ "OpenAI requires file-names be specified for PDF inputs. When using LangChain's format, include the `filename` key. See [example below](#example-openai-file-names).\n",
+ ":::\n",
+ "\n",
"### Documents from base64 data\n",
"\n",
"To pass documents in-line, format them as content blocks of the following form:\n",
diff --git a/docs/docs/integrations/chat/openai.ipynb b/docs/docs/integrations/chat/openai.ipynb
index e400cb47fda..aa905bcca1b 100644
--- a/docs/docs/integrations/chat/openai.ipynb
+++ b/docs/docs/integrations/chat/openai.ipynb
@@ -1463,74 +1463,133 @@
"id": "5d5d9793",
"metadata": {},
"source": [
- "## Multimodal Inputs\n",
+ "## Multimodal Inputs (images, PDFs, audio)\n",
"\n",
- "OpenAI has models that support multimodal inputs. You can pass in images or audio to these models. For more information on how to do this in LangChain, head to the [multimodal inputs](/docs/how_to/multimodal_inputs) docs.\n",
+ "OpenAI has models that support multimodal inputs. You can pass in images, PDFs, or audio to these models. For more information on how to do this in LangChain, head to the [multimodal inputs](/docs/how_to/multimodal_inputs) docs.\n",
"\n",
"You can see the list of models that support different modalities in [OpenAI's documentation](https://platform.openai.com/docs/models).\n",
"\n",
- "At the time of this doc's writing, the main OpenAI models you would use would be:\n",
+ "For all modalities, LangChain supports both its [cross-provider standard](/docs/concepts/multimodality/#multimodality-in-chat-models) as well as OpenAI's native content-block format.\n",
"\n",
- "- Image inputs: `gpt-4o`, `gpt-4o-mini`\n",
- "- Audio inputs: `gpt-4o-audio-preview`\n",
+ "To pass multimodal data into `ChatOpenAI`, create a [content block](/docs/concepts/messages/) containing the data and incorporate it into a message, e.g., as below:\n",
+ "```python\n",
+ "message = {\n",
+ " \"role\": \"user\",\n",
+ " \"content\": [\n",
+ " {\n",
+ " \"type\": \"text\",\n",
+ " # Update prompt as desired\n",
+ " \"text\": \"Describe the (image / PDF / audio...)\",\n",
+ " },\n",
+ " # highlight-next-line\n",
+ " content_block,\n",
+ " ],\n",
+ "}\n",
+ "```\n",
+ "See below for examples of content blocks.\n",
"\n",
- "For an example of passing in image inputs, see the [multimodal inputs how-to guide](/docs/how_to/multimodal_inputs).\n",
+ "\n",
+ "Images
\n",
"\n",
- "Below is an example of passing audio inputs to `gpt-4o-audio-preview`:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 8,
- "id": "39d08780",
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "\"I'm sorry, but I can't create audio content that involves yelling. Is there anything else I can help you with?\""
- ]
- },
- "execution_count": 8,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "import base64\n",
+ "Refer to examples in the how-to guide [here](/docs/how_to/multimodal_inputs/#images).\n",
"\n",
- "from langchain_openai import ChatOpenAI\n",
+ "URLs:\n",
+ "```python\n",
+ "# LangChain format\n",
+ "content_block = {\n",
+ " \"type\": \"image\",\n",
+ " \"source_type\": \"url\",\n",
+ " \"url\": url_string,\n",
+ "}\n",
"\n",
- "llm = ChatOpenAI(\n",
- " model=\"gpt-4o-audio-preview\",\n",
- " temperature=0,\n",
- ")\n",
+ "# OpenAI Chat Completions format\n",
+ "content_block = {\n",
+ " \"type\": \"image_url\",\n",
+ " \"image_url\": {\"url\": url_string},\n",
+ "}\n",
+ "```\n",
"\n",
- "with open(\n",
- " \"../../../../libs/partners/openai/tests/integration_tests/chat_models/audio_input.wav\",\n",
- " \"rb\",\n",
- ") as f:\n",
- " # b64 encode it\n",
- " audio = f.read()\n",
- " audio_b64 = base64.b64encode(audio).decode()\n",
+ "In-line base64 data:\n",
+ "```python\n",
+ "# LangChain format\n",
+ "content_block = {\n",
+ " \"type\": \"image\",\n",
+ " \"source_type\": \"base64\",\n",
+ " \"data\": base64_string,\n",
+ " \"mime_type\": \"image/jpeg\",\n",
+ "}\n",
+ "\n",
+ "# OpenAI Chat Completions format\n",
+ "content_block = {\n",
+ " \"type\": \"image_url\",\n",
+ " \"image_url\": {\n",
+ " \"url\": f\"data:image/jpeg;base64,{base64_string}\",\n",
+ " },\n",
+ "}\n",
+ "```\n",
+ "\n",
+ " \n",
"\n",
"\n",
- "output_message = llm.invoke(\n",
- " [\n",
- " (\n",
- " \"human\",\n",
- " [\n",
- " {\"type\": \"text\", \"text\": \"Transcribe the following:\"},\n",
- " # the audio clip says \"I'm sorry, but I can't create...\"\n",
- " {\n",
- " \"type\": \"input_audio\",\n",
- " \"input_audio\": {\"data\": audio_b64, \"format\": \"wav\"},\n",
- " },\n",
- " ],\n",
- " ),\n",
- " ]\n",
- ")\n",
- "output_message.content"
+ "\n",
+ "PDFs
\n",
+ "\n",
+ "Note: OpenAI requires file-names be specified for PDF inputs. When using LangChain's format, include the `filename` key.\n",
+ "\n",
+ "Read more [here](/docs/how_to/multimodal_inputs/#example-openai-file-names).\n",
+ "\n",
+ "Refer to examples in the how-to guide [here](/docs/how_to/multimodal_inputs/#documents-pdf).\n",
+ "\n",
+ "In-line base64 data:\n",
+ "```python\n",
+ "# LangChain format\n",
+ "content_block = {\n",
+ " \"type\": \"file\",\n",
+ " \"source_type\": \"base64\",\n",
+ " \"data\": base64_string,\n",
+ " \"mime_type\": \"application/pdf\",\n",
+ " # highlight-next-line\n",
+ " \"filename\": \"my-file.pdf\",\n",
+ "}\n",
+ "\n",
+ "# OpenAI Chat Completions format\n",
+ "content_block = {\n",
+ " \"type\": \"file\",\n",
+ " \"file\": {\n",
+ " \"filename\": \"my-file.pdf\",\n",
+ " \"file_data\": f\"data:application/pdf;base64,{base64_string}\",\n",
+ " }\n",
+ "}\n",
+ "```\n",
+ "\n",
+ " \n",
+ "\n",
+ "\n",
+ "\n",
+ "Audio
\n",
+ "\n",
+ "See [supported models](https://platform.openai.com/docs/models), e.g., `\"gpt-4o-audio-preview\"`.\n",
+ "\n",
+ "Refer to examples in the how-to guide [here](/docs/how_to/multimodal_inputs/#audio).\n",
+ "\n",
+ "In-line base64 data:\n",
+ "```python\n",
+ "# LangChain format\n",
+ "content_block = {\n",
+ " \"type\": \"audio\",\n",
+ " \"source_type\": \"base64\",\n",
+ " \"mime_type\": \"audio/wav\", # or appropriate mime-type\n",
+ " \"data\": base64_string,\n",
+ "}\n",
+ "\n",
+ "# OpenAI Chat Completions format\n",
+ "content_block = {\n",
+ " \"type\": \"input_audio\",\n",
+ " \"input_audio\": {\"data\": base64_string, \"format\": \"wav\"},\n",
+ "}\n",
+ "```\n",
+ "\n",
+ " "
]
},
{
@@ -1751,7 +1810,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.11.4"
+ "version": "3.10.4"
}
},
"nbformat": 4,