docs: openai audio docs (#27459)
commit 82242dfbb1 (parent 2cf2cefe39)
@@ -434,6 +434,160 @@
"fine_tuned_model.invoke(messages)"
]
},
{
"cell_type": "markdown",
"id": "5d5d9793",
"metadata": {},
"source": [
"## Multimodal Inputs\n",
"\n",
"OpenAI has models that support multimodal inputs. You can pass in images or audio to these models. For more information on how to do this in LangChain, head to the [multimodal inputs](/docs/how_to/multimodal_inputs) docs.\n",
"\n",
"You can see the list of models that support different modalities in [OpenAI's documentation](https://platform.openai.com/docs/models).\n",
"\n",
"At the time of this doc's writing, the main OpenAI models you would use are:\n",
"\n",
"- Image inputs: `gpt-4o`, `gpt-4o-mini`\n",
"- Audio inputs: `gpt-4o-audio-preview`\n",
"\n",
"For an example of passing in image inputs, see the [multimodal inputs how-to guide](/docs/how_to/multimodal_inputs).\n",
"\n",
"Below is an example of passing audio inputs to `gpt-4o-audio-preview`:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "39d08780",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"I'm sorry, but I can't create audio content that involves yelling. Is there anything else I can help you with?\""
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import base64\n",
"\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"llm = ChatOpenAI(\n",
"    model=\"gpt-4o-audio-preview\",\n",
"    temperature=0,\n",
")\n",
"\n",
"with open(\n",
"    \"../../../../libs/partners/openai/tests/integration_tests/chat_models/audio_input.wav\",\n",
"    \"rb\",\n",
") as f:\n",
"    # read the audio bytes and base64-encode them\n",
"    audio = f.read()\n",
"    audio_b64 = base64.b64encode(audio).decode()\n",
"\n",
"\n",
"output_message = llm.invoke(\n",
"    [\n",
"        (\n",
"            \"human\",\n",
"            [\n",
"                {\"type\": \"text\", \"text\": \"Transcribe the following:\"},\n",
"                # the audio clip says \"I'm sorry, but I can't create...\"\n",
"                {\n",
"                    \"type\": \"input_audio\",\n",
"                    \"input_audio\": {\"data\": audio_b64, \"format\": \"wav\"},\n",
"                },\n",
"            ],\n",
"        ),\n",
"    ]\n",
")\n",
"output_message.content"
]
},
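{
"cell_type": "markdown",
"metadata": {},
"source": [
"The image-input pattern is analogous. As a rough sketch (the image path below is hypothetical; the [multimodal inputs how-to guide](/docs/how_to/multimodal_inputs) has the canonical walkthrough), you would base64-encode the image and pass it as an `image_url` content block:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import base64\n",
"\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"llm = ChatOpenAI(model=\"gpt-4o\")  # gpt-4o-mini also accepts image inputs\n",
"\n",
"# hypothetical local file; any base64-encoded image works the same way\n",
"with open(\"example.jpg\", \"rb\") as f:\n",
"    image_b64 = base64.b64encode(f.read()).decode()\n",
"\n",
"output_message = llm.invoke(\n",
"    [\n",
"        (\n",
"            \"human\",\n",
"            [\n",
"                {\"type\": \"text\", \"text\": \"Describe this image:\"},\n",
"                {\n",
"                    \"type\": \"image_url\",\n",
"                    \"image_url\": {\"url\": f\"data:image/jpeg;base64,{image_b64}\"},\n",
"                },\n",
"            ],\n",
"        ),\n",
"    ]\n",
")\n",
"output_message.content"
]
},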
{
"cell_type": "markdown",
"id": "feb4a499",
"metadata": {},
"source": [
"## Audio Generation (Preview)\n",
"\n",
":::info\n",
"Requires `langchain-openai>=0.2.3`\n",
":::\n",
"\n",
"OpenAI has a new [audio generation feature](https://platform.openai.com/docs/guides/audio?audio-generation-quickstart-example=audio-out) that allows you to use audio inputs and outputs with the `gpt-4o-audio-preview` model."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "f67a2cac",
"metadata": {},
"outputs": [],
"source": [
"from langchain_openai import ChatOpenAI\n",
"\n",
"llm = ChatOpenAI(\n",
"    model=\"gpt-4o-audio-preview\",\n",
"    temperature=0,\n",
"    model_kwargs={\n",
"        \"modalities\": [\"text\", \"audio\"],\n",
"        \"audio\": {\"voice\": \"alloy\", \"format\": \"wav\"},\n",
"    },\n",
")\n",
"\n",
"output_message = llm.invoke(\n",
"    [\n",
"        (\"human\", \"Are you made by OpenAI? Just answer yes or no\"),\n",
"    ]\n",
")"
]
},
{
"cell_type": "markdown",
"id": "b7dd4e8b",
"metadata": {},
"source": [
"`output_message.additional_kwargs['audio']` will contain a dictionary like\n",
"```python\n",
"{\n",
"    'data': '<audio data b64-encoded>',\n",
"    'expires_at': 1729268602,\n",
"    'id': 'audio_67127d6a44348190af62c1530ef0955a',\n",
"    'transcript': 'Yes.'\n",
"}\n",
"```\n",
"and the format will be what was passed in `model_kwargs['audio']['format']`.\n",
"\n",
"We can also pass this message, with its audio data, back to the model as part of a message history before the OpenAI `expires_at` timestamp is reached.\n",
"\n",
":::note\n",
"Output audio is stored under the `audio` key in `AIMessage.additional_kwargs`, but input content blocks are typed with an `input_audio` type and key in `HumanMessage.content` lists.\n",
"\n",
"For more information, see OpenAI's [audio docs](https://platform.openai.com/docs/guides/audio).\n",
":::"
]
},
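{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch, the audio payload can be decoded and written to disk (the `output.wav` filename here is arbitrary):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import base64\n",
"\n",
"# decode the base64-encoded audio; the container format matches\n",
"# model_kwargs[\"audio\"][\"format\"] above (\"wav\" in this case)\n",
"audio_bytes = base64.b64decode(output_message.additional_kwargs[\"audio\"][\"data\"])\n",
"with open(\"output.wav\", \"wb\") as f:\n",
"    f.write(audio_bytes)"
]
},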
{
"cell_type": "code",
"execution_count": 8,
"id": "f5ae473d",
"metadata": {},
"outputs": [],
"source": [
"history = [\n",
"    (\"human\", \"Are you made by OpenAI? Just answer yes or no\"),\n",
"    output_message,\n",
"    (\"human\", \"And what is your name? Just give your name.\"),\n",
"]\n",
"second_output_message = llm.invoke(history)"
]
},
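{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since `modalities` includes `\"audio\"`, the follow-up response should likewise carry an audio payload, so its text transcript is available under the same `transcript` key:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"second_output_message.additional_kwargs[\"audio\"][\"transcript\"]"
]
},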
{
"cell_type": "markdown",
"id": "a796d728-971b-408b-88d5-440015bbb941",
@@ -447,7 +601,7 @@
],
"metadata": {
"kernelspec": {
-    "display_name": "Python 3 (ipykernel)",
+    "display_name": ".venv",
"language": "python",
"name": "python3"
},