langchain/docs/modules/audio_models/examples/whisper.ipynb

{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Whisper\n",
    "\n",
    "This notebook covers how to use whisper from openAI, build an audio chain and deploy it in combination with llm to summarize a transcript gathered from an audio file. A translation task is also shown."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.audio_models import Whisper\n",
    "from langchain.chains import AudioChain, LLMChain, SimpleSequentialChain\n",
    "from langchain.llms import OpenAI\n",
    "from langchain.prompts import PromptTemplate"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Transcript task"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Defining the audio model\n",
    "\n",
    "A `Whisper` objects has the following arguments:\n",
    "\n",
    "* `model_key` (str): model endpoint to use;\n",
    "* `prompt` (optional[str]): openAi's whisper allows prompt to make the model context-aware and better transcribe/translate words;\n",
    "* `language` (optional[str]): to add only in the case of transcript as a task;\n",
    "* `max_chars` (optional[int]): max number of chars to return;\n",
    "* `model` (optional[str]): name of the model to use, set to `whisper-1` by default;\n",
    "* `temperature` (Optional[float]): temperature;\n",
    "* `response format` (optional[str]): response format, set by default to `json`\n",
    "\n",
    "An `AudioChain` object has the following arguments:\n",
    "* `audio_model` (AudioBase): the audio model to use;\n",
    "* `output_key` (str): the task to be performed by the model. Not all the audio models support all the tasks. In the case of Whisper, the possible tasks are `\"transcribe\"` and `\"translate\"`.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "audio_model = Whisper(model_key=\"YOUR_OPENAI_KEY\", max_chars=20000)\n",
    "audio_chain = AudioChain(audio_model=audio_model, output_key=\"transcribe\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Setting up llm and an LLMChain"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "llm = OpenAI(\n",
    "    openai_api_key=\"YOUR_OPENAI_KEY\",\n",
    "    temperature=0.7,\n",
    ")\n",
    "\n",
    "template = \"\"\"Speech: {text}\n",
    "Write a short 3 sentence summary of the speech.\n",
    "Summary:\"\"\"\n",
    "\n",
    "prompt_template = PromptTemplate(input_variables=[\"text\"], template=template)\n",
    "\n",
    "summary_chain = LLMChain(llm=llm, prompt=prompt_template)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Defining a SimpleSequentialChain and running"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "speech_to_summary_chain = SimpleSequentialChain(\n",
    "    chains=[audio_chain, summary_chain], verbose=True\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new SimpleSequentialChain chain...\u001b[0m\n",
      "\u001b[36;1m\u001b[1;3mDo it! Just do it! Don't let your dreams be dreams. Yesterday, you said tomorrow. So just do it! Make your dreams come true! Just do it! Some people dream of success while you're going to wake up and work hard at it. Nothing is impossible! You should get to the point where anyone else would quit, and you're not going to stop there! No! What are you waiting for? Do it! Just do it! Yes you can! Just do it! If you're tired of starting over, stop giving up!\u001b[0m\n",
      "\u001b[33;1m\u001b[1;3m The speaker encourages the audience to take action and pursue their dreams. They emphasize that nothing is impossible and that the only way to make their dreams come true is to work hard and not give up. They urge the audience to act now and not wait any longer.\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "' The speaker encourages the audience to take action and pursue their dreams. They emphasize that nothing is impossible and that the only way to make their dreams come true is to work hard and not give up. They urge the audience to act now and not wait any longer.'"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "audio_path = \"example_data/doit.mp3\"\n",
    "speech_to_summary_chain.run(audio_path)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Translate\n",
    "To perform a translation, we just make few changes to the above workflow."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### New AudioChain with output key set to \"translation\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "audio_chain = AudioChain(audio_model=audio_model, output_key=\"translate\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### New SimpleSequentialChain with the new AudioChain and run"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "translation_to_summary_chain = SimpleSequentialChain(\n",
    "    chains=[audio_chain, summary_chain], verbose=True\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new SimpleSequentialChain chain...\u001b[0m\n",
      "\u001b[36;1m\u001b[1;3mI wanted a chicken! Ah, this is the chicken! I like this, I hadn't seen it before! A CHICKEN! I'm afraid of getting my hands dirty and not playing the guitar anymore, otherwise I would break it. Look, a chicken, really, beautiful!\u001b[0m\n",
      "\u001b[33;1m\u001b[1;3m The speaker is excited to have found a chicken, noting that they have not seen it before. They express fear at getting their hands dirty, as it could prevent them from playing the guitar. Finally, they express admiration for the chicken's beauty.\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "\" The speaker is excited to have found a chicken, noting that they have not seen it before. They express fear at getting their hands dirty, as it could prevent them from playing the guitar. Finally, they express admiration for the chicken's beauty.\""
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "audio_path = \"example_data/pollo.mp3\"\n",
    "translation_to_summary_chain.run(audio_path)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.1"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}