Added whisper compatibility (#1421)

TODO:
- fix compatibility with base

Co-authored-by: Klein Tahiraj <62718109+klein-t@users.noreply.github.com>
Co-authored-by: Klein Tahiraj <klein.tahiraj@gmail.com>
2026-03-18 11:07:36 +00:00 · 2023-03-10 01:42:29 +01:00
parent d090e3605f
commit e381cdbd3f
13 changed files with 854 additions and 315 deletions
--- a/docs/modules/audio_models/examples/bananadev.ipynb
+++ b/docs/modules/audio_models/examples/bananadev.ipynb
@@ -0,0 +1,135 @@
+{
+ "cells": [
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Bananadev\n",
+    "\n",
+    "This notebook shows how to use Whisper hosted on banana.dev, build an audio chain, and combine it with an LLM to summarize a transcript extracted from an audio file."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import langchain\n",
+    "from langchain.audio_models import AudioBanana\n",
+    "from langchain.cache import SQLiteCache\n",
+    "from langchain.chains import AudioChain, LLMChain, SimpleSequentialChain\n",
+    "from langchain.llms import OpenAI\n",
+    "from langchain.prompts import PromptTemplate"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Setting the cache"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "audio_cache = SQLiteCache(database_path=\".langchain.db\")\n",
+    "\n",
+    "langchain.llm_cache = audio_cache"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Defining the audio model\n",
+    "\n",
+    "An `AudioBanana` object has the following arguments:\n",
+    "\n",
+    "* `model_key` (str): model endpoint to use;\n",
+    "* `banana_api_key` (optional[str]): Banana API key;\n",
+    "* `max_chars` (optional[int]): max number of chars to return.\n",
+    "\n",
+    "An `AudioChain` object has the following arguments:\n",
+    "* `audio_model` (AudioBase): the audio model to use;\n",
+    "* `output_key` (str): the task to be performed by the model. Not all audio models support all tasks. For AudioBanana, only `\"transcribe\"` is a valid output key. The output key is returned along with the text extracted from the audio file.\n",
+    "\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "audio_model = AudioBanana(model_key=\"[YOUR MODEL KEY]\", max_chars=20000)\n",
+    "\n",
+    "audio_chain = AudioChain(audio_model=audio_model, output_key=\"transcribe\")"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Setting up an LLM and an LLMChain"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "llm = OpenAI(temperature=0.7)\n",
+    "template = \"\"\"Speech: {text}\n",
+    "\n",
+    "    Write a short 3 sentence summary of the speech.\n",
+    "\n",
+    "    Summary:\"\"\"\n",
+    "\n",
+    "# the \"text\" input variable fills the {text} placeholder in the template\n",
+    "prompt_template = PromptTemplate(input_variables=[\"text\"], template=template)\n",
+    "\n",
+    "summary_chain = LLMChain(llm=llm, prompt=prompt_template)"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Defining a SimpleSequentialChain and running it"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "speech_summary_chain = SimpleSequentialChain(\n",
+    "    chains=[audio_chain, summary_chain], verbose=True\n",
+    ")\n",
+    "\n",
+    "audio_path = \"example_data/ihaveadream.mp3\"\n",
+    "speech_summary_chain.run(audio_path)"
+   ]
+  }
+ ],
+ "metadata": {
+  "language_info": {
+   "name": "python"
+  },
+  "orig_nbformat": 4
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
--- a/docs/modules/audio_models/examples/whisper.ipynb
+++ b/docs/modules/audio_models/examples/whisper.ipynb
@@ -0,0 +1,248 @@
+{
+ "cells": [
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Whisper\n",
+    "\n",
+    "This notebook shows how to use OpenAI's Whisper, build an audio chain, and combine it with an LLM to summarize a transcript extracted from an audio file. A translation task is also shown."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain.audio_models import Whisper\n",
+    "from langchain.chains import AudioChain, LLMChain, SimpleSequentialChain\n",
+    "from langchain.llms import OpenAI\n",
+    "from langchain.prompts import PromptTemplate"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Transcription task"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Defining the audio model\n",
+    "\n",
+    "A `Whisper` object has the following arguments:\n",
+    "\n",
+    "* `model_key` (str): the OpenAI API key to use;\n",
+    "* `prompt` (optional[str]): OpenAI's Whisper accepts a prompt that gives the model context so that it transcribes or translates words more accurately;\n",
+    "* `language` (optional[str]): language of the input audio; set it only when the task is transcription;\n",
+    "* `max_chars` (optional[int]): max number of chars to return;\n",
+    "* `model` (optional[str]): name of the model to use, set to `whisper-1` by default;\n",
+    "* `temperature` (optional[float]): sampling temperature;\n",
+    "* `response_format` (optional[str]): response format, set to `json` by default.\n",
+    "\n",
+    "An `AudioChain` object has the following arguments:\n",
+    "* `audio_model` (AudioBase): the audio model to use;\n",
+    "* `output_key` (str): the task to be performed by the model. Not all the audio models support all the tasks. In the case of Whisper, the possible tasks are `\"transcribe\"` and `\"translate\"`.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "audio_model = Whisper(model_key=\"YOUR_OPENAI_KEY\", max_chars=20000)\n",
+    "audio_chain = AudioChain(audio_model=audio_model, output_key=\"transcribe\")"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Setting up an LLM and an LLMChain"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "llm = OpenAI(\n",
+    "    openai_api_key=\"YOUR_OPENAI_KEY\",\n",
+    "    temperature=0.7,\n",
+    ")\n",
+    "\n",
+    "template = \"\"\"Speech: {text}\n",
+    "Write a short 3 sentence summary of the speech.\n",
+    "Summary:\"\"\"\n",
+    "\n",
+    "prompt_template = PromptTemplate(input_variables=[\"text\"], template=template)\n",
+    "\n",
+    "summary_chain = LLMChain(llm=llm, prompt=prompt_template)"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Defining a SimpleSequentialChain and running it"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "speech_to_summary_chain = SimpleSequentialChain(\n",
+    "    chains=[audio_chain, summary_chain], verbose=True\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "\n",
+      "\n",
+      "\u001b[1m> Entering new SimpleSequentialChain chain...\u001b[0m\n",
+      "\u001b[36;1m\u001b[1;3mDo it! Just do it! Don't let your dreams be dreams. Yesterday, you said tomorrow. So just do it! Make your dreams come true! Just do it! Some people dream of success while you're going to wake up and work hard at it. Nothing is impossible! You should get to the point where anyone else would quit, and you're not going to stop there! No! What are you waiting for? Do it! Just do it! Yes you can! Just do it! If you're tired of starting over, stop giving up!\u001b[0m\n",
+      "\u001b[33;1m\u001b[1;3m The speaker encourages the audience to take action and pursue their dreams. They emphasize that nothing is impossible and that the only way to make their dreams come true is to work hard and not give up. They urge the audience to act now and not wait any longer.\u001b[0m\n",
+      "\n",
+      "\u001b[1m> Finished chain.\u001b[0m\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "' The speaker encourages the audience to take action and pursue their dreams. They emphasize that nothing is impossible and that the only way to make their dreams come true is to work hard and not give up. They urge the audience to act now and not wait any longer.'"
+      ]
+     },
+     "execution_count": 5,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "audio_path = \"example_data/doit.mp3\"\n",
+    "speech_to_summary_chain.run(audio_path)"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Translate\n",
+    "To perform a translation, we make just a few changes to the workflow above."
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### New AudioChain with output key set to \"translate\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "audio_chain = AudioChain(audio_model=audio_model, output_key=\"translate\")"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Building a new SimpleSequentialChain with the new AudioChain and running it"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "translation_to_summary_chain = SimpleSequentialChain(\n",
+    "    chains=[audio_chain, summary_chain], verbose=True\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "\n",
+      "\n",
+      "\u001b[1m> Entering new SimpleSequentialChain chain...\u001b[0m\n",
+      "\u001b[36;1m\u001b[1;3mI wanted a chicken! Ah, this is the chicken! I like this, I hadn't seen it before! A CHICKEN! I'm afraid of getting my hands dirty and not playing the guitar anymore, otherwise I would break it. Look, a chicken, really, beautiful!\u001b[0m\n",
+      "\u001b[33;1m\u001b[1;3m The speaker is excited to have found a chicken, noting that they have not seen it before. They express fear at getting their hands dirty, as it could prevent them from playing the guitar. Finally, they express admiration for the chicken's beauty.\u001b[0m\n",
+      "\n",
+      "\u001b[1m> Finished chain.\u001b[0m\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "\" The speaker is excited to have found a chicken, noting that they have not seen it before. They express fear at getting their hands dirty, as it could prevent them from playing the guitar. Finally, they express admiration for the chicken's beauty.\""
+      ]
+     },
+     "execution_count": 8,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "audio_path = \"example_data/pollo.mp3\"\n",
+    "translation_to_summary_chain.run(audio_path)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": ".venv",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.1"
+  },
+  "orig_nbformat": 4
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
--- a/docs/modules/document_loaders/examples/audio_files.ipynb
+++ b/docs/modules/document_loaders/examples/audio_files.ipynb
@@ -0,0 +1,98 @@
+{
+ "cells": [
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Audio Files\n",
+    "\n",
+    "This notebook shows how to load data from audio files into a format that LangChain can use."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain.document_loaders import AudioLoader\n",
+    "from langchain.audio_models import Whisper"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "An `AudioLoader` object takes the following parameters:\n",
+    "\n",
+    "* `audio_model` (AudioBase): the audio model to use;\n",
+    "* `file_path` (str): the file path;\n",
+    "* `task` (str): the task to perform. Depending on the model, this can be `transcribe` or `translate`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "audio_model = Whisper(model_key=\"[YOUR_OPENAI_API_KEY]\", max_chars=20000)\n",
+    "\n",
+    "file_path = \"example_data/pasolini.mp3\"\n",
+    "loader = AudioLoader(audio_model=audio_model, file_path=file_path, task=\"translate\")"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "`AudioLoader.load()` returns the data as a list of `Document` objects."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "[Document(page_content='I think that fascism, the fascist regime, was not built on the conclusion that a group of criminals has power. And this group of criminals has power, but it has not been able to do anything. It has not been able to even remotely change the reality of Italy. Now, on the contrary, the regime is a democratic regime, etc. But that acculturation, that homologation that fascism has not been able to obtain, the power of today, that is, the power of consumerism, instead, it is able to obtain it perfectly. I can tell you without a doubt that true fascism is precisely this power of consumerism that is destroying Italy. And this thing happened so quickly that we did not realize it. It has happened in the last 5, 6, 7, 10 years. It was a kind of nightmare in which we saw Italy around us destroy, disappear, disappear. Now, perhaps awakening from this nightmare and looking around, we realize that there is nothing more to do.', lookup_str='', metadata={'source': 'docs\\\\modules\\\\document_loaders\\\\examples\\\\example_data\\\\pasolini.mp3', 'task': 'translate'}, lookup_index=0)]"
+      ]
+     },
+     "execution_count": 4,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "loader.load()"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.1"
+  },
+  "orig_nbformat": 4
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}