Compare commits

...

2 Commits

Author SHA1 Message Date
Vairo Di Pasquale
e381cdbd3f Added whisper compatibility (#1421)
TODO:
- fix compatibility with base

---------

Co-authored-by: Klein Tahiraj <62718109+klein-t@users.noreply.github.com>
Co-authored-by: Klein Tahiraj <klein.tahiraj@gmail.com>
2023-03-09 16:42:29 -08:00
CG80499
d090e3605f Added AudioChain + AudioBanana (#1313)
This PR enables users to transcribe audio files and add them to chains
using OpenAI's Whisper models on Banana.dev.

TODO
- Turn audio_chain_example.py into a notebook
2023-03-03 09:54:17 -08:00
11 changed files with 854 additions and 141 deletions

View File

@@ -0,0 +1,135 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Bananadev\n",
"\n",
"This notebook covers how to use whisper from banana.dev, build an audio chain and deploy it in combination with llm to summarize a transcript gathered from an audio file."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import langchain\n",
"from langchain.audio_models import AudioBanana\n",
"from langchain.cache import SQLiteCache\n",
"from langchain.chains import AudioChain, LLMChain, SimpleSequentialChain\n",
"from langchain.llms import OpenAI\n",
"from langchain.prompts import PromptTemplate"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Setting the cache"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"audio_cache = SQLiteCache(database_path=\".langchain.db\")\n",
"\n",
"langchain.llm_cache = audio_cache"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Defining the audio model\n",
"\n",
"An `AudioBanana` object has the following arguments:\n",
"\n",
"* `model_key` (str): model endpoint to use;\n",
"* `banana_api_key`(optional[str]): banana api key:\n",
"* `max_chars` (optional[int]): max number of chars to return.\n",
"\n",
"An `AudioChain` object has the following arguments:\n",
"* `audio_model` (AudioBase): the audio model to use;\n",
"* `output_key` (str): the task to be performed by the model. Not all the audio models support all the tasks. In the case of AudioBanana only `\"transcribe\"` is a valid output key. The output key will returned along with the text data gathered from the audio file.\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"audio_model = AudioBanana(model_key=\"[YOUR MODEL KEY]\", max_chars=20000)\n",
"\n",
"audio_chain = AudioChain(audio_model=audio_model, output_key=\"transcribe\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Setting up a llm and an LLMChain"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"llm = OpenAI(temperature=0.7)\n",
"template = \"\"\"Speech: {text}\n",
"\n",
" Write a short 3 sentence summary of the speech.\n",
"\n",
" Summary:\"\"\"\n",
"\n",
"# note how the input variabòes\n",
"prompt_template = PromptTemplate(input_variables=[\"text\"], template=template)\n",
"\n",
"summary_chain = LLMChain(llm=llm, prompt=prompt_template)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Defining a SimpleSequentialChain and running"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"speech_summary_chain = SimpleSequentialChain(\n",
" chains=[audio_chain, summary_chain], verbose=True\n",
")\n",
"\n",
"audio_path = \"example_data/ihaveadream.mp3\"\n",
"speech_summary_chain.run(audio_path)"
]
}
],
"metadata": {
"language_info": {
"name": "python"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -0,0 +1,248 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Whisper\n",
"\n",
"This notebook covers how to use whisper from openAI, build an audio chain and deploy it in combination with llm to summarize a transcript gathered from an audio file. A translation task is also shown."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from langchain.audio_models import Whisper\n",
"from langchain.chains import AudioChain, LLMChain, SimpleSequentialChain\n",
"from langchain.llms import OpenAI\n",
"from langchain.prompts import PromptTemplate"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Transcript task"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Defining the audio model\n",
"\n",
"A `Whisper` objects has the following arguments:\n",
"\n",
"* `model_key` (str): model endpoint to use;\n",
"* `prompt` (optional[str]): openAi's whisper allows prompt to make the model context-aware and better transcribe/translate words;\n",
"* `language` (optional[str]): to add only in the case of transcript as a task;\n",
"* `max_chars` (optional[int]): max number of chars to return;\n",
"* `model` (optional[str]): name of the model to use, set to `whisper-1` by default;\n",
"* `temperature` (Optional[float]): temperature;\n",
"* `response format` (optional[str]): response format, set by default to `json`\n",
"\n",
"An `AudioChain` object has the following arguments:\n",
"* `audio_model` (AudioBase): the audio model to use;\n",
"* `output_key` (str): the task to be performed by the model. Not all the audio models support all the tasks. In the case of Whisper, the possible tasks are `\"transcribe\"` and `\"translate\"`.\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"audio_model = Whisper(model_key=\"YOUR_OPENAI_KEY\", max_chars=20000)\n",
"audio_chain = AudioChain(audio_model=audio_model, output_key=\"transcribe\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Setting up llm and an LLMChain"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"llm = OpenAI(\n",
" openai_api_key=\"YOUR_OPENAI_KEY\",\n",
" temperature=0.7,\n",
")\n",
"\n",
"template = \"\"\"Speech: {text}\n",
"Write a short 3 sentence summary of the speech.\n",
"Summary:\"\"\"\n",
"\n",
"prompt_template = PromptTemplate(input_variables=[\"text\"], template=template)\n",
"\n",
"summary_chain = LLMChain(llm=llm, prompt=prompt_template)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Defining a SimpleSequentialChain and running"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"speech_to_summary_chain = SimpleSequentialChain(\n",
" chains=[audio_chain, summary_chain], verbose=True\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new SimpleSequentialChain chain...\u001b[0m\n",
"\u001b[36;1m\u001b[1;3mDo it! Just do it! Don't let your dreams be dreams. Yesterday, you said tomorrow. So just do it! Make your dreams come true! Just do it! Some people dream of success while you're going to wake up and work hard at it. Nothing is impossible! You should get to the point where anyone else would quit, and you're not going to stop there! No! What are you waiting for? Do it! Just do it! Yes you can! Just do it! If you're tired of starting over, stop giving up!\u001b[0m\n",
"\u001b[33;1m\u001b[1;3m The speaker encourages the audience to take action and pursue their dreams. They emphasize that nothing is impossible and that the only way to make their dreams come true is to work hard and not give up. They urge the audience to act now and not wait any longer.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"' The speaker encourages the audience to take action and pursue their dreams. They emphasize that nothing is impossible and that the only way to make their dreams come true is to work hard and not give up. They urge the audience to act now and not wait any longer.'"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"audio_path = \"example_data/doit.mp3\"\n",
"speech_to_summary_chain.run(audio_path)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Translate\n",
"To perform a translation, we just make few changes to the above workflow."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### New AudioChain with output key set to \"translation\""
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"audio_chain = AudioChain(audio_model=audio_model, output_key=\"translate\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### New SimpleSequentialChain with the new AudioChain and run"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"translation_to_summary_chain = SimpleSequentialChain(\n",
" chains=[audio_chain, summary_chain], verbose=True\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new SimpleSequentialChain chain...\u001b[0m\n",
"\u001b[36;1m\u001b[1;3mI wanted a chicken! Ah, this is the chicken! I like this, I hadn't seen it before! A CHICKEN! I'm afraid of getting my hands dirty and not playing the guitar anymore, otherwise I would break it. Look, a chicken, really, beautiful!\u001b[0m\n",
"\u001b[33;1m\u001b[1;3m The speaker is excited to have found a chicken, noting that they have not seen it before. They express fear at getting their hands dirty, as it could prevent them from playing the guitar. Finally, they express admiration for the chicken's beauty.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"\" The speaker is excited to have found a chicken, noting that they have not seen it before. They express fear at getting their hands dirty, as it could prevent them from playing the guitar. Finally, they express admiration for the chicken's beauty.\""
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"audio_path = \"example_data/pollo.mp3\"\n",
"translation_to_summary_chain.run(audio_path)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.1"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -0,0 +1,98 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Audio Files\n",
"\n",
"This notebook shows how to add data from audio files into a format suitable by LangChain."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import AudioLoader\n",
"from langchain.audio_models import Whisper"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"An `AudioLoader` object has as parameters:\n",
"\n",
"* `audio_model` (AudioBase): the audio model to use;\n",
"* `file_path` (str): the file path;\n",
"* `task` (str): the task to perform. Depending on the model, this can be `transcribe` or `translate`"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"audio_model = Whisper(model_key=\"[YOUR_OPENAI_API_KEY]\", max_chars=20000)\n",
"\n",
"file_path = r\"example_data\\pasolini.mp3\"\n",
"loader = AudioLoader(audio_model=audio_model, file_path=file_path, task=\"translate\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"`AudioLoader.load()` loads the data into a `Document` object."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='I think that fascism, the fascist regime, was not built on the conclusion that a group of criminals has power. And this group of criminals has power, but it has not been able to do anything. It has not been able to even remotely change the reality of Italy. Now, on the contrary, the regime is a democratic regime, etc. But that acculturation, that homologation that fascism has not been able to obtain, the power of today, that is, the power of consumerism, instead, it is able to obtain it perfectly. I can tell you without a doubt that true fascism is precisely this power of consumerism that is destroying Italy. And this thing happened so quickly that we did not realize it. It has happened in the last 5, 6, 7, 10 years. It was a kind of nightmare in which we saw Italy around us destroy, disappear, disappear. Now, perhaps awakening from this nightmare and looking around, we realize that there is nothing more to do.', lookup_str='', metadata={'source': 'docs\\\\modules\\\\document_loaders\\\\examples\\\\example_data\\\\pasolini.mp3', 'task': 'translate'}, lookup_index=0)]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"loader.load()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.1"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}

View File: langchain/audio_models/__init__.py

@@ -0,0 +1,9 @@
"""Wrappers on top of text to audio model APIs."""
from langchain.audio_models.bananadev import AudioBanana
from langchain.audio_models.whisper import Whisper

__all__ = [
    "AudioBanana",
    "Whisper",
]

View File: langchain/audio_models/bananadev.py

@@ -0,0 +1,70 @@
from typing import Dict, Optional
from pydantic import Extra, root_validator
from langchain.audio_models.base import AudioBase
from langchain.utils import get_from_dict_or_env


class AudioBanana(AudioBase):
    """Wrapper around Whisper models hosted on Banana.

    To use, you should have the ``banana-dev`` python package installed,
    and the environment variable ``BANANA_API_KEY`` set with your API key.

    Any parameters that are valid to be passed to the call can be passed
    in, even if not explicitly saved on this class.
    """

    model_key: str = ""
    """Model endpoint to use."""

    banana_api_key: Optional[str] = None
    max_chars: Optional[int] = None

    class Config:
        """Configuration for this pydantic object."""

        extra = Extra.forbid

    @root_validator()
    def validate_environment(cls, values: Dict) -> Dict:
        """Validate that the api key and python package exist in the environment."""
        banana_api_key = get_from_dict_or_env(
            values, "banana_api_key", "BANANA_API_KEY"
        )
        values["banana_api_key"] = banana_api_key
        return values

    def transcript(self, audio_path: str, task: str = "transcribe") -> str:
        """Call the Banana endpoint."""
        try:
            import banana_dev as banana
        except ImportError:
            raise ValueError(
                "Could not import banana-dev python package. "
                "Please install it with `pip install banana-dev`."
            )
        if task == "transcribe":
            mp3 = self._read_mp3_audio(audio_path)
            model_inputs = {"mp3BytesString": mp3}
            response = banana.run(self.banana_api_key, self.model_key, model_inputs)
            try:
                text = response["modelOutputs"][0]["text"]
                if not isinstance(text, str):
                    raise ValueError(f"Expected text to be a string, got {type(text)}")
            except KeyError:
                raise ValueError(
                    "Response should be of the form "
                    "{'modelOutputs': [{'text': ...}]}. "
                    f"Response was: {response}"
                )
        else:
            raise ValueError(
                "The only task available for this model is 'transcribe'. "
                f"Task was: {task}"
            )
        return text[: self.max_chars] if self.max_chars else text

View File: langchain/audio_models/base.py

@@ -0,0 +1,24 @@
"""Base class for text to audio models."""
import base64
import pathlib
from abc import ABC, abstractmethod
from io import BytesIO
from pydantic import BaseModel


class AudioBase(BaseModel, ABC):
    """Base class for audio models that convert speech to text."""

    @staticmethod
    def _read_mp3_audio(audio_path: str) -> str:
        """Read an mp3 file and return its base64-encoded contents."""
        if not pathlib.Path(audio_path).exists():
            raise ValueError(f"Can't find audio file at {audio_path}")
        if not audio_path.endswith(".mp3"):
            raise ValueError("Only mp3 files are supported.")
        with open(audio_path, "rb") as file:
            mp3bytes = BytesIO(file.read())
        return base64.b64encode(mp3bytes.getvalue()).decode("ISO-8859-1")

    @abstractmethod
    def transcript(self, audio_path: str, task: str) -> str:
        """Transcribe or translate an audio file."""

View File: langchain/audio_models/whisper.py

@@ -0,0 +1,74 @@
from typing import Optional
from pydantic import Extra
from langchain.audio_models.base import AudioBase


class Whisper(AudioBase):
    """Wrapper around OpenAI's Whisper API.

    To use, you should have the ``openai`` python package installed,
    and your OpenAI API key passed as ``model_key``.

    Any parameters that are valid to be passed to the call can be passed
    in, even if not explicitly saved on this class.
    """

    model_key: str = ""
    """OpenAI API key to use."""

    prompt: Optional[str] = ""
    language: Optional[str] = ""
    max_chars: Optional[int] = None
    model: Optional[str] = "whisper-1"
    temperature: Optional[float] = 0.0
    response_format: Optional[str] = "json"

    class Config:
        """Configuration for this pydantic object."""

        extra = Extra.forbid

    def transcript(self, audio_path: str, task: str = "transcribe") -> str:
        """Call the Whisper transcribe or translate endpoint."""
        try:
            import openai
        except ImportError:
            raise ValueError(
                "Could not import openai python package. "
                "Please install it with `pip install openai`."
            )
        openai.api_key = self.model_key
        if task == "transcribe":
            with open(audio_path, "rb") as audio_file:
                response = openai.Audio.transcribe(
                    file=audio_file,
                    model=self.model,
                    prompt=self.prompt,
                    language=self.language,
                    temperature=self.temperature,
                    response_format=self.response_format,
                )
        elif task == "translate":
            with open(audio_path, "rb") as audio_file:
                response = openai.Audio.translate(
                    file=audio_file,
                    model=self.model,
                    prompt=self.prompt,
                    temperature=self.temperature,
                    response_format=self.response_format,
                )
        else:
            raise ValueError(
                "Available tasks are 'transcribe' and 'translate'. "
                f"Task was: {task}"
            )
        try:
            text = response["text"]
            if not isinstance(text, str):
                raise ValueError(f"Expected text to be a string, got {type(text)}")
        except KeyError:
            raise ValueError(
                "Response should be of the form {'text': ...}. "
                f"Response was: {response}"
            )
        return text[: self.max_chars] if self.max_chars else text
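
A usage sketch for the `prompt` and `language` arguments described in the notebook above (the key and file path are placeholders, not from this PR):

from langchain.audio_models import Whisper

# `prompt` biases Whisper toward domain vocabulary; `language` is an
# ISO-639-1 code and applies only to the "transcribe" task.
audio_model = Whisper(
    model_key="sk-...",  # placeholder OpenAI API key
    prompt="LangChain, AudioChain, banana.dev",
    language="en",
    max_chars=20000,
)
text = audio_model.transcript("example_data/doit.mp3", task="transcribe")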

View File: langchain/chains/__init__.py

@@ -1,55 +1,57 @@
"""Chains are easily reusable components which can be linked together."""
from langchain.chains.api.base import APIChain
from langchain.chains.chat_vector_db.base import ChatVectorDBChain
from langchain.chains.combine_documents.base import AnalyzeDocumentChain
from langchain.chains.constitutional_ai.base import ConstitutionalChain
from langchain.chains.conversation.base import ConversationChain
from langchain.chains.graph_qa.base import GraphQAChain
from langchain.chains.hyde.base import HypotheticalDocumentEmbedder
from langchain.chains.llm import LLMChain
from langchain.chains.llm_bash.base import LLMBashChain
from langchain.chains.llm_checker.base import LLMCheckerChain
from langchain.chains.llm_math.base import LLMMathChain
from langchain.chains.llm_requests import LLMRequestsChain
from langchain.chains.llm_summarization_checker.base import LLMSummarizationCheckerChain
from langchain.chains.loading import load_chain
from langchain.chains.mapreduce import MapReduceChain
from langchain.chains.moderation import OpenAIModerationChain
from langchain.chains.pal.base import PALChain
from langchain.chains.qa_with_sources.base import QAWithSourcesChain
from langchain.chains.qa_with_sources.vector_db import VectorDBQAWithSourcesChain
from langchain.chains.sequential import SequentialChain, SimpleSequentialChain
from langchain.chains.sql_database.base import (
SQLDatabaseChain,
SQLDatabaseSequentialChain,
)
from langchain.chains.transform import TransformChain
from langchain.chains.vector_db_qa.base import VectorDBQA
__all__ = [
"ConversationChain",
"LLMChain",
"LLMBashChain",
"LLMCheckerChain",
"LLMSummarizationCheckerChain",
"LLMMathChain",
"PALChain",
"QAWithSourcesChain",
"SQLDatabaseChain",
"SequentialChain",
"SimpleSequentialChain",
"VectorDBQA",
"VectorDBQAWithSourcesChain",
"APIChain",
"LLMRequestsChain",
"TransformChain",
"MapReduceChain",
"OpenAIModerationChain",
"SQLDatabaseSequentialChain",
"load_chain",
"AnalyzeDocumentChain",
"HypotheticalDocumentEmbedder",
"ChatVectorDBChain",
"GraphQAChain",
"ConstitutionalChain",
]
"""Chains are easily reusable components which can be linked together."""
from langchain.chains.api.base import APIChain
from langchain.chains.audio import AudioChain
from langchain.chains.chat_vector_db.base import ChatVectorDBChain
from langchain.chains.combine_documents.base import AnalyzeDocumentChain
from langchain.chains.constitutional_ai.base import ConstitutionalChain
from langchain.chains.conversation.base import ConversationChain
from langchain.chains.graph_qa.base import GraphQAChain
from langchain.chains.hyde.base import HypotheticalDocumentEmbedder
from langchain.chains.llm import LLMChain
from langchain.chains.llm_bash.base import LLMBashChain
from langchain.chains.llm_checker.base import LLMCheckerChain
from langchain.chains.llm_math.base import LLMMathChain
from langchain.chains.llm_requests import LLMRequestsChain
from langchain.chains.llm_summarization_checker.base import LLMSummarizationCheckerChain
from langchain.chains.loading import load_chain
from langchain.chains.mapreduce import MapReduceChain
from langchain.chains.moderation import OpenAIModerationChain
from langchain.chains.pal.base import PALChain
from langchain.chains.qa_with_sources.base import QAWithSourcesChain
from langchain.chains.qa_with_sources.vector_db import VectorDBQAWithSourcesChain
from langchain.chains.sequential import SequentialChain, SimpleSequentialChain
from langchain.chains.sql_database.base import (
    SQLDatabaseChain,
    SQLDatabaseSequentialChain,
)
from langchain.chains.transform import TransformChain
from langchain.chains.vector_db_qa.base import VectorDBQA

__all__ = [
    "ConversationChain",
    "LLMChain",
    "AudioChain",
    "LLMBashChain",
    "LLMCheckerChain",
    "LLMSummarizationCheckerChain",
    "LLMMathChain",
    "PALChain",
    "QAWithSourcesChain",
    "SQLDatabaseChain",
    "SequentialChain",
    "SimpleSequentialChain",
    "VectorDBQA",
    "VectorDBQAWithSourcesChain",
    "APIChain",
    "LLMRequestsChain",
    "TransformChain",
    "MapReduceChain",
    "OpenAIModerationChain",
    "SQLDatabaseSequentialChain",
    "load_chain",
    "AnalyzeDocumentChain",
    "HypotheticalDocumentEmbedder",
    "ChatVectorDBChain",
    "GraphQAChain",
    "ConstitutionalChain",
]

View File: langchain/chains/audio.py

@@ -0,0 +1,24 @@
from typing import Dict, List
from langchain.audio_models.base import AudioBase
from langchain.chains.base import Chain


class AudioChain(Chain):
    """Chain that converts an audio file to text using an audio model."""

    audio_model: AudioBase
    output_key: str = "transcribe"

    @property
    def input_keys(self) -> List[str]:
        return ["audio_file"]

    @property
    def output_keys(self) -> List[str]:
        return [self.output_key]

    def _call(self, inputs: Dict[str, str]) -> Dict[str, str]:
        file_path = inputs["audio_file"]
        task = self.output_key
        content = self.audio_model.transcript(file_path, task).strip()
        return {self.output_key: content}
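
The `input_keys`/`output_keys` properties fix the chain's calling convention: the input dict must carry `"audio_file"`, and the result is keyed by the task. A quick sketch of calling the chain directly, outside a `SimpleSequentialChain` (placeholder key and path, not from this PR):

from langchain.audio_models import Whisper
from langchain.chains import AudioChain

audio_model = Whisper(model_key="sk-...")  # placeholder OpenAI API key
audio_chain = AudioChain(audio_model=audio_model, output_key="transcribe")

# The chain reads inputs["audio_file"] and returns {"transcribe": <text>},
# mirroring input_keys and output_keys above.
result = audio_chain({"audio_file": "example_data/doit.mp3"})
print(result["transcribe"])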

View File: langchain/document_loaders/__init__.py

@@ -1,86 +1,88 @@
"""All different types of document loaders."""
from langchain.document_loaders.airbyte_json import AirbyteJSONLoader
from langchain.document_loaders.azlyrics import AZLyricsLoader
from langchain.document_loaders.college_confidential import CollegeConfidentialLoader
from langchain.document_loaders.conllu import CoNLLULoader
from langchain.document_loaders.directory import DirectoryLoader
from langchain.document_loaders.docx import UnstructuredDocxLoader
from langchain.document_loaders.email import UnstructuredEmailLoader
from langchain.document_loaders.evernote import EverNoteLoader
from langchain.document_loaders.facebook_chat import FacebookChatLoader
from langchain.document_loaders.gcs_directory import GCSDirectoryLoader
from langchain.document_loaders.gcs_file import GCSFileLoader
from langchain.document_loaders.gitbook import GitbookLoader
from langchain.document_loaders.googledrive import GoogleDriveLoader
from langchain.document_loaders.gutenberg import GutenbergLoader
from langchain.document_loaders.hn import HNLoader
from langchain.document_loaders.html import UnstructuredHTMLLoader
from langchain.document_loaders.ifixit import IFixitLoader
from langchain.document_loaders.image import UnstructuredImageLoader
from langchain.document_loaders.imsdb import IMSDbLoader
from langchain.document_loaders.notebook import NotebookLoader
from langchain.document_loaders.notion import NotionDirectoryLoader
from langchain.document_loaders.obsidian import ObsidianLoader
from langchain.document_loaders.online_pdf import OnlinePDFLoader
from langchain.document_loaders.paged_pdf import PagedPDFSplitter
from langchain.document_loaders.pdf import PDFMinerLoader, UnstructuredPDFLoader
from langchain.document_loaders.powerpoint import UnstructuredPowerPointLoader
from langchain.document_loaders.readthedocs import ReadTheDocsLoader
from langchain.document_loaders.roam import RoamLoader
from langchain.document_loaders.s3_directory import S3DirectoryLoader
from langchain.document_loaders.s3_file import S3FileLoader
from langchain.document_loaders.srt import SRTLoader
from langchain.document_loaders.telegram import TelegramChatLoader
from langchain.document_loaders.text import TextLoader
from langchain.document_loaders.unstructured import (
UnstructuredFileIOLoader,
UnstructuredFileLoader,
)
from langchain.document_loaders.url import UnstructuredURLLoader
from langchain.document_loaders.web_base import WebBaseLoader
from langchain.document_loaders.word_document import UnstructuredWordDocumentLoader
from langchain.document_loaders.youtube import YoutubeLoader
__all__ = [
"UnstructuredFileLoader",
"UnstructuredFileIOLoader",
"UnstructuredURLLoader",
"DirectoryLoader",
"NotionDirectoryLoader",
"ReadTheDocsLoader",
"GoogleDriveLoader",
"UnstructuredHTMLLoader",
"UnstructuredPowerPointLoader",
"UnstructuredWordDocumentLoader",
"UnstructuredPDFLoader",
"UnstructuredImageLoader",
"ObsidianLoader",
"UnstructuredDocxLoader",
"UnstructuredEmailLoader",
"RoamLoader",
"YoutubeLoader",
"S3FileLoader",
"TextLoader",
"HNLoader",
"GitbookLoader",
"S3DirectoryLoader",
"GCSFileLoader",
"GCSDirectoryLoader",
"WebBaseLoader",
"IMSDbLoader",
"AZLyricsLoader",
"CollegeConfidentialLoader",
"IFixitLoader",
"GutenbergLoader",
"PagedPDFSplitter",
"EverNoteLoader",
"AirbyteJSONLoader",
"OnlinePDFLoader",
"PDFMinerLoader",
"TelegramChatLoader",
"SRTLoader",
"FacebookChatLoader",
"NotebookLoader",
"CoNLLULoader",
]
"""All different types of document loaders."""
from langchain.document_loaders.airbyte_json import AirbyteJSONLoader
from langchain.document_loaders.audio_files import AudioLoader
from langchain.document_loaders.azlyrics import AZLyricsLoader
from langchain.document_loaders.college_confidential import CollegeConfidentialLoader
from langchain.document_loaders.conllu import CoNLLULoader
from langchain.document_loaders.directory import DirectoryLoader
from langchain.document_loaders.docx import UnstructuredDocxLoader
from langchain.document_loaders.email import UnstructuredEmailLoader
from langchain.document_loaders.evernote import EverNoteLoader
from langchain.document_loaders.facebook_chat import FacebookChatLoader
from langchain.document_loaders.gcs_directory import GCSDirectoryLoader
from langchain.document_loaders.gcs_file import GCSFileLoader
from langchain.document_loaders.gitbook import GitbookLoader
from langchain.document_loaders.googledrive import GoogleDriveLoader
from langchain.document_loaders.gutenberg import GutenbergLoader
from langchain.document_loaders.hn import HNLoader
from langchain.document_loaders.html import UnstructuredHTMLLoader
from langchain.document_loaders.ifixit import IFixitLoader
from langchain.document_loaders.image import UnstructuredImageLoader
from langchain.document_loaders.imsdb import IMSDbLoader
from langchain.document_loaders.notebook import NotebookLoader
from langchain.document_loaders.notion import NotionDirectoryLoader
from langchain.document_loaders.obsidian import ObsidianLoader
from langchain.document_loaders.online_pdf import OnlinePDFLoader
from langchain.document_loaders.paged_pdf import PagedPDFSplitter
from langchain.document_loaders.pdf import PDFMinerLoader, UnstructuredPDFLoader
from langchain.document_loaders.powerpoint import UnstructuredPowerPointLoader
from langchain.document_loaders.readthedocs import ReadTheDocsLoader
from langchain.document_loaders.roam import RoamLoader
from langchain.document_loaders.s3_directory import S3DirectoryLoader
from langchain.document_loaders.s3_file import S3FileLoader
from langchain.document_loaders.srt import SRTLoader
from langchain.document_loaders.telegram import TelegramChatLoader
from langchain.document_loaders.text import TextLoader
from langchain.document_loaders.unstructured import (
    UnstructuredFileIOLoader,
    UnstructuredFileLoader,
)
from langchain.document_loaders.url import UnstructuredURLLoader
from langchain.document_loaders.web_base import WebBaseLoader
from langchain.document_loaders.word_document import UnstructuredWordDocumentLoader
from langchain.document_loaders.youtube import YoutubeLoader

__all__ = [
    "UnstructuredFileLoader",
    "UnstructuredFileIOLoader",
    "UnstructuredURLLoader",
    "DirectoryLoader",
    "NotionDirectoryLoader",
    "ReadTheDocsLoader",
    "GoogleDriveLoader",
    "UnstructuredHTMLLoader",
    "UnstructuredPowerPointLoader",
    "UnstructuredWordDocumentLoader",
    "UnstructuredPDFLoader",
    "UnstructuredImageLoader",
    "ObsidianLoader",
    "UnstructuredDocxLoader",
    "UnstructuredEmailLoader",
    "RoamLoader",
    "YoutubeLoader",
    "S3FileLoader",
    "TextLoader",
    "HNLoader",
    "GitbookLoader",
    "S3DirectoryLoader",
    "GCSFileLoader",
    "GCSDirectoryLoader",
    "WebBaseLoader",
    "IMSDbLoader",
    "AZLyricsLoader",
    "CollegeConfidentialLoader",
    "IFixitLoader",
    "GutenbergLoader",
    "PagedPDFSplitter",
    "EverNoteLoader",
    "AirbyteJSONLoader",
    "OnlinePDFLoader",
    "PDFMinerLoader",
    "TelegramChatLoader",
    "SRTLoader",
    "FacebookChatLoader",
    "NotebookLoader",
    "CoNLLULoader",
    "AudioLoader",
]

View File: langchain/document_loaders/audio_files.py

@@ -0,0 +1,27 @@
"""Loader that loads audio files."""
from typing import List
from langchain.audio_models.base import AudioBase
from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader


class AudioLoader(BaseLoader):
    """Loader that loads audio files and converts them
    to text using an audio model."""

    def __init__(
        self, audio_model: AudioBase, file_path: str, task: str = "transcribe"
    ) -> None:
        super().__init__()
        self.audio_model = audio_model
        self.file_path = file_path
        self.task = task

    def load(self) -> List[Document]:
        """Load the audio file and convert it to text.

        The task, set at construction time, is either 'transcribe' or
        'translate', depending on what the audio model supports.
        """
        raw_content = self.audio_model.transcript(self.file_path, self.task)
        content = raw_content.strip()
        metadata = {"source": self.file_path, "task": self.task}
        return [Document(page_content=content, metadata=metadata)]
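
Because `load()` returns ordinary `Document` objects carrying `source` and `task` metadata, transcripts drop straight into the rest of the document pipeline. A short sketch under the same placeholder-key assumption:

from langchain.audio_models import Whisper
from langchain.document_loaders import AudioLoader
from langchain.text_splitter import CharacterTextSplitter

audio_model = Whisper(model_key="sk-...")  # placeholder OpenAI API key
loader = AudioLoader(
    audio_model=audio_model,
    file_path="example_data/pasolini.mp3",
    task="translate",
)

# Chunk the transcript like any other loaded document.
docs = loader.load()
splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
chunks = splitter.split_documents(docs)
print(len(chunks), chunks[0].metadata)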