Compare commits

...

2 Commits

Author SHA1 Message Date
Vairo Di Pasquale
e381cdbd3f Added whisper compatibility (#1421)
TODO:
- fix compatibility with base

---------

Co-authored-by: Klein Tahiraj <62718109+klein-t@users.noreply.github.com>
Co-authored-by: Klein Tahiraj <klein.tahiraj@gmail.com>
2023-03-09 16:42:29 -08:00
CG80499
d090e3605f Added AudioChain + AudioBanana (#1313)
This PR enables users to transcribe audio files and add them to chains
using OpenAI's Whisper models on Banana.dev.

TODO
- Turn audio_chain_example.py into a notebook
2023-03-03 09:54:17 -08:00
11 changed files with 854 additions and 141 deletions

View File

@@ -0,0 +1,135 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Bananadev\n",
"\n",
"This notebook covers how to use whisper from banana.dev, build an audio chain and deploy it in combination with llm to summarize a transcript gathered from an audio file."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import langchain\n",
"from langchain.audio_models import AudioBanana\n",
"from langchain.cache import SQLiteCache\n",
"from langchain.chains import AudioChain, LLMChain, SimpleSequentialChain\n",
"from langchain.llms import OpenAI\n",
"from langchain.prompts import PromptTemplate"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Setting the cache"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"audio_cache = SQLiteCache(database_path=\".langchain.db\")\n",
"\n",
"langchain.llm_cache = audio_cache"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Defining the audio model\n",
"\n",
"An `AudioBanana` object has the following arguments:\n",
"\n",
"* `model_key` (str): model endpoint to use;\n",
"* `banana_api_key`(optional[str]): banana api key:\n",
"* `max_chars` (optional[int]): max number of chars to return.\n",
"\n",
"An `AudioChain` object has the following arguments:\n",
"* `audio_model` (AudioBase): the audio model to use;\n",
"* `output_key` (str): the task to be performed by the model. Not all the audio models support all the tasks. In the case of AudioBanana only `\"transcribe\"` is a valid output key. The output key will returned along with the text data gathered from the audio file.\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"audio_model = AudioBanana(model_key=\"[YOUR MODEL KEY]\", max_chars=20000)\n",
"\n",
"audio_chain = AudioChain(audio_model=audio_model, output_key=\"transcribe\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Setting up a llm and an LLMChain"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"llm = OpenAI(temperature=0.7)\n",
"template = \"\"\"Speech: {text}\n",
"\n",
" Write a short 3 sentence summary of the speech.\n",
"\n",
" Summary:\"\"\"\n",
"\n",
"# note how the input variabòes\n",
"prompt_template = PromptTemplate(input_variables=[\"text\"], template=template)\n",
"\n",
"summary_chain = LLMChain(llm=llm, prompt=prompt_template)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Defining a SimpleSequentialChain and running"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"speech_summary_chain = SimpleSequentialChain(\n",
" chains=[audio_chain, summary_chain], verbose=True\n",
")\n",
"\n",
"audio_path = \"example_data/ihaveadream.mp3\"\n",
"speech_summary_chain.run(audio_path)"
]
}
],
"metadata": {
"language_info": {
"name": "python"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -0,0 +1,248 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Whisper\n",
"\n",
"This notebook covers how to use whisper from openAI, build an audio chain and deploy it in combination with llm to summarize a transcript gathered from an audio file. A translation task is also shown."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from langchain.audio_models import Whisper\n",
"from langchain.chains import AudioChain, LLMChain, SimpleSequentialChain\n",
"from langchain.llms import OpenAI\n",
"from langchain.prompts import PromptTemplate"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Transcript task"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Defining the audio model\n",
"\n",
"A `Whisper` objects has the following arguments:\n",
"\n",
"* `model_key` (str): model endpoint to use;\n",
"* `prompt` (optional[str]): openAi's whisper allows prompt to make the model context-aware and better transcribe/translate words;\n",
"* `language` (optional[str]): to add only in the case of transcript as a task;\n",
"* `max_chars` (optional[int]): max number of chars to return;\n",
"* `model` (optional[str]): name of the model to use, set to `whisper-1` by default;\n",
"* `temperature` (Optional[float]): temperature;\n",
"* `response format` (optional[str]): response format, set by default to `json`\n",
"\n",
"An `AudioChain` object has the following arguments:\n",
"* `audio_model` (AudioBase): the audio model to use;\n",
"* `output_key` (str): the task to be performed by the model. Not all the audio models support all the tasks. In the case of Whisper, the possible tasks are `\"transcribe\"` and `\"translate\"`.\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"audio_model = Whisper(model_key=\"YOUR_OPENAI_KEY\", max_chars=20000)\n",
"audio_chain = AudioChain(audio_model=audio_model, output_key=\"transcribe\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Setting up llm and an LLMChain"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"llm = OpenAI(\n",
" openai_api_key=\"YOUR_OPENAI_KEY\",\n",
" temperature=0.7,\n",
")\n",
"\n",
"template = \"\"\"Speech: {text}\n",
"Write a short 3 sentence summary of the speech.\n",
"Summary:\"\"\"\n",
"\n",
"prompt_template = PromptTemplate(input_variables=[\"text\"], template=template)\n",
"\n",
"summary_chain = LLMChain(llm=llm, prompt=prompt_template)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Defining a SimpleSequentialChain and running"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"speech_to_summary_chain = SimpleSequentialChain(\n",
" chains=[audio_chain, summary_chain], verbose=True\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new SimpleSequentialChain chain...\u001b[0m\n",
"\u001b[36;1m\u001b[1;3mDo it! Just do it! Don't let your dreams be dreams. Yesterday, you said tomorrow. So just do it! Make your dreams come true! Just do it! Some people dream of success while you're going to wake up and work hard at it. Nothing is impossible! You should get to the point where anyone else would quit, and you're not going to stop there! No! What are you waiting for? Do it! Just do it! Yes you can! Just do it! If you're tired of starting over, stop giving up!\u001b[0m\n",
"\u001b[33;1m\u001b[1;3m The speaker encourages the audience to take action and pursue their dreams. They emphasize that nothing is impossible and that the only way to make their dreams come true is to work hard and not give up. They urge the audience to act now and not wait any longer.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"' The speaker encourages the audience to take action and pursue their dreams. They emphasize that nothing is impossible and that the only way to make their dreams come true is to work hard and not give up. They urge the audience to act now and not wait any longer.'"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"audio_path = \"example_data/doit.mp3\"\n",
"speech_to_summary_chain.run(audio_path)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Translate\n",
"To perform a translation, we just make few changes to the above workflow."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### New AudioChain with output key set to \"translation\""
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"audio_chain = AudioChain(audio_model=audio_model, output_key=\"translate\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### New SimpleSequentialChain with the new AudioChain and run"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"translation_to_summary_chain = SimpleSequentialChain(\n",
" chains=[audio_chain, summary_chain], verbose=True\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new SimpleSequentialChain chain...\u001b[0m\n",
"\u001b[36;1m\u001b[1;3mI wanted a chicken! Ah, this is the chicken! I like this, I hadn't seen it before! A CHICKEN! I'm afraid of getting my hands dirty and not playing the guitar anymore, otherwise I would break it. Look, a chicken, really, beautiful!\u001b[0m\n",
"\u001b[33;1m\u001b[1;3m The speaker is excited to have found a chicken, noting that they have not seen it before. They express fear at getting their hands dirty, as it could prevent them from playing the guitar. Finally, they express admiration for the chicken's beauty.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"\" The speaker is excited to have found a chicken, noting that they have not seen it before. They express fear at getting their hands dirty, as it could prevent them from playing the guitar. Finally, they express admiration for the chicken's beauty.\""
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"audio_path = \"example_data/pollo.mp3\"\n",
"translation_to_summary_chain.run(audio_path)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.1"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -0,0 +1,98 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Audio Files\n",
"\n",
"This notebook shows how to add data from audio files into a format suitable by LangChain."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import AudioLoader\n",
"from langchain.audio_models import Whisper"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"An `AudioLoader` object has as parameters:\n",
"\n",
"* `audio_model` (AudioBase): the audio model to use;\n",
"* `file_path` (str): the file path;\n",
"* `task` (str): the task to perform. Depending on the model, this can be `transcribe` or `translate`"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"audio_model = Whisper(model_key=\"[YOUR_OPENAI_API_KEY]\", max_chars=20000)\n",
"\n",
"file_path = r\"example_data\\pasolini.mp3\"\n",
"loader = AudioLoader(audio_model=audio_model, file_path=file_path, task=\"translate\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"`AudioLoader.load()` loads the data into a `Document` object."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='I think that fascism, the fascist regime, was not built on the conclusion that a group of criminals has power. And this group of criminals has power, but it has not been able to do anything. It has not been able to even remotely change the reality of Italy. Now, on the contrary, the regime is a democratic regime, etc. But that acculturation, that homologation that fascism has not been able to obtain, the power of today, that is, the power of consumerism, instead, it is able to obtain it perfectly. I can tell you without a doubt that true fascism is precisely this power of consumerism that is destroying Italy. And this thing happened so quickly that we did not realize it. It has happened in the last 5, 6, 7, 10 years. It was a kind of nightmare in which we saw Italy around us destroy, disappear, disappear. Now, perhaps awakening from this nightmare and looking around, we realize that there is nothing more to do.', lookup_str='', metadata={'source': 'docs\\\\modules\\\\document_loaders\\\\examples\\\\example_data\\\\pasolini.mp3', 'task': 'translate'}, lookup_index=0)]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"loader.load()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.1"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}

View File: langchain/audio_models/__init__.py

@@ -0,0 +1,9 @@
"""Wrappers on top of text to audio model APIs."""
from langchain.audio_models.bananadev import AudioBanana
from langchain.audio_models.whisper import Whisper

__all__ = [
    "AudioBanana",
    "Whisper",
]

View File: langchain/audio_models/bananadev.py

@@ -0,0 +1,70 @@
from typing import Dict, Optional
from pydantic import Extra, root_validator
from langchain.audio_models.base import AudioBase
from langchain.utils import get_from_dict_or_env


class AudioBanana(AudioBase):
    """Wrapper around Whisper models hosted on Banana.

    To use, you should have the ``banana-dev`` python package installed,
    and the environment variable ``BANANA_API_KEY`` set with your API key.

    Any parameters that are valid to be passed to the call can be passed
    in, even if not explicitly saved on this class.
    """

    model_key: str = ""
    """Model endpoint to use."""

    banana_api_key: Optional[str] = None
    max_chars: Optional[int] = None

    class Config:
        """Configuration for this pydantic object."""

        extra = Extra.forbid

    @root_validator()
    def validate_environment(cls, values: Dict) -> Dict:
        """Validate that the api key and python package exist in the environment."""
        banana_api_key = get_from_dict_or_env(
            values, "banana_api_key", "BANANA_API_KEY"
        )
        values["banana_api_key"] = banana_api_key
        return values

    def transcript(self, audio_path: str, task: str = "transcribe") -> str:
        """Call the Banana endpoint."""
        try:
            import banana_dev as banana
        except ImportError:
            raise ValueError(
                "Could not import banana-dev python package. "
                "Please install it with `pip install banana-dev`."
            )
        if task == "transcribe":
            mp3 = self._read_mp3_audio(audio_path)
            model_inputs = {"mp3BytesString": mp3}
            response = banana.run(self.banana_api_key, self.model_key, model_inputs)
            try:
                text = response["modelOutputs"][0]["text"]
                if not isinstance(text, str):
                    raise ValueError(f"Expected text to be a string, got {type(text)}")
            except KeyError:
                raise ValueError(
                    "Response should be of the form "
                    "{'modelOutputs': [{'text': ...}]}. "
                    f"Response was: {response}"
                )
        else:
            raise ValueError(
                "The only task available for this model is 'transcribe'. "
                f"Task was: {task}"
            )
        return text[: self.max_chars] if self.max_chars else text

View File: langchain/audio_models/base.py

@@ -0,0 +1,24 @@
"""Base class for text to audio models."""
import base64
import pathlib
from abc import ABC, abstractmethod
from io import BytesIO
from pydantic import BaseModel


class AudioBase(BaseModel, ABC):
    """Base class for audio models that convert speech to text."""

    @staticmethod
    def _read_mp3_audio(audio_path: str) -> str:
        """Read an mp3 file and return its base64-encoded contents."""
        if not pathlib.Path(audio_path).exists():
            raise ValueError(f"Can't find audio file at {audio_path}")
        if not audio_path.endswith(".mp3"):
            raise ValueError("Only mp3 files are supported.")
        with open(audio_path, "rb") as file:
            mp3bytes = BytesIO(file.read())
        return base64.b64encode(mp3bytes.getvalue()).decode("ISO-8859-1")

    @abstractmethod
    def transcript(self, audio_path: str, task: str) -> str:
        """Transcribe or translate an audio file."""

View File: langchain/audio_models/whisper.py

@@ -0,0 +1,74 @@
from typing import Optional
from pydantic import Extra
from langchain.audio_models.base import AudioBase


class Whisper(AudioBase):
    """Wrapper around OpenAI's Whisper API.

    To use, you should have the ``openai`` python package installed,
    and your OpenAI API key passed as ``model_key``.

    Any parameters that are valid to be passed to the call can be passed
    in, even if not explicitly saved on this class.
    """

    model_key: str = ""
    """OpenAI API key to use."""

    prompt: Optional[str] = ""
    language: Optional[str] = ""
    max_chars: Optional[int] = None
    model: Optional[str] = "whisper-1"
    temperature: Optional[float] = 0.0
    response_format: Optional[str] = "json"

    class Config:
        """Configuration for this pydantic object."""

        extra = Extra.forbid

    def transcript(self, audio_path: str, task: str = "transcribe") -> str:
        """Call the Whisper transcribe or translate endpoint."""
        try:
            import openai
        except ImportError:
            raise ValueError(
                "Could not import openai python package. "
                "Please install it with `pip install openai`."
            )
        openai.api_key = self.model_key
        if task == "transcribe":
            with open(audio_path, "rb") as audio_file:
                response = openai.Audio.transcribe(
                    file=audio_file,
                    model=self.model,
                    prompt=self.prompt,
                    language=self.language,
                    temperature=self.temperature,
                    response_format=self.response_format,
                )
        elif task == "translate":
            with open(audio_path, "rb") as audio_file:
                response = openai.Audio.translate(
                    file=audio_file,
                    model=self.model,
                    prompt=self.prompt,
                    temperature=self.temperature,
                    response_format=self.response_format,
                )
        else:
            raise ValueError(
                "Available tasks are 'transcribe' and 'translate'. "
                f"Task was: {task}"
            )
        try:
            text = response["text"]
            if not isinstance(text, str):
                raise ValueError(f"Expected text to be a string, got {type(text)}")
        except KeyError:
            raise ValueError(
                "Response should be of the form {'text': ...}. "
                f"Response was: {response}"
            )
        return text[: self.max_chars] if self.max_chars else text
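
A usage sketch for the `prompt` and `language` arguments described in the notebook above (the key and file path are placeholders, not from this PR):

from langchain.audio_models import Whisper

# `prompt` biases Whisper toward domain vocabulary; `language` is an
# ISO-639-1 code and applies only to the "transcribe" task.
audio_model = Whisper(
    model_key="sk-...",  # placeholder OpenAI API key
    prompt="LangChain, AudioChain, banana.dev",
    language="en",
    max_chars=20000,
)
text = audio_model.transcript("example_data/doit.mp3", task="transcribe")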

View File: langchain/chains/__init__.py

@@ -1,55 +1,57 @@
"""Chains are easily reusable components which can be linked together."""
from langchain.chains.api.base import APIChain
from langchain.chains.chat_vector_db.base import ChatVectorDBChain
from langchain.chains.combine_documents.base import AnalyzeDocumentChain
from langchain.chains.constitutional_ai.base import ConstitutionalChain
from langchain.chains.conversation.base import ConversationChain
from langchain.chains.graph_qa.base import GraphQAChain
from langchain.chains.hyde.base import HypotheticalDocumentEmbedder
from langchain.chains.llm import LLMChain
from langchain.chains.llm_bash.base import LLMBashChain
from langchain.chains.llm_checker.base import LLMCheckerChain
from langchain.chains.llm_math.base import LLMMathChain
from langchain.chains.llm_requests import LLMRequestsChain
from langchain.chains.llm_summarization_checker.base import LLMSummarizationCheckerChain
from langchain.chains.loading import load_chain
from langchain.chains.mapreduce import MapReduceChain
from langchain.chains.moderation import OpenAIModerationChain
from langchain.chains.pal.base import PALChain
from langchain.chains.qa_with_sources.base import QAWithSourcesChain
from langchain.chains.qa_with_sources.vector_db import VectorDBQAWithSourcesChain
from langchain.chains.sequential import SequentialChain, SimpleSequentialChain
from langchain.chains.sql_database.base import (
SQLDatabaseChain,
SQLDatabaseSequentialChain,
)
from langchain.chains.transform import TransformChain
from langchain.chains.vector_db_qa.base import VectorDBQA
__all__ = [
"ConversationChain",
"LLMChain",
"LLMBashChain",
"LLMCheckerChain",
"LLMSummarizationCheckerChain",
"LLMMathChain",
"PALChain",
"QAWithSourcesChain",
"SQLDatabaseChain",
"SequentialChain",
"SimpleSequentialChain",
"VectorDBQA",
"VectorDBQAWithSourcesChain",
"APIChain",
"LLMRequestsChain",
"TransformChain",
"MapReduceChain",
"OpenAIModerationChain",
"SQLDatabaseSequentialChain",
"load_chain",
"AnalyzeDocumentChain",
"HypotheticalDocumentEmbedder",
"ChatVectorDBChain",
"GraphQAChain",
"ConstitutionalChain",
]
"""Chains are easily reusable components which can be linked together."""
from langchain.chains.api.base import APIChain
from langchain.chains.audio import AudioChain
from langchain.chains.chat_vector_db.base import ChatVectorDBChain
from langchain.chains.combine_documents.base import AnalyzeDocumentChain
from langchain.chains.constitutional_ai.base import ConstitutionalChain
from langchain.chains.conversation.base import ConversationChain
from langchain.chains.graph_qa.base import GraphQAChain
from langchain.chains.hyde.base import HypotheticalDocumentEmbedder
from langchain.chains.llm import LLMChain
from langchain.chains.llm_bash.base import LLMBashChain
from langchain.chains.llm_checker.base import LLMCheckerChain
from langchain.chains.llm_math.base import LLMMathChain
from langchain.chains.llm_requests import LLMRequestsChain
from langchain.chains.llm_summarization_checker.base import LLMSummarizationCheckerChain
from langchain.chains.loading import load_chain
from langchain.chains.mapreduce import MapReduceChain
from langchain.chains.moderation import OpenAIModerationChain
from langchain.chains.pal.base import PALChain
from langchain.chains.qa_with_sources.base import QAWithSourcesChain
from langchain.chains.qa_with_sources.vector_db import VectorDBQAWithSourcesChain
from langchain.chains.sequential import SequentialChain, SimpleSequentialChain
from langchain.chains.sql_database.base import (
    SQLDatabaseChain,
    SQLDatabaseSequentialChain,
)
from langchain.chains.transform import TransformChain
from langchain.chains.vector_db_qa.base import VectorDBQA

__all__ = [
    "ConversationChain",
    "LLMChain",
    "AudioChain",
    "LLMBashChain",
    "LLMCheckerChain",
    "LLMSummarizationCheckerChain",
    "LLMMathChain",
    "PALChain",
    "QAWithSourcesChain",
    "SQLDatabaseChain",
    "SequentialChain",
    "SimpleSequentialChain",
    "VectorDBQA",
    "VectorDBQAWithSourcesChain",
    "APIChain",
    "LLMRequestsChain",
    "TransformChain",
    "MapReduceChain",
    "OpenAIModerationChain",
    "SQLDatabaseSequentialChain",
    "load_chain",
    "AnalyzeDocumentChain",
    "HypotheticalDocumentEmbedder",
    "ChatVectorDBChain",
    "GraphQAChain",
    "ConstitutionalChain",
]

View File: langchain/chains/audio.py

@@ -0,0 +1,24 @@
from typing import Dict, List
from langchain.audio_models.base import AudioBase
from langchain.chains.base import Chain


class AudioChain(Chain):
    """Chain that converts an audio file to text using an audio model."""

    audio_model: AudioBase
    output_key: str = "transcribe"

    @property
    def input_keys(self) -> List[str]:
        return ["audio_file"]

    @property
    def output_keys(self) -> List[str]:
        return [self.output_key]

    def _call(self, inputs: Dict[str, str]) -> Dict[str, str]:
        file_path = inputs["audio_file"]
        task = self.output_key
        content = self.audio_model.transcript(file_path, task).strip()
        return {self.output_key: content}
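
The `input_keys`/`output_keys` properties fix the chain's calling convention: the input dict must carry `"audio_file"`, and the result is keyed by the task. A quick sketch of calling the chain directly, outside a `SimpleSequentialChain` (placeholder key and path, not from this PR):

from langchain.audio_models import Whisper
from langchain.chains import AudioChain

audio_model = Whisper(model_key="sk-...")  # placeholder OpenAI API key
audio_chain = AudioChain(audio_model=audio_model, output_key="transcribe")

# The chain reads inputs["audio_file"] and returns {"transcribe": <text>},
# mirroring input_keys and output_keys above.
result = audio_chain({"audio_file": "example_data/doit.mp3"})
print(result["transcribe"])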

View File: langchain/document_loaders/__init__.py

@@ -1,86 +1,88 @@
"""All different types of document loaders."""
from langchain.document_loaders.airbyte_json import AirbyteJSONLoader
from langchain.document_loaders.azlyrics import AZLyricsLoader
from langchain.document_loaders.college_confidential import CollegeConfidentialLoader
from langchain.document_loaders.conllu import CoNLLULoader
from langchain.document_loaders.directory import DirectoryLoader
from langchain.document_loaders.docx import UnstructuredDocxLoader
from langchain.document_loaders.email import UnstructuredEmailLoader
from langchain.document_loaders.evernote import EverNoteLoader
from langchain.document_loaders.facebook_chat import FacebookChatLoader
from langchain.document_loaders.gcs_directory import GCSDirectoryLoader
from langchain.document_loaders.gcs_file import GCSFileLoader
from langchain.document_loaders.gitbook import GitbookLoader
from langchain.document_loaders.googledrive import GoogleDriveLoader
from langchain.document_loaders.gutenberg import GutenbergLoader
from langchain.document_loaders.hn import HNLoader
from langchain.document_loaders.html import UnstructuredHTMLLoader
from langchain.document_loaders.ifixit import IFixitLoader
from langchain.document_loaders.image import UnstructuredImageLoader
from langchain.document_loaders.imsdb import IMSDbLoader
from langchain.document_loaders.notebook import NotebookLoader
from langchain.document_loaders.notion import NotionDirectoryLoader
from langchain.document_loaders.obsidian import ObsidianLoader
from langchain.document_loaders.online_pdf import OnlinePDFLoader
from langchain.document_loaders.paged_pdf import PagedPDFSplitter
from langchain.document_loaders.pdf import PDFMinerLoader, UnstructuredPDFLoader
from langchain.document_loaders.powerpoint import UnstructuredPowerPointLoader
from langchain.document_loaders.readthedocs import ReadTheDocsLoader
from langchain.document_loaders.roam import RoamLoader
from langchain.document_loaders.s3_directory import S3DirectoryLoader
from langchain.document_loaders.s3_file import S3FileLoader
from langchain.document_loaders.srt import SRTLoader
from langchain.document_loaders.telegram import TelegramChatLoader
from langchain.document_loaders.text import TextLoader
from langchain.document_loaders.unstructured import (
UnstructuredFileIOLoader,
UnstructuredFileLoader,
)
from langchain.document_loaders.url import UnstructuredURLLoader
from langchain.document_loaders.web_base import WebBaseLoader
from langchain.document_loaders.word_document import UnstructuredWordDocumentLoader
from langchain.document_loaders.youtube import YoutubeLoader
__all__ = [
"UnstructuredFileLoader",
"UnstructuredFileIOLoader",
"UnstructuredURLLoader",
"DirectoryLoader",
"NotionDirectoryLoader",
"ReadTheDocsLoader",
"GoogleDriveLoader",
"UnstructuredHTMLLoader",
"UnstructuredPowerPointLoader",
"UnstructuredWordDocumentLoader",
"UnstructuredPDFLoader",
"UnstructuredImageLoader",
"ObsidianLoader",
"UnstructuredDocxLoader",
"UnstructuredEmailLoader",
"RoamLoader",
"YoutubeLoader",
"S3FileLoader",
"TextLoader",
"HNLoader",
"GitbookLoader",
"S3DirectoryLoader",
"GCSFileLoader",
"GCSDirectoryLoader",
"WebBaseLoader",
"IMSDbLoader",
"AZLyricsLoader",
"CollegeConfidentialLoader",
"IFixitLoader",
"GutenbergLoader",
"PagedPDFSplitter",
"EverNoteLoader",
"AirbyteJSONLoader",
"OnlinePDFLoader",
"PDFMinerLoader",
"TelegramChatLoader",
"SRTLoader",
"FacebookChatLoader",
"NotebookLoader",
"CoNLLULoader",
]
"""All different types of document loaders."""
from langchain.document_loaders.airbyte_json import AirbyteJSONLoader
from langchain.document_loaders.audio_files import AudioLoader
from langchain.document_loaders.azlyrics import AZLyricsLoader
from langchain.document_loaders.college_confidential import CollegeConfidentialLoader
from langchain.document_loaders.conllu import CoNLLULoader
from langchain.document_loaders.directory import DirectoryLoader
from langchain.document_loaders.docx import UnstructuredDocxLoader
from langchain.document_loaders.email import UnstructuredEmailLoader
from langchain.document_loaders.evernote import EverNoteLoader
from langchain.document_loaders.facebook_chat import FacebookChatLoader
from langchain.document_loaders.gcs_directory import GCSDirectoryLoader
from langchain.document_loaders.gcs_file import GCSFileLoader
from langchain.document_loaders.gitbook import GitbookLoader
from langchain.document_loaders.googledrive import GoogleDriveLoader
from langchain.document_loaders.gutenberg import GutenbergLoader
from langchain.document_loaders.hn import HNLoader
from langchain.document_loaders.html import UnstructuredHTMLLoader
from langchain.document_loaders.ifixit import IFixitLoader
from langchain.document_loaders.image import UnstructuredImageLoader
from langchain.document_loaders.imsdb import IMSDbLoader
from langchain.document_loaders.notebook import NotebookLoader
from langchain.document_loaders.notion import NotionDirectoryLoader
from langchain.document_loaders.obsidian import ObsidianLoader
from langchain.document_loaders.online_pdf import OnlinePDFLoader
from langchain.document_loaders.paged_pdf import PagedPDFSplitter
from langchain.document_loaders.pdf import PDFMinerLoader, UnstructuredPDFLoader
from langchain.document_loaders.powerpoint import UnstructuredPowerPointLoader
from langchain.document_loaders.readthedocs import ReadTheDocsLoader
from langchain.document_loaders.roam import RoamLoader
from langchain.document_loaders.s3_directory import S3DirectoryLoader
from langchain.document_loaders.s3_file import S3FileLoader
from langchain.document_loaders.srt import SRTLoader
from langchain.document_loaders.telegram import TelegramChatLoader
from langchain.document_loaders.text import TextLoader
from langchain.document_loaders.unstructured import (
    UnstructuredFileIOLoader,
    UnstructuredFileLoader,
)
from langchain.document_loaders.url import UnstructuredURLLoader
from langchain.document_loaders.web_base import WebBaseLoader
from langchain.document_loaders.word_document import UnstructuredWordDocumentLoader
from langchain.document_loaders.youtube import YoutubeLoader

__all__ = [
    "UnstructuredFileLoader",
    "UnstructuredFileIOLoader",
    "UnstructuredURLLoader",
    "DirectoryLoader",
    "NotionDirectoryLoader",
    "ReadTheDocsLoader",
    "GoogleDriveLoader",
    "UnstructuredHTMLLoader",
    "UnstructuredPowerPointLoader",
    "UnstructuredWordDocumentLoader",
    "UnstructuredPDFLoader",
    "UnstructuredImageLoader",
    "ObsidianLoader",
    "UnstructuredDocxLoader",
    "UnstructuredEmailLoader",
    "RoamLoader",
    "YoutubeLoader",
    "S3FileLoader",
    "TextLoader",
    "HNLoader",
    "GitbookLoader",
    "S3DirectoryLoader",
    "GCSFileLoader",
    "GCSDirectoryLoader",
    "WebBaseLoader",
    "IMSDbLoader",
    "AZLyricsLoader",
    "CollegeConfidentialLoader",
    "IFixitLoader",
    "GutenbergLoader",
    "PagedPDFSplitter",
    "EverNoteLoader",
    "AirbyteJSONLoader",
    "OnlinePDFLoader",
    "PDFMinerLoader",
    "TelegramChatLoader",
    "SRTLoader",
    "FacebookChatLoader",
    "NotebookLoader",
    "CoNLLULoader",
    "AudioLoader",
]

View File: langchain/document_loaders/audio_files.py

@@ -0,0 +1,27 @@
"""Loader that loads audio files."""
from typing import List
from langchain.audio_models.base import AudioBase
from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader


class AudioLoader(BaseLoader):
    """Loader that loads audio files and converts them
    to text using an audio model."""

    def __init__(
        self, audio_model: AudioBase, file_path: str, task: str = "transcribe"
    ) -> None:
        super().__init__()
        self.audio_model = audio_model
        self.file_path = file_path
        self.task = task

    def load(self) -> List[Document]:
        """Load the audio file and convert it to text.

        The task, set at construction time, is either 'transcribe' or
        'translate', depending on what the audio model supports.
        """
        raw_content = self.audio_model.transcript(self.file_path, self.task)
        content = raw_content.strip()
        metadata = {"source": self.file_path, "task": self.task}
        return [Document(page_content=content, metadata=metadata)]
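
Because `load()` returns ordinary `Document` objects carrying `source` and `task` metadata, transcripts drop straight into the rest of the document pipeline. A short sketch under the same placeholder-key assumption:

from langchain.audio_models import Whisper
from langchain.document_loaders import AudioLoader
from langchain.text_splitter import CharacterTextSplitter

audio_model = Whisper(model_key="sk-...")  # placeholder OpenAI API key
loader = AudioLoader(
    audio_model=audio_model,
    file_path="example_data/pasolini.mp3",
    task="translate",
)

# Chunk the transcript like any other loaded document.
docs = loader.load()
splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
chunks = splitter.split_documents(docs)
print(len(chunks), chunks[0].metadata)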