Fixed the import error in OpenAIWhisperParserLocal and resolved the LangChain parser issue (#29168)
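
The updated notebook cell pulls the Whisper parsers from the `audio` submodule rather than from `langchain_community.document_loaders.parsers` directly. A minimal sketch of the corrected import, mirroring the notebook change below:

```python
# Corrected import path used by the updated notebook: both Whisper parsers
# live in the `audio` submodule of langchain_community's document loader parsers.
from langchain_community.document_loaders.parsers.audio import (
    OpenAIWhisperParser,
    OpenAIWhisperParserLocal,
)
```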

Thank you for contributing to LangChain!

- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
  - Example: "community: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in the
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function (see the sketch after this list).
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
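
A hypothetical sketch of the "optional dependencies inside a function" guideline above (the function name and error message are illustrative, not taken from this PR):

```python
# Keep optional imports local to the function that needs them, so the package
# still imports cleanly when the extra dependency is not installed.
def transcribe_locally(audio_path: str):
    try:
        import librosa  # optional dependency, imported lazily
    except ImportError as e:
        raise ImportError(
            "librosa is required for local transcription; "
            "install it with `pip install librosa`."
        ) from e
    waveform, sample_rate = librosa.load(audio_path, sr=16000)
    return waveform, sample_rate
```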

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Syed Muneeb Abbas 2025-01-13 19:47:31 +05:00 committed by GitHub
parent c115c09b6d
commit 8ef7f3eacc


@@ -1,321 +1,363 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "e48afb8d",
"metadata": {},
"source": [
"# YouTube audio\n",
"\n",
"Building chat or QA applications on YouTube videos is a topic of high interest.\n",
"\n",
"Below we show how to easily go from a `YouTube url` to `audio of the video` to `text` to `chat`!\n",
"\n",
"We wil use the `OpenAIWhisperParser`, which will use the OpenAI Whisper API to transcribe audio to text, \n",
"and the `OpenAIWhisperParserLocal` for local support and running on private clouds or on premise.\n",
"\n",
"Note: You will need to have an `OPENAI_API_KEY` supplied."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "5f34e934",
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.document_loaders.blob_loaders.youtube_audio import (\n",
" YoutubeAudioLoader,\n",
")\n",
"from langchain_community.document_loaders.generic import GenericLoader\n",
"from langchain_community.document_loaders.parsers import (\n",
" OpenAIWhisperParser,\n",
" OpenAIWhisperParserLocal,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "85fc12bd",
"metadata": {},
"source": [
"We will use `yt_dlp` to download audio for YouTube urls.\n",
"\n",
"We will use `pydub` to split downloaded audio files (such that we adhere to Whisper API's 25MB file size limit)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fb5a6606",
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet yt_dlp\n",
"%pip install --upgrade --quiet pydub\n",
"%pip install --upgrade --quiet librosa"
]
},
{
"cell_type": "markdown",
"id": "b0e119f4",
"metadata": {},
"source": [
"### YouTube url to text\n",
"\n",
"Use `YoutubeAudioLoader` to fetch / download the audio files.\n",
"\n",
"Then, ues `OpenAIWhisperParser()` to transcribe them to text.\n",
"\n",
"Let's take the first lecture of Andrej Karpathy's YouTube course as an example! "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8682f256",
"metadata": {},
"outputs": [],
"source": [
"# set a flag to switch between local and remote parsing\n",
"# change this to True if you want to use local parsing\n",
"local = False"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "23e1e134",
"metadata": {},
"outputs": [
"cells": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[youtube] Extracting URL: https://youtu.be/kCc8FmEb1nY\n",
"[youtube] kCc8FmEb1nY: Downloading webpage\n",
"[youtube] kCc8FmEb1nY: Downloading android player API JSON\n",
"[info] kCc8FmEb1nY: Downloading 1 format(s): 140\n",
"[dashsegments] Total fragments: 11\n",
"[download] Destination: /Users/31treehaus/Desktop/AI/langchain-fork/docs/modules/indexes/document_loaders/examples/Let's build GPT from scratch, in code, spelled out..m4a\n",
"[download] 100% of 107.73MiB in 00:00:18 at 5.92MiB/s \n",
"[FixupM4a] Correcting container of \"/Users/31treehaus/Desktop/AI/langchain-fork/docs/modules/indexes/document_loaders/examples/Let's build GPT from scratch, in code, spelled out..m4a\"\n",
"[ExtractAudio] Not converting audio /Users/31treehaus/Desktop/AI/langchain-fork/docs/modules/indexes/document_loaders/examples/Let's build GPT from scratch, in code, spelled out..m4a; file is already in target format m4a\n",
"[youtube] Extracting URL: https://youtu.be/VMj-3S1tku0\n",
"[youtube] VMj-3S1tku0: Downloading webpage\n",
"[youtube] VMj-3S1tku0: Downloading android player API JSON\n",
"[info] VMj-3S1tku0: Downloading 1 format(s): 140\n",
"[download] /Users/31treehaus/Desktop/AI/langchain-fork/docs/modules/indexes/document_loaders/examples/The spelled-out intro to neural networks and backpropagation building micrograd.m4a has already been downloaded\n",
"[download] 100% of 134.98MiB\n",
"[ExtractAudio] Not converting audio /Users/31treehaus/Desktop/AI/langchain-fork/docs/modules/indexes/document_loaders/examples/The spelled-out intro to neural networks and backpropagation building micrograd.m4a; file is already in target format m4a\n"
]
}
],
"source": [
"# Two Karpathy lecture videos\n",
"urls = [\"https://youtu.be/kCc8FmEb1nY\", \"https://youtu.be/VMj-3S1tku0\"]\n",
"\n",
"# Directory to save audio files\n",
"save_dir = \"~/Downloads/YouTube\"\n",
"\n",
"# Transcribe the videos to text\n",
"if local:\n",
" loader = GenericLoader(\n",
" YoutubeAudioLoader(urls, save_dir), OpenAIWhisperParserLocal()\n",
" )\n",
"else:\n",
" loader = GenericLoader(YoutubeAudioLoader(urls, save_dir), OpenAIWhisperParser())\n",
"docs = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "72a94fd8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"Hello, my name is Andrej and I've been training deep neural networks for a bit more than a decade. And in this lecture I'd like to show you what neural network training looks like under the hood. So in particular we are going to start with a blank Jupyter notebook and by the end of this lecture we will define and train a neural net and you'll get to see everything that goes on under the hood and exactly sort of how that works on an intuitive level. Now specifically what I would like to do is I w\""
"cell_type": "markdown",
"id": "e48afb8d",
"metadata": {
"id": "e48afb8d"
},
"source": [
"# YouTube audio\n",
"\n",
"Building chat or QA applications on YouTube videos is a topic of high interest.\n",
"\n",
"Below we show how to easily go from a `YouTube url` to `audio of the video` to `text` to `chat`!\n",
"\n",
"We wil use the `OpenAIWhisperParser`, which will use the OpenAI Whisper API to transcribe audio to text,\n",
"and the `OpenAIWhisperParserLocal` for local support and running on private clouds or on premise.\n",
"\n",
"Note: You will need to have an `OPENAI_API_KEY` supplied."
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Returns a list of Documents, which can be easily viewed or parsed\n",
"docs[0].page_content[0:500]"
]
},
{
"cell_type": "markdown",
"id": "93be6b49",
"metadata": {},
"source": [
"### Building a chat app from YouTube video\n",
"\n",
"Given `Documents`, we can easily enable chat / question+answering."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "1823f042",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains import RetrievalQA\n",
"from langchain_community.vectorstores import FAISS\n",
"from langchain_openai import ChatOpenAI, OpenAIEmbeddings\n",
"from langchain_text_splitters import RecursiveCharacterTextSplitter"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "7257cda1",
"metadata": {},
"outputs": [],
"source": [
"# Combine doc\n",
"combined_docs = [doc.page_content for doc in docs]\n",
"text = \" \".join(combined_docs)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "147c0c55",
"metadata": {},
"outputs": [],
"source": [
"# Split them\n",
"text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=150)\n",
"splits = text_splitter.split_text(text)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "f3556703",
"metadata": {},
"outputs": [],
"source": [
"# Build an index\n",
"embeddings = OpenAIEmbeddings()\n",
"vectordb = FAISS.from_texts(splits, embeddings)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "beaa99db",
"metadata": {},
"outputs": [],
"source": [
"# Build a QA chain\n",
"qa_chain = RetrievalQA.from_chain_type(\n",
" llm=ChatOpenAI(model=\"gpt-3.5-turbo\", temperature=0),\n",
" chain_type=\"stuff\",\n",
" retriever=vectordb.as_retriever(),\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "f2239a62",
"metadata": {},
"outputs": [
},
{
"data": {
"text/plain": [
"\"We need to zero out the gradient before backprop at each step because the backward pass accumulates gradients in the grad attribute of each parameter. If we don't reset the grad to zero before each backward pass, the gradients will accumulate and add up, leading to incorrect updates and slower convergence. By resetting the grad to zero before each backward pass, we ensure that the gradients are calculated correctly and that the optimization process works as intended.\""
"cell_type": "code",
"execution_count": null,
"id": "5f34e934",
"metadata": {
"id": "5f34e934"
},
"outputs": [],
"source": [
"from langchain_community.document_loaders.blob_loaders.youtube_audio import (\n",
" YoutubeAudioLoader,\n",
")\n",
"from langchain_community.document_loaders.generic import GenericLoader\n",
"from langchain_community.document_loaders.parsers.audio import (\n",
" OpenAIWhisperParser,\n",
" OpenAIWhisperParserLocal,\n",
")"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Ask a question!\n",
"query = \"Why do we need to zero out the gradient before backprop at each step?\"\n",
"qa_chain.run(query)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "a8d01098",
"metadata": {},
"outputs": [
},
{
"data": {
"text/plain": [
"'In the context of transformers, an encoder is a component that reads in a sequence of input tokens and generates a sequence of hidden representations. On the other hand, a decoder is a component that takes in a sequence of hidden representations and generates a sequence of output tokens. The main difference between the two is that the encoder is used to encode the input sequence into a fixed-length representation, while the decoder is used to decode the fixed-length representation into an output sequence. In machine translation, for example, the encoder reads in the source language sentence and generates a fixed-length representation, which is then used by the decoder to generate the target language sentence.'"
"cell_type": "markdown",
"id": "85fc12bd",
"metadata": {
"id": "85fc12bd"
},
"source": [
"We will use `yt_dlp` to download audio for YouTube urls.\n",
"\n",
"We will use `pydub` to split downloaded audio files (such that we adhere to Whisper API's 25MB file size limit)."
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"query = \"What is the difference between an encoder and decoder?\"\n",
"qa_chain.run(query)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "fe1e77dd",
"metadata": {},
"outputs": [
},
{
"data": {
"text/plain": [
"'For any token, x is the input vector that contains the private information of that token, k and q are the key and query vectors respectively, which are produced by forwarding linear modules on x, and v is the vector that is calculated by propagating the same linear module on x again. The key vector represents what the token contains, and the query vector represents what the token is looking for. The vector v is the information that the token will communicate to other tokens if it finds them interesting, and it gets aggregated for the purposes of the self-attention mechanism.'"
"cell_type": "code",
"execution_count": null,
"id": "fb5a6606",
"metadata": {
"id": "fb5a6606"
},
"outputs": [],
"source": [
"%pip install --upgrade --quiet yt_dlp\n",
"%pip install --upgrade --quiet pydub\n",
"%pip install --upgrade --quiet librosa"
]
},
{
"cell_type": "markdown",
"id": "b0e119f4",
"metadata": {
"id": "b0e119f4"
},
"source": [
"### YouTube url to text\n",
"\n",
"Use `YoutubeAudioLoader` to fetch / download the audio files.\n",
"\n",
"Then, ues `OpenAIWhisperParser()` to transcribe them to text.\n",
"\n",
"Let's take the first lecture of Andrej Karpathy's YouTube course as an example!"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8682f256",
"metadata": {
"id": "8682f256"
},
"outputs": [],
"source": [
"# set a flag to switch between local and remote parsing\n",
"# change this to True if you want to use local parsing\n",
"local = False"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "23e1e134",
"metadata": {
"id": "23e1e134",
"outputId": "0794ffeb-f912-48cc-e3cb-3b4d6e5221c7"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[youtube] Extracting URL: https://youtu.be/kCc8FmEb1nY\n",
"[youtube] kCc8FmEb1nY: Downloading webpage\n",
"[youtube] kCc8FmEb1nY: Downloading android player API JSON\n",
"[info] kCc8FmEb1nY: Downloading 1 format(s): 140\n",
"[dashsegments] Total fragments: 11\n",
"[download] Destination: /Users/31treehaus/Desktop/AI/langchain-fork/docs/modules/indexes/document_loaders/examples/Let's build GPT from scratch, in code, spelled out..m4a\n",
"[download] 100% of 107.73MiB in 00:00:18 at 5.92MiB/s \n",
"[FixupM4a] Correcting container of \"/Users/31treehaus/Desktop/AI/langchain-fork/docs/modules/indexes/document_loaders/examples/Let's build GPT from scratch, in code, spelled out..m4a\"\n",
"[ExtractAudio] Not converting audio /Users/31treehaus/Desktop/AI/langchain-fork/docs/modules/indexes/document_loaders/examples/Let's build GPT from scratch, in code, spelled out..m4a; file is already in target format m4a\n",
"[youtube] Extracting URL: https://youtu.be/VMj-3S1tku0\n",
"[youtube] VMj-3S1tku0: Downloading webpage\n",
"[youtube] VMj-3S1tku0: Downloading android player API JSON\n",
"[info] VMj-3S1tku0: Downloading 1 format(s): 140\n",
"[download] /Users/31treehaus/Desktop/AI/langchain-fork/docs/modules/indexes/document_loaders/examples/The spelled-out intro to neural networks and backpropagation building micrograd.m4a has already been downloaded\n",
"[download] 100% of 134.98MiB\n",
"[ExtractAudio] Not converting audio /Users/31treehaus/Desktop/AI/langchain-fork/docs/modules/indexes/document_loaders/examples/The spelled-out intro to neural networks and backpropagation building micrograd.m4a; file is already in target format m4a\n"
]
}
],
"source": [
"# Two Karpathy lecture videos\n",
"urls = [\"https://youtu.be/kCc8FmEb1nY\", \"https://youtu.be/VMj-3S1tku0\"]\n",
"\n",
"# Directory to save audio files\n",
"save_dir = \"~/Downloads/YouTube\"\n",
"\n",
"# Transcribe the videos to text\n",
"if local:\n",
" loader = GenericLoader(\n",
" YoutubeAudioLoader(urls, save_dir), OpenAIWhisperParserLocal()\n",
" )\n",
"else:\n",
" loader = GenericLoader(YoutubeAudioLoader(urls, save_dir), OpenAIWhisperParser())\n",
"docs = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "72a94fd8",
"metadata": {
"id": "72a94fd8",
"outputId": "b024759c-3925-40c1-9c59-2f9dabee0248"
},
"outputs": [
{
"data": {
"text/plain": [
"\"Hello, my name is Andrej and I've been training deep neural networks for a bit more than a decade. And in this lecture I'd like to show you what neural network training looks like under the hood. So in particular we are going to start with a blank Jupyter notebook and by the end of this lecture we will define and train a neural net and you'll get to see everything that goes on under the hood and exactly sort of how that works on an intuitive level. Now specifically what I would like to do is I w\""
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Returns a list of Documents, which can be easily viewed or parsed\n",
"docs[0].page_content[0:500]"
]
},
{
"cell_type": "markdown",
"id": "93be6b49",
"metadata": {
"id": "93be6b49"
},
"source": [
"### Building a chat app from YouTube video\n",
"\n",
"Given `Documents`, we can easily enable chat / question+answering."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1823f042",
"metadata": {
"id": "1823f042"
},
"outputs": [],
"source": [
"from langchain.chains import RetrievalQA\n",
"from langchain_community.vectorstores import FAISS\n",
"from langchain_openai import ChatOpenAI, OpenAIEmbeddings\n",
"from langchain_text_splitters import RecursiveCharacterTextSplitter"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7257cda1",
"metadata": {
"id": "7257cda1"
},
"outputs": [],
"source": [
"# Combine doc\n",
"combined_docs = [doc.page_content for doc in docs]\n",
"text = \" \".join(combined_docs)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "147c0c55",
"metadata": {
"id": "147c0c55"
},
"outputs": [],
"source": [
"# Split them\n",
"text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=150)\n",
"splits = text_splitter.split_text(text)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f3556703",
"metadata": {
"id": "f3556703"
},
"outputs": [],
"source": [
"# Build an index\n",
"embeddings = OpenAIEmbeddings()\n",
"vectordb = FAISS.from_texts(splits, embeddings)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "beaa99db",
"metadata": {
"id": "beaa99db"
},
"outputs": [],
"source": [
"# Build a QA chain\n",
"qa_chain = RetrievalQA.from_chain_type(\n",
" llm=ChatOpenAI(model=\"gpt-3.5-turbo\", temperature=0),\n",
" chain_type=\"stuff\",\n",
" retriever=vectordb.as_retriever(),\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f2239a62",
"metadata": {
"id": "f2239a62",
"outputId": "b8de052d-cb76-44c5-bb0c-57e7398e89e6"
},
"outputs": [
{
"data": {
"text/plain": [
"\"We need to zero out the gradient before backprop at each step because the backward pass accumulates gradients in the grad attribute of each parameter. If we don't reset the grad to zero before each backward pass, the gradients will accumulate and add up, leading to incorrect updates and slower convergence. By resetting the grad to zero before each backward pass, we ensure that the gradients are calculated correctly and that the optimization process works as intended.\""
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Ask a question!\n",
"query = \"Why do we need to zero out the gradient before backprop at each step?\"\n",
"qa_chain.run(query)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a8d01098",
"metadata": {
"id": "a8d01098",
"outputId": "9d66d66e-fc7f-4ac9-b104-a8e45e962949"
},
"outputs": [
{
"data": {
"text/plain": [
"'In the context of transformers, an encoder is a component that reads in a sequence of input tokens and generates a sequence of hidden representations. On the other hand, a decoder is a component that takes in a sequence of hidden representations and generates a sequence of output tokens. The main difference between the two is that the encoder is used to encode the input sequence into a fixed-length representation, while the decoder is used to decode the fixed-length representation into an output sequence. In machine translation, for example, the encoder reads in the source language sentence and generates a fixed-length representation, which is then used by the decoder to generate the target language sentence.'"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"query = \"What is the difference between an encoder and decoder?\"\n",
"qa_chain.run(query)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fe1e77dd",
"metadata": {
"id": "fe1e77dd",
"outputId": "19479403-92c9-471e-8c93-5df4b17f007a"
},
"outputs": [
{
"data": {
"text/plain": [
"'For any token, x is the input vector that contains the private information of that token, k and q are the key and query vectors respectively, which are produced by forwarding linear modules on x, and v is the vector that is calculated by propagating the same linear module on x again. The key vector represents what the token contains, and the query vector represents what the token is looking for. The vector v is the information that the token will communicate to other tokens if it finds them interesting, and it gets aggregated for the purposes of the self-attention mechanism.'"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"query = \"For any token, what are x, k, v, and q?\"\n",
"qa_chain.run(query)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
},
"vscode": {
"interpreter": {
"hash": "97cc609b13305c559618ec78a438abc56230b9381f827f22d070313b9a1f3777"
}
},
"colab": {
"provenance": []
}
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
},
"vscode": {
"interpreter": {
"hash": "97cc609b13305c559618ec78a438abc56230b9381f827f22d070313b9a1f3777"
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}
"nbformat": 4,
"nbformat_minor": 5
}