mirror of
https://github.com/hwchase17/langchain.git
synced 2025-09-04 04:28:58 +00:00
feat: Add Google Speech to Text API Document Loader (#12298)
- Add Document Loader for Google Speech to Text - Similar Structure to [Assembly AI Document Loader][1] [1]: https://python.langchain.com/docs/integrations/document_loaders/assemblyai
This commit is contained in:
File diff suppressed because one or more lines are too long
@@ -0,0 +1,202 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Google Speech-to-Text Audio Transcripts\n",
|
||||
"\n",
|
||||
"The `GoogleSpeechToTextLoader` allows to transcribe audio files with the [Google Cloud Speech-to-Text API](https://cloud.google.com/speech-to-text) and loads the transcribed text into documents.\n",
|
||||
"\n",
|
||||
"To use it, you should have the `google-cloud-speech` python package installed, and a Google Cloud project with the [Speech-to-Text API enabled](https://cloud.google.com/speech-to-text/v2/docs/transcribe-client-libraries#before_you_begin).\n",
|
||||
"\n",
|
||||
"- [Bringing the power of large models to Google Cloud’s Speech API](https://cloud.google.com/blog/products/ai-machine-learning/bringing-power-large-models-google-clouds-speech-api)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Installation & setup\n",
|
||||
"\n",
|
||||
"First, you need to install the `google-cloud-speech` python package.\n",
|
||||
"\n",
|
||||
"You can find more info about it on the [Speech-to-Text client libraries](https://cloud.google.com/speech-to-text/v2/docs/libraries) page.\n",
|
||||
"\n",
|
||||
"Follow the [quickstart guide](https://cloud.google.com/speech-to-text/v2/docs/sync-recognize) in the Google Cloud documentation to create a project and enable the API."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%pip install google-cloud-speech\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Example\n",
|
||||
"\n",
|
||||
"The `GoogleSpeechToTextLoader` must include the `project_id` and `file_path` arguments. Audio files can be specified as a Google Cloud Storage URI (`gs://...`) or a local file path.\n",
|
||||
"\n",
|
||||
"Only synchronous requests are supported by the loader, which has a [limit of 60 seconds or 10MB](https://cloud.google.com/speech-to-text/v2/docs/sync-recognize#:~:text=60%20seconds%20and/or%2010%20MB) per audio file."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import GoogleSpeechToTextLoader\n",
|
||||
"\n",
|
||||
"project_id = \"<PROJECT_ID>\"\n",
|
||||
"file_path = \"gs://cloud-samples-data/speech/audio.flac\"\n",
|
||||
"# or a local file path: file_path = \"./audio.wav\"\n",
|
||||
"\n",
|
||||
"loader = GoogleSpeechToTextLoader(project_id=project_id, file_path=file_path)\n",
|
||||
"\n",
|
||||
"docs = loader.load()\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Note: Calling `loader.load()` blocks until the transcription is finished."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The transcribed text is available in the `page_content`:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs[0].page_content\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"```\n",
|
||||
"\"How old is the Brooklyn Bridge?\"\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The `metadata` contains the full JSON response with more meta information:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs[0].metadata\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"```json\n",
|
||||
"{\n",
|
||||
" 'language_code': 'en-US',\n",
|
||||
" 'result_end_offset': datetime.timedelta(seconds=1)\n",
|
||||
"}\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Recognition Config\n",
|
||||
"\n",
|
||||
"You can specify the `config` argument to use different speech recognition models and enable specific features.\n",
|
||||
"\n",
|
||||
"Refer to the [Speech-to-Text recognizers documentation](https://cloud.google.com/speech-to-text/v2/docs/recognizers) and the [`RecognizeRequest`](https://cloud.google.com/python/docs/reference/speech/latest/google.cloud.speech_v2.types.RecognizeRequest) API reference for information on how to set a custom configuation.\n",
|
||||
"\n",
|
||||
"If you don't specify a `config`, the following options will be selected automatically:\n",
|
||||
"\n",
|
||||
"- Model: [Chirp Universal Speech Model](https://cloud.google.com/speech-to-text/v2/docs/chirp-model)\n",
|
||||
"- Language: `en-US`\n",
|
||||
"- Audio Encoding: Automatically Detected\n",
|
||||
"- Automatic Punctuation: Enabled"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from google.cloud.speech_v2 import AutoDetectDecodingConfig, RecognitionConfig, RecognitionFeatures\n",
|
||||
"from langchain.document_loaders import GoogleSpeechToTextLoader\n",
|
||||
"\n",
|
||||
"project_id = \"<PROJECT_ID>\"\n",
|
||||
"location = \"global\"\n",
|
||||
"recognizer_id = \"<RECOGNIZER_ID>\"\n",
|
||||
"file_path = \"./audio.wav\"\n",
|
||||
"\n",
|
||||
"config = RecognitionConfig(\n",
|
||||
" auto_decoding_config=AutoDetectDecodingConfig(),\n",
|
||||
" language_codes=[\"en-US\"],\n",
|
||||
" model=\"long\",\n",
|
||||
" features=RecognitionFeatures(\n",
|
||||
" enable_automatic_punctuation=False,\n",
|
||||
" profanity_filter=True,\n",
|
||||
" enable_spoken_punctuation=True,\n",
|
||||
" enable_spoken_emojis=True\n",
|
||||
" ),\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
"loader = GoogleSpeechToTextLoader(\n",
|
||||
" project_id=project_id,\n",
|
||||
" location=location,\n",
|
||||
" recognizer_id=recognizer_id,\n",
|
||||
" file_path=file_path,\n",
|
||||
" config=config\n",
|
||||
")\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": ".venv",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.0"
|
||||
},
|
||||
"orig_nbformat": 4
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
@@ -89,10 +89,28 @@ See a [usage example and authorizing instructions](/docs/integrations/document_l
|
||||
from langchain.document_loaders import GoogleDriveLoader
|
||||
```
|
||||
|
||||
## Vector Store
|
||||
### Google Vertex AI Vector Search
|
||||
### Speech-to-Text
|
||||
|
||||
> [Google Vertex AI Vector Search](https://cloud.google.com/vertex-ai/docs/matching-engine/overview),
|
||||
> [Google Cloud Speech-to-Text](https://cloud.google.com/speech-to-text) is an audio transcription API powered by Google's speech recognition models.
|
||||
|
||||
This document loader transcribes audio files and outputs the text results as Documents.
|
||||
|
||||
First, we need to install the python package.
|
||||
|
||||
```bash
|
||||
pip install google-cloud-speech
|
||||
```
|
||||
|
||||
See a [usage example and authorizing instructions](/docs/integrations/document_loaders/google_speech_to_text).
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import GoogleSpeechToTextLoader
|
||||
```
|
||||
|
||||
## Vector Store
|
||||
### Vertex AI Vector Search
|
||||
|
||||
> [Vertex AI Vector Search](https://cloud.google.com/vertex-ai/docs/matching-engine/overview),
|
||||
> formerly known as Vertex AI Matching Engine, provides the industry's leading high-scale
|
||||
> low latency vector database. These vector databases are commonly
|
||||
> referred to as vector similarity-matching or an approximate nearest neighbor (ANN) service.
|
||||
|
Reference in New Issue
Block a user