langchain/docs/docs/integrations/document_loaders/google_speech_to_text.ipynb
2023-10-29 15:50:09 -07:00

207 lines
6.0 KiB
Plaintext
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Google Speech-to-Text Audio Transcripts\n",
"\n",
"The `GoogleSpeechToTextLoader` allows to transcribe audio files with the [Google Cloud Speech-to-Text API](https://cloud.google.com/speech-to-text) and loads the transcribed text into documents.\n",
"\n",
"To use it, you should have the `google-cloud-speech` python package installed, and a Google Cloud project with the [Speech-to-Text API enabled](https://cloud.google.com/speech-to-text/v2/docs/transcribe-client-libraries#before_you_begin).\n",
"\n",
"- [Bringing the power of large models to Google Clouds Speech API](https://cloud.google.com/blog/products/ai-machine-learning/bringing-power-large-models-google-clouds-speech-api)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Installation & setup\n",
"\n",
"First, you need to install the `google-cloud-speech` python package.\n",
"\n",
"You can find more info about it on the [Speech-to-Text client libraries](https://cloud.google.com/speech-to-text/v2/docs/libraries) page.\n",
"\n",
"Follow the [quickstart guide](https://cloud.google.com/speech-to-text/v2/docs/sync-recognize) in the Google Cloud documentation to create a project and enable the API."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install google-cloud-speech"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Example\n",
"\n",
"The `GoogleSpeechToTextLoader` must include the `project_id` and `file_path` arguments. Audio files can be specified as a Google Cloud Storage URI (`gs://...`) or a local file path.\n",
"\n",
"Only synchronous requests are supported by the loader, which has a [limit of 60 seconds or 10MB](https://cloud.google.com/speech-to-text/v2/docs/sync-recognize#:~:text=60%20seconds%20and/or%2010%20MB) per audio file."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import GoogleSpeechToTextLoader\n",
"\n",
"project_id = \"<PROJECT_ID>\"\n",
"file_path = \"gs://cloud-samples-data/speech/audio.flac\"\n",
"# or a local file path: file_path = \"./audio.wav\"\n",
"\n",
"loader = GoogleSpeechToTextLoader(project_id=project_id, file_path=file_path)\n",
"\n",
"docs = loader.load()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note: Calling `loader.load()` blocks until the transcription is finished."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The transcribed text is available in the `page_content`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"docs[0].page_content"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```\n",
"\"How old is the Brooklyn Bridge?\"\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `metadata` contains the full JSON response with more meta information:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"docs[0].metadata"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```json\n",
"{\n",
" 'language_code': 'en-US',\n",
" 'result_end_offset': datetime.timedelta(seconds=1)\n",
"}\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Recognition Config\n",
"\n",
"You can specify the `config` argument to use different speech recognition models and enable specific features.\n",
"\n",
"Refer to the [Speech-to-Text recognizers documentation](https://cloud.google.com/speech-to-text/v2/docs/recognizers) and the [`RecognizeRequest`](https://cloud.google.com/python/docs/reference/speech/latest/google.cloud.speech_v2.types.RecognizeRequest) API reference for information on how to set a custom configuation.\n",
"\n",
"If you don't specify a `config`, the following options will be selected automatically:\n",
"\n",
"- Model: [Chirp Universal Speech Model](https://cloud.google.com/speech-to-text/v2/docs/chirp-model)\n",
"- Language: `en-US`\n",
"- Audio Encoding: Automatically Detected\n",
"- Automatic Punctuation: Enabled"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"from google.cloud.speech_v2 import (\n",
" AutoDetectDecodingConfig,\n",
" RecognitionConfig,\n",
" RecognitionFeatures,\n",
")\n",
"from langchain.document_loaders import GoogleSpeechToTextLoader\n",
"\n",
"project_id = \"<PROJECT_ID>\"\n",
"location = \"global\"\n",
"recognizer_id = \"<RECOGNIZER_ID>\"\n",
"file_path = \"./audio.wav\"\n",
"\n",
"config = RecognitionConfig(\n",
" auto_decoding_config=AutoDetectDecodingConfig(),\n",
" language_codes=[\"en-US\"],\n",
" model=\"long\",\n",
" features=RecognitionFeatures(\n",
" enable_automatic_punctuation=False,\n",
" profanity_filter=True,\n",
" enable_spoken_punctuation=True,\n",
" enable_spoken_emojis=True,\n",
" ),\n",
")\n",
"\n",
"loader = GoogleSpeechToTextLoader(\n",
" project_id=project_id,\n",
" location=location,\n",
" recognizer_id=recognizer_id,\n",
" file_path=file_path,\n",
" config=config,\n",
")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.0"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}