langchain/docs/docs/integrations/document_loaders/google_speech_to_text.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Google Speech-to-Text Audio Transcripts\n",
    "\n",
    "The `GoogleSpeechToTextLoader` allows to transcribe audio files with the [Google Cloud Speech-to-Text API](https://cloud.google.com/speech-to-text) and loads the transcribed text into documents.\n",
    "\n",
    "To use it, you should have the `google-cloud-speech` python package installed, and a Google Cloud project with the [Speech-to-Text API enabled](https://cloud.google.com/speech-to-text/v2/docs/transcribe-client-libraries#before_you_begin).\n",
    "\n",
    "- [Bringing the power of large models to Google Cloud’s Speech API](https://cloud.google.com/blog/products/ai-machine-learning/bringing-power-large-models-google-clouds-speech-api)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Installation & setup\n",
    "\n",
    "First, you need to install the `google-cloud-speech` python package.\n",
    "\n",
    "You can find more info about it on the [Speech-to-Text client libraries](https://cloud.google.com/speech-to-text/v2/docs/libraries) page.\n",
    "\n",
    "Follow the [quickstart guide](https://cloud.google.com/speech-to-text/v2/docs/sync-recognize) in the Google Cloud documentation to create a project and enable the API."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install google-cloud-speech"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example\n",
    "\n",
    "The `GoogleSpeechToTextLoader` must include the `project_id` and `file_path` arguments. Audio files can be specified as a Google Cloud Storage URI (`gs://...`) or a local file path.\n",
    "\n",
    "Only synchronous requests are supported by the loader, which has a [limit of 60 seconds or 10MB](https://cloud.google.com/speech-to-text/v2/docs/sync-recognize#:~:text=60%20seconds%20and/or%2010%20MB) per audio file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.document_loaders import GoogleSpeechToTextLoader\n",
    "\n",
    "project_id = \"<PROJECT_ID>\"\n",
    "file_path = \"gs://cloud-samples-data/speech/audio.flac\"\n",
    "# or a local file path: file_path = \"./audio.wav\"\n",
    "\n",
    "loader = GoogleSpeechToTextLoader(project_id=project_id, file_path=file_path)\n",
    "\n",
    "docs = loader.load()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note: Calling `loader.load()` blocks until the transcription is finished."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The transcribed text is available in the `page_content`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "docs[0].page_content"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```\n",
    "\"How old is the Brooklyn Bridge?\"\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `metadata` contains the full JSON response with more meta information:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "docs[0].metadata"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```json\n",
    "{\n",
    "  'language_code': 'en-US',\n",
    "  'result_end_offset': datetime.timedelta(seconds=1)\n",
    "}\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Recognition Config\n",
    "\n",
    "You can specify the `config` argument to use different speech recognition models and enable specific features.\n",
    "\n",
    "Refer to the [Speech-to-Text recognizers documentation](https://cloud.google.com/speech-to-text/v2/docs/recognizers) and the [`RecognizeRequest`](https://cloud.google.com/python/docs/reference/speech/latest/google.cloud.speech_v2.types.RecognizeRequest) API reference for information on how to set a custom configuation.\n",
    "\n",
    "If you don't specify a `config`, the following options will be selected automatically:\n",
    "\n",
    "- Model: [Chirp Universal Speech Model](https://cloud.google.com/speech-to-text/v2/docs/chirp-model)\n",
    "- Language: `en-US`\n",
    "- Audio Encoding: Automatically Detected\n",
    "- Automatic Punctuation: Enabled"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "from google.cloud.speech_v2 import (\n",
    "    AutoDetectDecodingConfig,\n",
    "    RecognitionConfig,\n",
    "    RecognitionFeatures,\n",
    ")\n",
    "from langchain.document_loaders import GoogleSpeechToTextLoader\n",
    "\n",
    "project_id = \"<PROJECT_ID>\"\n",
    "location = \"global\"\n",
    "recognizer_id = \"<RECOGNIZER_ID>\"\n",
    "file_path = \"./audio.wav\"\n",
    "\n",
    "config = RecognitionConfig(\n",
    "    auto_decoding_config=AutoDetectDecodingConfig(),\n",
    "    language_codes=[\"en-US\"],\n",
    "    model=\"long\",\n",
    "    features=RecognitionFeatures(\n",
    "        enable_automatic_punctuation=False,\n",
    "        profanity_filter=True,\n",
    "        enable_spoken_punctuation=True,\n",
    "        enable_spoken_emojis=True,\n",
    "    ),\n",
    ")\n",
    "\n",
    "loader = GoogleSpeechToTextLoader(\n",
    "    project_id=project_id,\n",
    "    location=location,\n",
    "    recognizer_id=recognizer_id,\n",
    "    file_path=file_path,\n",
    "    config=config,\n",
    ")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.0"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}