From 75c3c81b8c3fe534bcdc1ebbc2147b41079156e0 Mon Sep 17 00:00:00 2001
From: Mohammad Mohtashim <45242107+keenborder786@users.noreply.github.com>
Date: Mon, 19 Aug 2024 18:36:42 +0500
Subject: [PATCH] [Community]: Fix - Open AI Whisper
 `client.audio.transcriptions` returning Text Object which raises error
 (#25271)

- **Description:** The following
[line](https://github.com/langchain-ai/langchain/blob/fd546196ef0fafa4a4cd7bb7ebb1771ef599f372/libs/community/langchain_community/document_loaders/parsers/audio.py#L117)
in `OpenAIWhisperParser` can return a plain `str`, even though the
official documentation says it should return a `Transcript` instance
with a `text` attribute. With the example given in the issue, and in my
own runs, I got the text back directly, so accessing `transcript.text`
raised an error. This small PR handles both cases.
- **Issue:** #25218


I was able to reproduce the error without `GenericLoader`, as shown
below, which confirms that the issue is in `OpenAIWhisperParser`:

```python
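from langchain_community.document_loaders.blob_loaders import Blob
from langchain_community.document_loaders.parsers.audio import OpenAIWhisperParser

parser = OpenAIWhisperParser(
    api_key="sk-fxxxxxxxxx", response_format="srt", temperature=0
)

# Before this fix, this raised an error when the API returned a plain string
list(parser.lazy_parse(Blob.from_path("path_to_file.m4a")))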
```
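
For illustration, the change in the diff below amounts to the following
normalization step. This is a minimal standalone sketch, not the parser
code itself: `transcript_to_text` and `FakeTranscript` are hypothetical
names used only here and are not part of `langchain_community` or the
OpenAI client.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class FakeTranscript:
    """Hypothetical stand-in for a response object exposing a `text` attribute."""

    text: str


def transcript_to_text(transcript: Any) -> str:
    """Return the transcription text for either response shape.

    Mirrors the conditional in the patch: a plain string is used as-is,
    anything else is assumed to expose a `.text` attribute.
    """
    if isinstance(transcript, str):
        return transcript
    return transcript.text


# Both shapes yield the same kind of value for `page_content`.
plain = transcript_to_text("1\n00:00:00,000 --> 00:00:02,000\nHello")
wrapped = transcript_to_text(FakeTranscript(text="Hello"))
```

Checking for `str` first keeps the existing behavior for object responses
unchanged and only adds a fallback for the plain-string case.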
---
 .../langchain_community/document_loaders/parsers/audio.py     | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/libs/community/langchain_community/document_loaders/parsers/audio.py b/libs/community/langchain_community/document_loaders/parsers/audio.py
index c8fa4c3ed39..9741a32f3de 100644
--- a/libs/community/langchain_community/document_loaders/parsers/audio.py
+++ b/libs/community/langchain_community/document_loaders/parsers/audio.py
@@ -129,7 +129,9 @@ class OpenAIWhisperParser(BaseBlobParser):
                 continue
 
             yield Document(
-                page_content=transcript.text,
+                page_content=transcript.text
+                if not isinstance(transcript, str)
+                else transcript,
                 metadata={"source": blob.source, "chunk": split_number},
             )