From 747efa16ec7bbfda1331dee0a64dc3bef53dfefc Mon Sep 17 00:00:00 2001
From: qonnop <43803874+qonnop@users.noreply.github.com>
Date: Sat, 15 Mar 2025 03:22:29 +0100
Subject: [PATCH] community: fix CPU support for FasterWhisperParser (implicit compute type for WhisperModel) (#30263)

FasterWhisperParser fails on machines without an NVIDIA GPU: "Requested float16 compute type, but the target device or backend do not support efficient float16 computation." The problem arises because WhisperModel is instantiated with compute_type="float16", which the [CTranslate2 docs](https://opennmt.net/CTranslate2/quantization.html#bit-floating-points-float16) state is supported only on NVIDIA GPUs.

Removing the compute_type parameter fixes this for CPUs: per the [CTranslate2 docs](https://opennmt.net/CTranslate2/quantization.html#quantize-on-model-loading), an omitted compute_type is treated as "default", which keeps the model's original compute type or implicitly converts it to a type supported by the target device (GPU or CPU). I therefore suggest removing compute_type="float16".

@hulitaitai, as the original author of FasterWhisperParser: is there a reason for pinning the compute type to float16? Thanks for reviewing the PR!

Co-authored-by: qonnop
---
 .../langchain_community/document_loaders/parsers/audio.py | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/libs/community/langchain_community/document_loaders/parsers/audio.py b/libs/community/langchain_community/document_loaders/parsers/audio.py
index bdcf72b96b9..8a873f4b3a4 100644
--- a/libs/community/langchain_community/document_loaders/parsers/audio.py
+++ b/libs/community/langchain_community/document_loaders/parsers/audio.py
@@ -673,9 +673,7 @@ class FasterWhisperParser(BaseBlobParser):
         file_obj = io.BytesIO(audio.export(format="mp3").read())
 
         # Transcribe
-        model = WhisperModel(
-            self.model_size, device=self.device, compute_type="float16"
-        )
+        model = WhisperModel(self.model_size, device=self.device)
 
         segments, info = model.transcribe(file_obj, beam_size=5)
 
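
For reviewers who want to check the behavior locally, here is a minimal sketch of the patched call (not part of the patch; it assumes `faster-whisper` is installed, and the model size `"base"` and the file `sample.mp3` are placeholder values):

```python
from faster_whisper import WhisperModel

# Omitting compute_type is equivalent to compute_type="default": CTranslate2
# keeps the model's saved compute type, or implicitly converts it to one the
# target device supports (see the quantization docs linked above).
model = WhisperModel("base", device="cpu")

# The removed call raised on CPU-only machines, because float16 is supported
# only on NVIDIA GPUs:
#   model = WhisperModel("base", device="cpu", compute_type="float16")

# Same transcribe call as in the parser; a path or a file-like object works.
segments, info = model.transcribe("sample.mp3", beam_size=5)
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```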