python multilingual speech-to-text whisper

How do I transcribe a multi-language audio file using Whisper, without translating any of the content?


I am attempting to use the Whisper library to transcribe an audio file that contains alternating English and Indonesian speech.

Some of the Indonesian speech is correctly transcribed into Indonesian text, but some of it is translated into English and transcribed.

This behaviour seems to be random: different passes with the same model, and passes with different models, give different results.

Is there any way to only transcribe and not translate?

Setting the language to Indonesian causes everything to be translated to Indonesian. Setting it to English causes the behaviour I described.


Solution

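  • You could use WhisperX and leverage its speaker diarization. If each speaker sticks to one language, you can split the audio by speaker turn and transcribe each turn on its own, so the language is detected per turn rather than once for the whole file.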
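
One way to apply the diarization idea is sketched below. It is not WhisperX's own API: it assumes you already have speaker turns (start, end, speaker) from a diarization step, and that the audio is a 16 kHz mono sample array. `Turn`, `transcribe_by_turn` and `transcribe_fn` are hypothetical names introduced for illustration; `transcribe_fn` is any callable that takes an audio chunk and returns a dict with `"text"` and `"language"` keys.

```python
from typing import Callable, NamedTuple, Sequence

SAMPLE_RATE = 16_000  # assumption: audio is 16 kHz mono, as Whisper models expect


class Turn(NamedTuple):
    """One diarized speaker turn (hypothetical container, not a WhisperX type)."""
    start: float   # seconds
    end: float     # seconds
    speaker: str


def transcribe_by_turn(
    audio: Sequence,
    turns: Sequence[Turn],
    transcribe_fn: Callable[[Sequence], dict],
    sample_rate: int = SAMPLE_RATE,
) -> list:
    """Transcribe each speaker turn separately so language is detected per turn."""
    results = []
    for turn in sorted(turns, key=lambda t: t.start):
        lo = int(round(turn.start * sample_rate))
        hi = int(round(turn.end * sample_rate))
        chunk = audio[lo:hi]
        if len(chunk) == 0:
            continue  # skip empty or out-of-range turns
        out = transcribe_fn(chunk)
        results.append({
            "start": turn.start,
            "end": turn.end,
            "speaker": turn.speaker,
            "language": out.get("language"),
            "text": out.get("text", "").strip(),
        })
    return results
```

With the openai-whisper package, `transcribe_fn` could be something like `lambda chunk: model.transcribe(chunk, task="transcribe")`, leaving `language` unset so that it is detected for each chunk. Very short turns may give unreliable language detection, so merging adjacent turns from the same speaker before transcribing is probably worthwhile.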