I want to convert text to speech from a document that contains multiple languages. With the following code, I have trouble getting each language pronounced clearly. How can I save this kind of mixed-language text as clear audio?
from gtts import gTTS
mytext = 'Welcome to gtts! আজ একটি ভাল দিন। tumi kemon acho? ٱلْحَمْدُ لِلَّٰهِ'
language = 'ar' # arabic
myobj = gTTS(text=mytext, tld='co.in', lang=language, slow=False)
myobj.save("audio.mp3")
A single text-to-speech call is not enough, since it works with only one language at a time.
To solve this problem we need to detect the language of each part of the sentence.
Then we run each part through text to speech and append it to the final spoken sentence.
It would be ideal to use a neural network (there are plenty) to do this categorization for you.
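The detect-then-group step can be sketched offline before bringing in any network service. The snippet below is only an illustration, not the method used later in this answer: it swaps real language detection for a crude Unicode script check from the standard library, and the helper names `script_of` and `group_by_script` are made up for this example. Note that a script check cannot tell transliterated Bengali such as "tumi kemon acho?" from English, since both are written in Latin letters.

```python
import unicodedata


def script_of(word):
    """Return a coarse script label (e.g. "LATIN", "BENGALI") for a word.

    Assumption: the first word of a character's Unicode name identifies
    its script, e.g. "BENGALI LETTER AA" -> "BENGALI".
    """
    for ch in word:
        if ch.isalpha():
            return unicodedata.name(ch, "UNKNOWN").split(" ")[0]
    return None  # the word holds only punctuation or digits


def group_by_script(text):
    """Group consecutive words that share a script into (script, chunk) pairs."""
    runs = []
    for word in text.split():
        script = script_of(word)
        # Words without letters stay attached to the previous chunk.
        if runs and (script is None or script == runs[-1][0]):
            runs[-1][1].append(word)
        else:
            runs.append((script, [word]))
    return [(script, " ".join(words)) for script, words in runs]


if __name__ == "__main__":
    sample = "Welcome to gtts! আজ একটি ভাল দিন। tumi kemon acho?"
    for script, chunk in group_by_script(sample):
        print(script, "->", chunk)
```

Each chunk could then be sent to text to speech with the language that matches its script, which is the same idea the working example implements with a real detector.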
Just as a proof of concept, I used googletrans to detect the language of each part of the sentence and gtts to make an mp3 file from it.
It's not bulletproof, especially with Arabic text: googletrans sometimes detects a different language code (sd in this example), which gtts does not recognize. For that reason we use lang_code_table to pick a language code that works with gtts.
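The code-table lookup can be made a little more defensive. The following is a minimal sketch with a hypothetical helper, `to_gtts_lang`, which is not part of gtts or googletrans. It assumes you pass in the set of codes that your TTS backend accepts; with gtts, the keys of `gtts.lang.tts_langs()` could serve as that set.

```python
def to_gtts_lang(detected, supported, aliases=None, default="en"):
    """Map a detected language code to one the TTS backend accepts.

    detected:  code returned by the language detector, e.g. "sd"
    supported: collection of codes the TTS backend accepts
    aliases:   manual overrides, e.g. {"sd": "ar"}
    default:   fallback when nothing matches (an assumption of this sketch)
    """
    aliases = aliases or {}
    code = aliases.get(detected, detected)
    if code in supported:
        return code
    # Try the base language of a regional code, e.g. "pt-br" -> "pt".
    base = code.split("-")[0]
    if base in supported:
        return base
    return default


if __name__ == "__main__":
    supported = {"ar", "bn", "en"}  # example set, not the real gtts list
    print(to_gtts_lang("sd", supported, aliases={"sd": "ar"}))
```

Falling back to a default language keeps the loop running when the detector returns something unexpected, at the cost of possibly mispronouncing that chunk.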
Here is a working example:
from googletrans import Translator
from gtts import gTTS
input_text = "Welcome to gtts! আজ একটি ভাল দিন। tumi kemon acho? ٱلْحَمْدُ لِلَّٰه"
words = input_text.split(" ")
translator = Translator()
language, sentence = None, ""
lang_code_table = {"sd": "ar"}
with open('output.mp3', 'wb') as ff:
    for word in words:
        if not word:
            # Skip empty tokens produced by consecutive spaces
            continue
        # Detect language of current word
        word_language = translator.detect(word).lang
        if word_language == language:
            # Same language, append word to the sentence
            sentence += " " + word
        else:
            if language is None:
                # No language set yet, initialize and continue
                language, sentence = word_language, word
                continue
            if word.endswith(("?", ".", "!")):
                # If word ends with one of the punctuation marks, it should be part of previous sentence
                sentence += " " + word
                continue
            # We have whole previous sentence, convert it into speech and append to mp3 file
            gTTS(text=sentence, lang=lang_code_table.get(language, language), slow=False).write_to_fp(ff)
            # Continue with other language
            language, sentence = word_language, word
    if language and sentence:
        # Append last detected sentence
        gTTS(text=sentence, lang=lang_code_table.get(language, language), slow=False).write_to_fp(ff)
It's obviously not fast, since it makes one detection request per word, and it won't suit longer texts.
It also needs a better tokenizer and proper error handling.
Again, it's just a proof of concept.
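As one possible direction for the tokenizer and error handling mentioned above, here is a small sketch that splits on sentence boundaries instead of single words, so the detector is called once per sentence. Both helpers, `split_sentences` and `detect_or_default`, are hypothetical names for this illustration, and the set of sentence-ending marks is an assumption that only covers the scripts in the example text.

```python
import re

# Sentence-ending marks: . ! ? plus the danda used in Bengali (U+0964)
# and the Arabic question mark (U+061F). Extend as needed.
_SENTENCE_END = re.compile(r"(?<=[.!?\u0964\u061F])\s+")


def split_sentences(text):
    """Split text after sentence-ending punctuation, dropping empty parts."""
    return [part for part in _SENTENCE_END.split(text.strip()) if part]


def detect_or_default(detect, sentence, default="en"):
    """Call a detector and fall back to a default code if it raises.

    detect is any callable taking a string and returning a language code.
    """
    try:
        return detect(sentence)
    except Exception:
        # Network errors or unexpected responses should not stop the whole run.
        return default


if __name__ == "__main__":
    text = "Welcome to gtts! আজ একটি ভাল দিন। tumi kemon acho?"
    for sentence in split_sentences(text):
        print(sentence)
```

With googletrans, the detector passed in could be something like `lambda s: translator.detect(s).lang`, as in the example above. A sentence that mixes languages would still be read in a single voice, so this trades some accuracy for fewer requests.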