I have a TTS model and I want to concatenate the audio clips it produces.
I need a way to convert the model output (a NumPy array) into a pydub.AudioSegment so that I can combine the audio.
This is the model output:
audio[0].data.cpu().numpy() = array([ 1.90522405e-04, 3.96589050e-04, 4.41852462e-04, ...,
1.13033675e-05, -1.63643017e-05, -2.01268449e-05], dtype=float32)
This is my function to combine the audio:
from pydub import AudioSegment
from os.path import exists

def creating_one_audio_file(audio):
    if exists("/content/audio_file.wav"):
        sound2 = AudioSegment.from_wav("/content/audio_file.wav")
        combined_sounds = audio + sound2
        combined_sounds.export("/content/audio_file.wav", format="wav")
    else:
        combined_sounds = audio
        combined_sounds.export("/content/audio_file.wav", format="wav")

creating_one_audio_file(audio[0].data.cpu().numpy())
You can rely on audiosegment (a wrapper around pydub.AudioSegment) and its audiosegment.from_numpy_array method, or borrow its underlying method implementation from https://github.com/MaxStrange/AudioSegment/blob/master/docs/api/audiosegment.py#L1145
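If you would rather not add another dependency, the conversion can also be sketched directly against pydub. pydub works with integer PCM samples, so the float32 output has to be scaled to int16 first. This is a minimal sketch, not a definitive implementation: the helper names float_to_int16_pcm and numpy_to_audio_segment are hypothetical, the array is assumed to be mono with values in [-1.0, 1.0], and the 22050 Hz default sample rate is an assumption that should be replaced with whatever rate your model actually produces.

```python
import numpy as np


def float_to_int16_pcm(samples: np.ndarray) -> np.ndarray:
    """Scale float samples in [-1.0, 1.0] to 16-bit integer PCM."""
    # Clip first so values slightly outside the range do not wrap around.
    clipped = np.clip(samples, -1.0, 1.0)
    return (clipped * 32767).astype(np.int16)


def numpy_to_audio_segment(samples: np.ndarray, sample_rate: int = 22050):
    """Wrap a mono float waveform in a pydub.AudioSegment.

    sample_rate is an assumption; pass the rate your TTS model uses.
    """
    # Imported here so the conversion helper above works without pydub.
    from pydub import AudioSegment

    pcm = float_to_int16_pcm(samples)
    return AudioSegment(
        data=pcm.tobytes(),
        sample_width=pcm.dtype.itemsize,  # 2 bytes per int16 sample
        frame_rate=sample_rate,
        channels=1,  # assumes a 1-D (mono) array
    )
```

With this in place, the call in the question would become creating_one_audio_file(numpy_to_audio_segment(audio[0].data.cpu().numpy())), so that the function receives an AudioSegment that supports + and export rather than a raw array.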