pydub.AudioSegment messes up the audio data when loading from a numpy.ndarray


I have to mix audio files programmatically (adding background noise to recordings), and all of my files are 8-9-hour-long recordings in the .opus format.

I have tried to use pydub.AudioSegment to load one into memory, but I get the following error:

    path_to_input = '/path/to/my/input/file.opus'
    sound_data = AudioSegment.from_file(path_to_input)
    Traceback (most recent call last):
      File "<input>", line 1, in <module>
      File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pydub/audio_segment.py", line 728, in from_file
        fix_wav_headers(p_out)
      File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pydub/audio_segment.py", line 142, in fix_wav_headers
        raise CouldntDecodeError("Unable to process >4GB files")
    pydub.exceptions.CouldntDecodeError: Unable to process >4GB files
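A note on where that limit comes from: the traceback shows the error is raised in fix_wav_headers, after ffmpeg has already decoded the .opus file into a WAV stream for pydub. So the roughly 4GB check applies to the decoded, uncompressed PCM, not to the compressed file on disk (WAV size fields are 32-bit). The quick arithmetic below is a sketch of how large 9 hours of decoded audio gets. The 48 kHz rate (Opus's native rate) and the channel/sample-width combinations are assumptions, since the exact format ffmpeg produces for pydub is not shown here; pcm_size_bytes is a hypothetical helper.

```python
# Rough size of decoded (uncompressed) PCM audio:
#   bytes = duration_s * sample_rate * channels * bytes_per_sample
WAV_LIMIT = 2**32  # 4 GiB, the range of a 32-bit WAV size field


def pcm_size_bytes(hours: float, sample_rate: int, channels: int, sample_width: int) -> int:
    """Number of bytes of raw PCM for `hours` of audio (hypothetical helper)."""
    return int(hours * 3600 * sample_rate * channels * sample_width)


# Assumed: Opus decoded at 48 kHz; the channel counts and sample widths are guesses.
for channels, width in [(1, 2), (2, 2), (1, 4)]:
    size = pcm_size_bytes(9, 48_000, channels, width)
    print(f"{channels} ch, {8 * width}-bit: {size / 1e9:.2f} GB, over limit: {size > WAV_LIMIT}")
```

Mono 16-bit comes to about 3.1 GB, under the limit, while doubling either the channel count or the sample width pushes 9 hours of audio past it.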

So apparently I cannot use pydub.AudioSegment to load my files because they are too big (the file I am trying to open is actually only 48MB on disk, so I guess it becomes too big for pydub once it is loaded into memory?). Anyway, I have managed to load the file with librosa:

    sample_rate = 8000
    sound_data_librosa = librosa.load(path_to_input, sr=sample_rate, res_type='kaiser_best')
    sound_data_librosa = sound_data_librosa[0]

And I thought I could overcome this problem by creating a pydub.audio_segment.AudioSegment object from sound_data_librosa (which is a numpy.ndarray).

    sound_data = AudioSegment(
        sound_data_librosa.tobytes(),
        frame_rate=sample_rate,
        sample_width=sound_data_librosa.dtype.itemsize,
        channels=1
    )

This seems to work fine, but when I write it back to disk it sounds like random noise.

    path_to_output = '/path/to/my/output/file.opus'
    sound_data.export(path_to_output, format="opus")

I haven't modified anything yet, but somehow all of my audio data is lost, and I can't work out exactly what the problem is. Is there anything I am doing wrong that I can fix so that the audio data isn't distorted?
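A likely cause of the noise: librosa.load returns floating-point samples (float32 by default, roughly in the range -1.0 to 1.0), whereas AudioSegment interprets its raw bytes as signed integer PCM. With sample_width set to 4 (the itemsize of float32), pydub reads the float bit patterns as 32-bit integers, which sounds like noise. A minimal sketch of a fix, assuming 16-bit output is acceptable, is to scale and cast the samples to int16 before calling tobytes(); float_to_pcm16 is a hypothetical helper name.

```python
import numpy as np


def float_to_pcm16(samples: np.ndarray) -> bytes:
    """Convert float samples in [-1.0, 1.0] (as returned by librosa.load)
    to little-endian signed 16-bit PCM bytes (hypothetical helper)."""
    # Clip first so values slightly outside [-1, 1] don't wrap around on the cast.
    clipped = np.clip(samples, -1.0, 1.0)
    # Scale to the int16 range; '<i2' forces little-endian 16-bit integers.
    return (clipped * 32767).astype("<i2").tobytes()
```

With that, the constructor call would become AudioSegment(float_to_pcm16(sound_data_librosa), frame_rate=sample_rate, sample_width=2, channels=1). Note that librosa has resampled the audio to sample_rate (8000 here), so the exported file will be 8 kHz audio.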

Also, I don't necessarily have to do it this way. I have been using pydub.AudioSegment to modify my audio files (apply gain, overlay, write to disk as .opus files), which is why I need to load them as pydub.audio_segment.AudioSegment objects. If there is another way to do the same things in Python, I would appreciate it if you could point it out. My main concern is the lack of .opus support (for both reading and writing) elsewhere, which is why I'm trying to stick with pydub.
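If the audio is already loaded as float arrays (as with librosa), gain and overlay can also be done directly in numpy. pydub's apply_gain multiplies samples by 10 ** (dB / 20), and overlay adds samples together, saturating at the sample range. A minimal sketch under those assumptions (float input in [-1, 1], both arrays at the same sample rate); mix_with_noise and db_to_gain are hypothetical names, and looping short noise is an assumed design choice. Writing the result back to .opus would still need ffmpeg or pydub.

```python
import numpy as np


def db_to_gain(db: float) -> float:
    """Convert a decibel change to a linear amplitude factor (+20 dB = x10)."""
    return 10 ** (db / 20)


def mix_with_noise(speech: np.ndarray, noise: np.ndarray, noise_gain_db: float) -> np.ndarray:
    """Overlay noise onto speech; both are float arrays in [-1, 1] at the same rate."""
    n = len(speech)
    # Assumption: repeat the noise if it is shorter than the speech, then trim to length.
    reps = -(-n // len(noise))  # ceiling division
    looped = np.tile(noise, reps)[:n]
    mixed = speech + looped * db_to_gain(noise_gain_db)
    # Clip so loud peaks saturate instead of exceeding the valid range.
    return np.clip(mixed, -1.0, 1.0)
```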


Solution

  • Yeah, there are some oddities with opus and ffmpeg on macOS. I tried to fix the file size issue for a while, but made no progress. Instead, I suggest converting the files to WAV (or any other reasonable format) and loading those. I converted the 42MB opus file to WAV; the result is around 2.7GB, and pydub was able to read it (presumably because it stays under the 4GB limit). You can use something like this to convert the files:

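    import subprocess
    
    def convert_to_wav(name):
        # Decode name.opus to name.wav with ffmpeg (ffmpeg must be on PATH).
        # '-y' overwrites an existing .wav instead of prompting.
        command = ['ffmpeg', '-y', '-i', f'{name}.opus', f'{name}.wav']
        # check=True raises CalledProcessError if ffmpeg exits with an error.
        subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                       stdin=subprocess.DEVNULL, check=True)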
    

    Alternatively, you could ping the pydub maintainer on GitHub.