python, audio, wav, wave, mixing

Mixing/Overlaying wav audio files in Python


I have been looking for a solution for overlaying/mixing two WAV audio files together using ONLY the wave library.

I have found the following solution: Mixing two audio files together with python

And one of the answers provides the following code:

import wave

w1 = wave.open("/path/to/wav/1")
w2 = wave.open("/path/to/wav/2")

#get samples formatted as a string.
samples1 = w1.readframes(w1.getnframes())
samples2 = w2.readframes(w2.getnframes())

#takes every 2 bytes and groups them together as 1 sample. ("123456" -> ["12", "34", "56"])
samples1 = [samples1[i:i+2] for i in xrange(0, len(samples1), 2)]
samples2 = [samples2[i:i+2] for i in xrange(0, len(samples2), 2)]

#convert samples from strings to ints
def bin_to_int(bin):
    as_int = 0
    for char in bin[::-1]: #iterate over each char in reverse (because little-endian)
        #get the integer value of char and assign to the lowest byte of as_int, shifting the rest up
        as_int <<= 8
        as_int += ord(char) 
    return as_int

samples1 = [bin_to_int(s) for s in samples1] #['\x04\x08'] -> [0x0804]
samples2 = [bin_to_int(s) for s in samples2]

#average the samples:
samples_avg = [(s1+s2)/2 for (s1, s2) in zip(samples1, samples2)]

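The code is written for Python 2. In Python 3, iterating over a bytes object already yields integers, so ord() is no longer needed (calling it on an int raises a TypeError). With ord() removed, xrange replaced by range, and / replaced by // in samples_avg to avoid creating floats, the code looks like this: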

import wave

w1 = wave.open("/path/to/wav/1")
w2 = wave.open("/path/to/wav/2")

#get samples formatted as a string.
samples1 = w1.readframes(w1.getnframes())
samples2 = w2.readframes(w2.getnframes())

#takes every 2 bytes and groups them together as 1 sample. ("123456" -> ["12", "34", "56"])
samples1 = [samples1[i:i+2] for i in range(0, len(samples1), 2)]
samples2 = [samples2[i:i+2] for i in range(0, len(samples2), 2)]

#convert samples from strings to ints
def bin_to_int(bin):
    as_int = 0
    for char in bin[::-1]: #iterate over each char in reverse (because little-endian)
        #get the integer value of char and assign to the lowest byte of as_int, shifting the rest up
        as_int <<= 8
        as_int += char
    return as_int

samples1 = [bin_to_int(s) for s in samples1] #['\x04\x08'] -> [0x0804]
samples2 = [bin_to_int(s) for s in samples2]

#average the samples:
samples_avg = [(s1+s2)//2 for (s1, s2) in zip(samples1, samples2)]
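
As a side note on why ord() can be dropped: readframes() returns a bytes object in Python 3, and iterating over or indexing bytes already yields integers. The sketch below is an illustration, not part of the linked answer. It also shows the built-in int.from_bytes as a possible alternative to bin_to_int. It assumes 16-bit little-endian PCM, which is normally signed, so signed=True is used; the hand-written loop decodes the same bytes as unsigned. The byte values are made up.

```python
raw = b"\x04\x08\xff\xff"  # two hypothetical 16-bit little-endian samples

# Iterating over bytes in Python 3 yields ints, so ord() is unnecessary.
first_bytes = list(raw[:2])

# Split into 2-byte chunks and decode each chunk as one sample.
chunks = [raw[i:i + 2] for i in range(0, len(raw), 2)]
signed = [int.from_bytes(c, "little", signed=True) for c in chunks]
unsigned = [int.from_bytes(c, "little", signed=False) for c in chunks]

print(first_bytes)  # [4, 8]
print(signed)       # [2052, -1]
print(unsigned)     # [2052, 65535]
```

The difference between the last two lines matters for mixing: averaging unsigned values of samples that are really negative gives the wrong result.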

The code is only partial. What is missing is converting samples_avg back to a binary string, and this is where I have trouble. I have tried bin() and chr() as follows:

samples_avg = [ chr(s) for s in samples_avg]

samples_avg = [ bin(s) + "'" for s in samples_avg]

and I have tried a million other solutions that I am too embarrassed to post, all of which have failed.

Can anyone help me finish this code? I think it would be really useful to have out in the community, since it depends only on the wave library and can be used in virtual environments.
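
For context on why these attempts fail: chr() returns a one-character str and bin() returns text such as '0b101', whereas writeframes() expects raw bytes. One possible way to get raw bytes is struct.pack for a whole list at once, or int.to_bytes one sample at a time. The following is only a sketch, assuming signed 16-bit little-endian samples; the helper name ints_to_frames and the sample values are made up for illustration.

```python
import struct

def ints_to_frames(samples):
    """Pack signed 16-bit integers into little-endian bytes (hypothetical helper)."""
    # '<' means little-endian, 'h' means signed 16-bit; one 'h' per sample.
    return struct.pack("<%dh" % len(samples), *samples)

frames = ints_to_frames([2052, -1])
print(frames)  # b'\x04\x08\xff\xff'

# The same result, built one sample at a time with int.to_bytes:
same = b"".join(s.to_bytes(2, "little", signed=True) for s in [2052, -1])
```

Both calls raise an error if a value does not fit in 16 bits, so summed samples would need to be halved or clamped first.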

I am rather new to Python and completely new to audio processing, so I apologize for any stupid questions and mistakes.

Just to clarify what I mean by mixing/overlaying: if I have two audio files, each 4 seconds long, I want to mix them into a single 4-second audio file in which the two are played simultaneously.


Solution

  • So after a bit of trial and error and help from @Ponkadoodle, I got it to work. It worked for two recordings I had made on the same computer using QuickTime and an online WAV converter. When I used WAV files from the internet, the result sounded really messed up; I do not know whether this is due to differences in frequency or something else.

    Here is the final code:

    import wave
    import array
    
    
    w1 = wave.open("path/to/file/audiofile1.wav")
    w2 = wave.open("path/to/file/audiofile2.wav")
    
    #get samples formatted as a string.
    samples1 = w1.readframes(w1.getnframes())
    samples2 = w2.readframes(w2.getnframes())
    
    
    #takes every 2 bytes and groups them together as 1 sample. ("123456" -> ["12", "34", "56"])
    samples1 = [samples1[i:i+2] for i in range(0, len(samples1), 2)]
    samples2 = [samples2[i:i+2] for i in range(0, len(samples2), 2)]
    
    #convert samples from strings to ints
    def bin_to_int(bin):
        as_int = 0
        for char in bin[::-1]: #iterate over each char in reverse (because little-endian)
            #get the integer value of char and assign to the lowest byte of as_int, shifting the rest up
            as_int <<= 8
            as_int += char
        return as_int
    
    samples1 = [bin_to_int(s) for s in samples1] #['\x04\x08'] -> [0x0804]
    samples2 = [bin_to_int(s) for s in samples2]
    
    #sum the samples (no division by 2 here, so this is a sum rather than an average):
    samples_avg = [(s1+s2) for (s1, s2) in zip(samples1, samples2)]
    
    samples_array = array.array('i')
    samples_array.fromlist(samples_avg)
    
    wave_out = wave.open("out.wav", "wb")
    wave_out.setnchannels(1)
    wave_out.setsampwidth(2)
    wave_out.setframerate(w1.getframerate()*4)
    wave_out.writeframes(samples_array)
    #close the files so the output is flushed to disk
    wave_out.close()
    w1.close()
    w2.close()
    

    I still have an issue with setframerate(). I multiplied the frame rate by 4 and it worked, but again this might depend on the frequency/frame rate etc. of your original recording:

    wave_out.setframerate(w1.getframerate()*4)
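
A likely explanation for the factor of 4, offered as a guess since the original recordings are not available: array.array('i') typically stores 4 bytes per item while setsampwidth(2) declares 2 bytes per sample, so twice as many frames are written as intended; and if the inputs were stereo, declaring setnchannels(1) doubles the frame count again. The sketch below avoids both by using type code 'h' (typically 2 bytes) and copying the channel count, sample width and frame rate from the first input. It assumes both files are 16-bit PCM with identical parameters, which may also be why files from the internet sounded wrong. The function name mix_wavs is hypothetical.

```python
import array
import sys
import wave

def mix_wavs(path1, path2, out_path):
    """Mix two 16-bit PCM WAV files with matching parameters (illustrative sketch)."""
    with wave.open(path1, "rb") as w1, wave.open(path2, "rb") as w2:
        # getparams()[:3] is (nchannels, sampwidth, framerate).
        if w1.getparams()[:3] != w2.getparams()[:3]:
            raise ValueError("channel count, sample width and frame rate must match")
        if w1.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit samples")
        nchannels, sampwidth, framerate = w1.getparams()[:3]
        # Type code 'h' is a signed short, matching signed 16-bit PCM.
        s1 = array.array("h", w1.readframes(w1.getnframes()))
        s2 = array.array("h", w2.readframes(w2.getnframes()))

    # WAV data is little-endian; swap bytes on big-endian machines.
    if sys.byteorder == "big":
        s1.byteswap()
        s2.byteswap()

    # Sum sample by sample and clamp to the 16-bit range to avoid wrap-around.
    mixed = array.array(
        "h", (max(-32768, min(32767, a + b)) for a, b in zip(s1, s2))
    )
    if sys.byteorder == "big":
        mixed.byteswap()

    with wave.open(out_path, "wb") as out:
        out.setnchannels(nchannels)
        out.setsampwidth(sampwidth)
        out.setframerate(framerate)  # no multiplication needed
        out.writeframes(mixed.tobytes())
```

Calling mix_wavs("audiofile1.wav", "audiofile2.wav", "out.wav") would write a file as long as the shorter input, because zip() stops at the shorter sequence. Clamping keeps the original loudness at the cost of possible clipping; dividing the sum by 2 instead would avoid clipping but make the result quieter.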