python numpy audio wav wave

Can I change the amplitude of a WAV audio file in Python from the NumPy array?


I'm just starting with audio processing in Python and still trying to understand the nature of the data and how to process it. I've been stuck trying to increase or decrease the amplitude (the volume or loudness, if I can put it that way) of a WAV file without using pydub's AudioSegment. From what I've seen, this hasn't been answered anywhere.

I extract the audio data using the following code, but I don't know what to do next:

import numpy as np
import wave

filename = 'violin.wav'
audiofile = wave.open(filename,'rb')
nch = audiofile.getnchannels()
if nch == 2:
    print('Stereo audio file')
elif nch == 1:
    print('Mono audio file')
sw = audiofile.getsampwidth()
n_frames = audiofile.getnframes()
fr = audiofile.getframerate()
frames = audiofile.readframes(-1)
typ = { 1: np.int8, 2: np.int16, 4: np.int32 }.get(sw) 
data = np.frombuffer(frames,dtype=typ)

I have tried increasing the values of the data array by a certain amount, but it seems that's not how it works. I also tried doing it with the Fourier transform, but I got stuck reversing the process.

How can I change the amplitude from the NumPy array? Is it necessary to go through the Fourier transform for that?

Thank you!


Solution

  • Well, scaling the array works fine for me. The missing piece might be how to convert back to bytes without changing the sample size (since you did not use unpack to read the data, something may have gone wrong when packing it back). The good news is that you can pack the same way you unpacked:

    import numpy as np
    import wave
    
    filename = 'violin.wav'
    audiofile = wave.open(filename,'rb')
    nch = audiofile.getnchannels()
    if nch == 2:
        print('Stereo audio file')
    elif nch == 1:
        print('Mono audio file')
    sw = audiofile.getsampwidth()
    n_frames = audiofile.getnframes()
    fr = audiofile.getframerate()
    frames = audiofile.readframes(-1)
    typ = { 1: np.int8, 2: np.int16, 4: np.int32 }.get(sw) 
    data = np.frombuffer(frames,dtype=typ)
    # Your code, so far
    p = audiofile.getparams() # Just to get all params
    outfile = wave.open("out.wav", 'wb')
    outfile.setparams(p) # same params
    outfile.writeframes((data*0.5).astype(typ).tobytes())
    outfile.close()
    

    Note that data*0.5 is a float array (data//2 would have kept the integer type, but I assume you may want to scale by any scalar value). So we need to convert it back to the correct integer type. You already computed that type, so it's easy: I just reuse your typ variable. (data*0.5).astype(typ) is the data, scaled by 0.5, with the correct type.

    So (data*0.5).astype(typ).tobytes() are the bytes.
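
    The factor 0.5 can never push a sample out of range, but a factor above 1 can. A minimal sketch of one way to guard against that, assuming integer PCM samples such as int16 (the helper name scale_samples is hypothetical and not part of the code above):

```python
import numpy as np

def scale_samples(data, gain):
    """Scale integer PCM samples by `gain`, clipping to the dtype's range."""
    info = np.iinfo(data.dtype)
    # Do the multiplication in floating point so any gain can be used.
    scaled = data.astype(np.float64) * gain
    # Clip before casting back: values outside the integer range would
    # otherwise not be represented correctly after the cast.
    clipped = np.clip(scaled, info.min, info.max)
    return clipped.astype(data.dtype)

samples = np.array([0, 1000, -1000, 20000, -20000], dtype=np.int16)
quieter = scale_samples(samples, 0.5)
louder = scale_samples(samples, 2.0)

print(quieter.tolist())  # [0, 500, -500, 10000, -10000]
print(louder.tolist())   # [0, 2000, -2000, 32767, -32768]
```

    With this helper, the write step would become outfile.writeframes(scale_samples(data, gain).tobytes()). Clipped samples distort the sound, so a gain that rarely reaches the limits is preferable.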

    Note on adding value

    Note that adding a value does nothing (except, if it is too big, saturating the file). It's a wave, so it is the frequency and the amplitude that count, not the value it is centered on.

    This remark is not computer science, just basic physics: sound is the variation of atmospheric pressure. When you hit a La (or A) on a piano, the string vibrates at 440 Hz, and so it creates a variation of air pressure at 440 Hz. If the weather is good, the pressure is 102000 Pa and, because of the piano, oscillates between 101999.9 Pa and 102000.1 Pa. If the weather is bad, the pressure oscillates between 98999.9 Pa and 99000.1 Pa. If you are in a plane listening to a piano, the pressure in your ear oscillates between 69999.9 Pa and 70000.1 Pa. In none of these cases will you really think the sound is different. It is the oscillation (its frequency, its amplitude) that matters, not the mean value it oscillates around.

    Another way to put it: the direct way to play a sound is to move a membrane following the data in data.

    Imagine you have, connected to your computer (in fact, it is what you have, more or less), a motor able to displace a metal plate. If you send 0 to that motor, the plate is positioned at 0 mm. If you send 1000, it is positioned at 1 mm; 2000 -> 2 mm, and so on. Now, if at a rate of 44100 samples per second (assuming 44100 Hz sampling) you send the values in data, so that the motion of the plate follows your data, you'll hear the sound.

    Now, if you add 1000 to all your data and play it again, the plate will follow the exact same movement, only 1 mm more to the right. But that is the same movement. It is exactly as if you moved yourself 1 mm to the left before playing again. Of course, you don't expect any change from that.

    On the other hand, if you multiply all the data by 2 and play again, the motion of the plate will be the same, but twice as big (a displacement of 1 mm is turned into one of 2 mm). So, not surprisingly, you hear the same sound, but louder.

    So, long story short: adding a value to sound samples does nothing. Except maybe making the values go out of the possible range.
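
    The difference between adding and multiplying can be checked numerically. The sketch below is only an illustration with a synthetic 440 Hz tone (the sample rate, the amplitude of 10000 and the helper peak_to_peak are assumptions chosen for the example, not values from the question's file):

```python
import numpy as np

# One second of a 440 Hz sine sampled at 44100 Hz, stored as 16-bit samples.
rate = 44100
t = np.arange(rate) / rate
tone = (10000 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)

shifted = tone + 1000   # same wave, moved up by a constant offset
doubled = tone * 2      # same wave, with twice the swing

def peak_to_peak(x):
    # Convert to Python ints first so the subtraction cannot overflow int16.
    return int(x.max()) - int(x.min())

print(peak_to_peak(shifted) == peak_to_peak(tone))      # True
print(peak_to_peak(doubled) == 2 * peak_to_peak(tone))  # True
```

    The offset leaves the size of the oscillation untouched, while the multiplication doubles it, which is what is heard as a louder sound.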