[SOLVED] What exactly happens to large amplitude values when we load a .wav file using librosa?

I want to understand what happens to the large amplitude values in a .wav file when I load it using librosa.

I was trying to understand the amplitude values of .wav files by viewing their waveforms with librosa. Next, I wanted to hear how scaling these amplitude values affects the sound, so I multiplied them by a scaling factor. However, when I played the result using IPython.display.Audio, I could not hear any difference:

scaled_signal = signal * 10 # signal is the original sample

# play the scaled signal
print('Play the scaled sample:')
display(Audio(data = scaled_signal, rate = sr))

So I saved the scaled signal to a file on my PC, and when I played that file I could hear the difference: the amplitude was indeed scaled. Then I reloaded the file using librosa. Surprisingly, when I played the reloaded signal in my Jupyter notebook, I could now hear the effect of scaling:

soundfile.write('scaled_signal.wav', scaled_signal, sr)

# loading the scaled signal again
scaled_signal, sr = librosa.load('scaled_signal.wav', sr = sr)

print('The scaled sample loaded again')
display(Audio(data = scaled_signal, rate = sr))

However, on plotting the waveform (see below), I could see that its shape had changed. What happened, and why? It looks as if an upper bound was applied to the magnitude of the amplitudes.

fig, axs = plt.subplots(1, 2)
fig.set_figwidth(18)

waveshow(signal, sr = sr, ax = axs[0])
waveshow(scaled_signal, sr = sr, ax = axs[1])

Figure: the waveform of the original signal (left) and of the scaled signal (right)

Solution

librosa.load() does not apply any data-dependent normalization or scaling. It only maps integer sample formats (int16/int32) to floating point in the range -1.0 to 1.0.
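To illustrate what a fixed, data-independent mapping looks like, here is a minimal sketch using only NumPy. It assumes the common convention of dividing 16-bit PCM samples by 2**15; the sample values are made up, and the exact constant used by the underlying reader is an assumption rather than something taken from the librosa source:

import numpy as np

# Hypothetical 16-bit PCM samples, as they might be stored in a .wav file.
pcm = np.array([-32768, -16384, 0, 16384, 32767], dtype=np.int16)

# Fixed mapping to floating point: the divisor depends only on the
# sample format (16 bits), never on the content of the signal.
as_float = pcm.astype(np.float32) / 32768.0

print(as_float)

A quiet file stays quiet and a loud file stays loud under this kind of mapping, because the same divisor is used for both.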

From the documentation for IPython.display.Audio, which you are using to play back the audio:

If the array option is used the waveform will be normalized.
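This would explain why multiplying the array by 10 made no audible difference: the player rescales whatever array it is given. The changed shape most likely comes from the save step instead. soundfile.write stores WAV files as 16-bit PCM by default, which cannot represent float values outside -1.0 to 1.0, so the peaks of the scaled signal are presumably clipped on write. The sketch below mimics both effects with NumPy only. It assumes peak normalization (dividing by the largest absolute sample) as a stand-in for what Audio does, and a hard clip as a stand-in for the file write; peak_normalize and the test tone are hypothetical:

import numpy as np

def peak_normalize(x):
    """Rescale so the largest absolute sample is 1.0 (assumed stand-in for Audio)."""
    peak = np.max(np.abs(x))
    return x / peak if peak > 0 else x

sr = 8000
t = np.arange(sr) / sr
signal = 0.5 * np.sin(2 * np.pi * 440 * t)  # hypothetical test tone, peak about 0.5
scaled_signal = signal * 10                 # peak about 5.0

# After peak normalization the original and the scaled array coincide,
# so they would sound identical when played back.
same_after_normalizing = np.allclose(peak_normalize(signal),
                                     peak_normalize(scaled_signal))

# A hard clip to [-1, 1] flattens every peak above full scale, so the
# waveform no longer has the shape of the original.
clipped = np.clip(scaled_signal, -1.0, 1.0)
shape_preserved = np.allclose(peak_normalize(clipped), peak_normalize(signal))

print(same_after_normalizing)  # True
print(shape_preserved)         # False

If you want to hear the unnormalized array directly, recent IPython versions accept Audio(data, rate=sr, normalize=False), but the data must then already lie between -1 and 1.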