[SOLVED] How to properly use pitch

How to properly use pitch_shift (librosa)?

I'm trying to use pitch_shift from librosa. I recorded my voice and used this code:

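import librosa

sampling_rate = 44100
y, sr = librosa.load(directory, sr=sampling_rate)  # y: audio as a NumPy array, sr: sample rate

# n_steps is counted in units of 1/bins_per_octave of an octave,
# so n_steps=4 with bins_per_octave=24 is 4 quarter tones (2 half steps)
y_shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=4, bins_per_octave=24)
# librosa.output.write_wav was removed in librosa 0.8; soundfile.write(directory, y_shifted, sr) replaces it
librosa.output.write_wav(directory, y_shifted, sr=sampling_rate, norm=False)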

It works fine - almost.

I hear some noise in my voice after pitch shifting.

Is there something else I need to use?

Without shift:

https://vocaroo.com/i/s1qEEDvzcUHN

With shift (n_steps = 4):

https://vocaroo.com/i/s0cOiC0cFJSB

Solution

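Pitch shifting typically involves an STFT, a shift (usually of the magnitude spectrum along the frequency axis), and then reconstruction of the signal via the Griffin-Lim algorithm (see the Quora explanation of how Griffin-Lim works). librosa's own pitch_shift takes a related route, a phase-vocoder time stretch followed by resampling, which also has to estimate phases.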
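As a side note on how far the spectrum is moved: a shift of n_steps, with the octave divided into bins_per_octave equal steps, multiplies every frequency by 2^(n_steps / bins_per_octave). The small sketch below works that out for the values in the question. The helper names pitch_ratio and semitones are made up for illustration and are not part of librosa.

```python
def pitch_ratio(n_steps: float, bins_per_octave: int = 12) -> float:
    """Frequency multiplier for a shift of n_steps, where one octave
    is divided into bins_per_octave equal steps. (Hypothetical helper.)"""
    return 2.0 ** (n_steps / bins_per_octave)


def semitones(n_steps: float, bins_per_octave: int = 12) -> float:
    """The same shift expressed in ordinary semitones (12 per octave).
    (Hypothetical helper.)"""
    return 12.0 * n_steps / bins_per_octave


# The values from the question: n_steps=4 with bins_per_octave=24
print(semitones(4, 24))               # 2.0
print(round(pitch_ratio(4, 24), 4))   # 1.1225

# For comparison, 4 steps with the default 12 bins per octave
print(round(pitch_ratio(4, 12), 4))   # 1.2599
```

So with bins_per_octave=24, n_steps=4 amounts to two semitones; for four semitones you would use n_steps=8, or leave bins_per_octave at its default of 12.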

The problem is that when we shift the magnitude spectrum, we do just that and ignore the phase! Griffin-Lim tries to find a reasonable phase when reconstructing the time-domain signal, but it's often just that: a reasonable solution, not a perfect one. And that is why you hear this metallic twang: the phases of your signal are not quite right (an artifact also called "phasiness").
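To see why the phase matters, here is a toy illustration using only the standard library. It is not librosa's algorithm and not Griffin-Lim, just a plain DFT on a made-up 32-sample signal: two signals can share exactly the same magnitude spectrum and still have clearly different waveforms, so the magnitudes alone do not pin down the signal.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (fine for tiny inputs)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


# A short made-up signal: two sinusoids with different starting phases.
n = 32
signal = [math.sin(2 * math.pi * 3 * t / n)
          + 0.5 * math.cos(2 * math.pi * 5 * t / n + 1.0)
          for t in range(n)]

spectrum = dft(signal)
magnitudes = [abs(c) for c in spectrum]

# Rebuild a signal from the magnitudes alone, with every phase set to zero.
no_phase = [v.real for v in idft([complex(m, 0.0) for m in magnitudes])]

# Compare: the magnitude spectra agree, the waveforms do not.
mag_error = max(abs(abs(c) - m) for c, m in zip(dft(no_phase), magnitudes))
wave_error = max(abs(a - b) for a, b in zip(signal, no_phase))
print(f"magnitude mismatch: {mag_error:.2e}")
print(f"waveform mismatch:  {wave_error:.2f}")
```

The magnitude mismatch is at the level of floating-point rounding, while the waveform mismatch is of the same order as the signal itself. A reconstruction algorithm has to guess phases that make the result sound right, and imperfect guesses are what you hear as artifacts.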

I believe your function call to librosa is perfectly all right; librosa's pitch shifter may just not be the greatest implementation on earth. Give PyRubberband a try. It's based on Rubberband (a C++ library) and has a good reputation.
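A minimal sketch of what that could look like, assuming pyrubberband and soundfile are installed along with the rubberband command-line tool that pyrubberband calls. The helper names shifted_path and shift_file are hypothetical, not part of either library. The result is written to a new file so the original recording is not overwritten.

```python
from pathlib import Path


def shifted_path(input_path: str, n_steps: float) -> Path:
    """Hypothetical helper: build an output name next to the input,
    e.g. voice.wav -> voice_shift+4.wav, so the original is kept."""
    p = Path(input_path)
    return p.with_name(f"{p.stem}_shift{n_steps:+g}{p.suffix}")


def shift_file(input_path: str, n_steps: float = 4) -> Path:
    """Hypothetical helper: pitch-shift a file by n_steps semitones
    with PyRubberband and return the path of the new file."""
    # Imported here so the path helper above works without the audio
    # libraries. Assumes `pip install pyrubberband soundfile` plus the
    # rubberband command-line tool.
    import pyrubberband as pyrb
    import soundfile as sf

    y, sr = sf.read(input_path)                   # keeps the file's own sample rate
    y_shifted = pyrb.pitch_shift(y, sr, n_steps)  # n_steps is in semitones
    out = shifted_path(input_path, n_steps)
    sf.write(str(out), y_shifted, sr)
    return out
```

Calling shift_file("voice.wav", 4) would then write voice_shift+4.wav next to the input. Whether the result sounds cleaner on your recording is something you would have to check by ear.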