
Librosa pitch tracking - STFT


I am using this algorithm to detect the pitch of this audio file. As you can hear, it is an E2 note played on a guitar with a bit of noise in the background.

I generated this spectrogram using the STFT: [spectrogram]

And I am using the algorithm linked above like this:

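import sys
import numpy as np
import librosa

y, sr = librosa.load(filename, sr=40000)
pitches, magnitudes = librosa.core.piptrack(y=y, sr=sr, fmin=75, fmax=1600)

# Print every nonzero pitch estimate without truncating the output
np.set_printoptions(threshold=sys.maxsize)
print(pitches[np.nonzero(pitches)])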

As a result, I am getting pretty much every possible frequency between my fmin and fmax. What do I have to do with the output of the piptrack method to discover the fundamental frequency of a time frame?

UPDATE

I am still not sure what those 2D arrays represent, though. Let's say I want to find out how strong 82 Hz is in frame 5. I could do that using the STFT function, which simply returns a 2D matrix (the one used to plot the spectrogram).
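
For illustration, here is a minimal sketch of that STFT lookup using plain NumPy rather than librosa. The helper names `stft_magnitude` and `magnitude_at` and the synthetic 82.41 Hz test tone are assumptions made for this example. The framing here uses no padding, whereas `librosa.stft` centers frames by default, so frame indices will not line up exactly with librosa's. With librosa itself, `librosa.fft_frequencies` gives the same bin-to-frequency mapping.

```python
import numpy as np

def stft_magnitude(y, n_fft=2048, hop_length=512):
    """Magnitude STFT of shape (1 + n_fft // 2, n_frames), without padding."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop_length
    frames = np.stack(
        [y[i * hop_length : i * hop_length + n_fft] * window for i in range(n_frames)],
        axis=1,
    )
    return np.abs(np.fft.rfft(frames, axis=0))

def magnitude_at(S, freq_hz, frame, sr, n_fft=2048):
    """Return (bin index, magnitude) of the bin closest to freq_hz in a frame."""
    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)  # center frequency of each bin
    k = int(np.argmin(np.abs(bin_freqs - freq_hz)))
    return k, S[k, frame]

# Synthetic stand-in for the recording: one second of a pure tone near E2
sr = 40000
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 82.41 * t)

S = stft_magnitude(y)
k, strength = magnitude_at(S, 82.0, frame=5, sr=sr)
print(k, strength)
```

With `sr=40000` and `n_fft=2048` the bins are about 19.5 Hz apart, so 82 Hz can only be resolved to the nearest bin. That coarse resolution is one reason a refined per-bin frequency estimate is useful.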

However, piptrack does something extra that could be useful, and I don't really understand what. pitches[f, t] contains the instantaneous frequency at bin f, time t. Does that mean that, to find the strongest frequency at time frame t, I have to:

  1. Go to the magnitudes[:, t] column and find the bin with the maximum magnitude.
  2. Assign that bin to a variable f.
  3. Look up pitches[f, t] to get the frequency that belongs to that bin?

Solution

  • Turns out the way to pick the pitch at a certain frame t is simple:

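    def detect_pitch(pitches, magnitudes, t):
      # Bin with the largest magnitude in frame t
      index = magnitudes[:, t].argmax()
      # Instantaneous frequency (Hz) that piptrack assigned to that bin
      pitch = pitches[index, t]

      return pitch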
    

    This first finds the bin of the strongest frequency in frame t by looking at the magnitudes array, and then reads the pitch at pitches[index, t].
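
    The same lookup can be applied to every frame at once. The sketch below is only an illustration: the function name `detect_pitches`, the `min_magnitude` threshold, and the small hand-made arrays (standing in for real piptrack output) are assumptions, not part of librosa. Note that the strongest bin is not guaranteed to be the fundamental; on some frames it may be a harmonic.

```python
import numpy as np

def detect_pitches(pitches, magnitudes, min_magnitude=0.0):
    """Pick, for each frame, the pitch of the bin with the largest magnitude.

    Both inputs have shape (n_bins, n_frames), as returned by piptrack.
    Frames whose peak magnitude is not above min_magnitude are reported as 0.0.
    """
    index = magnitudes.argmax(axis=0)          # strongest bin per frame
    frames = np.arange(magnitudes.shape[1])    # 0, 1, ..., n_frames - 1
    best = pitches[index, frames]
    strength = magnitudes[index, frames]
    return np.where(strength > min_magnitude, best, 0.0)

# Hand-made stand-ins for piptrack output: 4 bins, 3 frames
pitches = np.array([
    [0.0,   0.0,   0.0],
    [82.4,  82.5,  0.0],
    [164.8, 0.0,   0.0],
    [0.0,   247.0, 0.0],
])
magnitudes = np.array([
    [0.0, 0.0, 0.0],
    [5.0, 1.0, 0.0],
    [2.0, 0.0, 0.0],
    [0.0, 3.0, 0.0],
])

per_frame = detect_pitches(pitches, magnitudes)
# frame 0 -> 82.4, frame 1 -> 247.0, frame 2 -> 0.0 (no energy in any bin)
print(per_frame)
```

    Raising `min_magnitude` above zero is one way to skip near-silent frames where only background noise is present.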