signal-processingfftpitch-trackingpitch-detection

How to perform a Cepstrum for pitch detection


Ok, there are a bunch of questions on this here, and plenty of reading material on google, yet I somehow am not able to figure this out. I want to get the fundamental frequency of a segment of speech. The basic steps are supposed to be:

Here is how I have implemented this in AS3:

var signal:Vector.<Number> = my1024PointSignal; // an audio signal 1024 samples long
var imx:Vector.<Number> = new Vector.<Number>(signal.length); // 1024 point vector to hold imaginary part of fft

hammingWindow(signal); // window it
zeroFill(imx); // fill imx with zeros

FFT(signal, imx); // convert signal into real and imaginary components of fft

toPolar(signal, imx); // convert fft to polar coordinates

// square each bin, and take the log of each bin, discard phase
for (var i:int = 0, l:int = signal.length; i < l; i++) {
    signal[i] = Math.log(Math.pow(signal[i], 2));
    imx[i] = 0;
}

FFT(signal, imx); // or maybe inverseFFT(signal, imx), i don't know

Now when I do this and end by taking the FFT, when I plot it the bins appear to be in reverse order? I also am seeing a larger peak at the second harmonic than at the fundamental. When I do this and take the inverse FFT, I get an audio signal that looks reflected around N/2, and again the peaks seem to be reversed. The whole thing is also quite noisy. What am I doing wrong?


Solution

  • For Cepstrum i always have used to this steps:

    1. Apply hamming windows in the signal (1024 or 2048 points)
    2. Apply FFT
    3. Get magnitude
    4. use just the first half values
    5. Convert to log scale
    6. Apply IFFT
    7. Find the Peak

    The equation for cepstrum:

     IFFT(log(abs(FFT(s))))
    

    Maybe are you are seeing reflected because you not get the step four(4)

    Difference is between ending with an IFFT and ending with an FFT?

    The difference is the scale representation, if do you end using FFT you need extract just the real information, for both below equations you will get the same shape:

    IFFT(log(abs(FFT(s)))) == real(FFT(log(abs(FFT(s)))))
    

    Plot example from cepstrum:

    For IFFT(log(abs(FFT(s)))):

    enter image description here

    For real(FFT(log(abs(FFT(s))))):

    enter image description here

    This is a cepstrum example from a 4096 points sine in 440hz sampled at 44100hz