web-audio-api, pitch-tracking

How can I utilize the full spectrogram resolution when dealing with speech input in the Web Audio API?


I am developing a kind of online visual Chinese tone helper. This involves doing pitch detection with the HPS (Harmonic Product Spectrum) algorithm, but that algorithm's performance is restricted by the resolution of the incoming spectrogram. I have been using the AnalyserNode until now, but since I can't set the sample rate of the AudioContext, I get an unnecessarily high maximum frequency in the spectrogram (sampleRate / 2 ≈ 24 kHz, while human speech only goes up to about 3.4 kHz). So with a spectrogram resolution of 1024 bins (since the largest FFT size allowed by the Web Audio API is 2048), I only utilize a small part of the frequency range when analysing my speech input.
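
To make the problem concrete, this is roughly what my current setup looks like (a simplified sketch; variable names are illustrative, and the microphone source is connected elsewhere):

```javascript
// Simplified sketch of the current AnalyserNode setup.
var audioContext = new AudioContext();
var analyser = audioContext.createAnalyser();
analyser.fftSize = 2048;                                    // the largest size I can use
var spectrum = new Uint8Array(analyser.frequencyBinCount);  // 1024 bins

// With a 48 kHz context the bins cover 0..24 kHz:
var binWidth = audioContext.sampleRate / analyser.fftSize;  // ~23.4 Hz per bin
var speechBins = Math.floor(3400 / binWidth);               // only ~145 of the 1024 bins

function analyse() {
  analyser.getByteFrequencyData(spectrum);
  // HPS only ever uses the first ~145 bins; the remaining ~880 are wasted.
  requestAnimationFrame(analyse);
}
analyse();
```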

To get more control, I have also tried using a ScriptProcessorNode to gather a buffer that I then analyse with the FFT found in DSP.js, but that approach seems much worse performance-wise than using the AnalyserNode. Does anybody have a suggestion on how to solve this issue? More about my setup can be seen in my development blog.
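
For reference, the ScriptProcessorNode variant looks roughly like this (again a simplified sketch; the FFT(bufferSize, sampleRate), forward() and spectrum calls are DSP.js's interface as I understand it, and `source` is the microphone source node):

```javascript
// Rough sketch of the ScriptProcessorNode + DSP.js approach (assumes dsp.js is loaded).
var bufferSize = 2048;
var processor = audioContext.createScriptProcessor(bufferSize, 1, 1);
var fft = new FFT(bufferSize, audioContext.sampleRate);  // DSP.js FFT

processor.onaudioprocess = function (e) {
  var input = e.inputBuffer.getChannelData(0);
  fft.forward(input);           // runs on the main thread every ~43 ms at 48 kHz
  var spectrum = fft.spectrum;  // bufferSize / 2 magnitude bins, same ~23 Hz width
  // ...run HPS on spectrum...
};

source.connect(processor);
processor.connect(audioContext.destination);  // keeps the node processing
```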


Solution

  • You could maybe try implementing the FFT code in asm.js. I suspect you'd see pretty significant performance improvements in both Chrome and Firefox, since this is exactly the kind of thing asm.js is really good at.

    Ultimately, I think you're going to have to profile this. Is it better to implement the FFT yourself with only the bins you want, or should you use an AnalyserNode at a super high resolution and just throw away what you don't need? The answer will be in the measurements. (There's a rough sketch of the latter option at the end of this answer.)

    That being said, even a pretty sub-optimal FFT implementation should still be plenty fast for real-time analysis of a single input. I'd be kind of surprised if you ran into any show-stopper performance problems.
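
    As a rough, untested sketch of that second option (the exact maximum fftSize depends on the browser), you'd just slice off the bins below ~3.4 kHz and feed only those to HPS:

    ```javascript
    // Untested sketch: high-resolution AnalyserNode, keep only the speech bins.
    var analyser = audioContext.createAnalyser();
    analyser.fftSize = 2048;                        // or larger, if the browser allows it
    var full = new Float32Array(analyser.frequencyBinCount);

    var binWidth = audioContext.sampleRate / analyser.fftSize;
    var maxSpeechBin = Math.ceil(3400 / binWidth);  // bins covering 0..~3.4 kHz

    function getSpeechSpectrum() {
      analyser.getFloatFrequencyData(full);
      return full.subarray(0, maxSpeechBin);        // hand only these bins to HPS
    }
    ```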