web-audio-api, pitch-tracking

How can I utilize the full spectrogram resolution when dealing with speech input in the Web Audio API?


I am developing a kind of online visual Chinese tone helper. This involves doing pitch detection with the HPS (Harmonic Product Spectrum) algorithm, but that algorithm's performance is restricted by the resolution of the incoming spectrogram. I have been using the AnalyserNode until now, but since I can't set the sample rate of the AudioContext, I get an unnecessarily high maximum frequency in the spectrogram (sampleRate / 2 ≈ 24 kHz, while human speech only goes up to about 3.4 kHz). So with a spectrogram resolution of 1024 bins (since the largest FFT size allowed by the Web Audio API is 2048), I only utilize a small part of the frequency range when analysing my speech input.
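
To make the problem concrete, this is roughly what my current setup looks like (a simplified sketch; variable names are illustrative, and the microphone source is connected elsewhere):

```javascript
// Simplified sketch of the current AnalyserNode setup.
var audioContext = new AudioContext();
var analyser = audioContext.createAnalyser();
analyser.fftSize = 2048;                                    // the largest size I can use
var spectrum = new Uint8Array(analyser.frequencyBinCount);  // 1024 bins

// With a 48 kHz context the bins cover 0..24 kHz:
var binWidth = audioContext.sampleRate / analyser.fftSize;  // ~23.4 Hz per bin
var speechBins = Math.floor(3400 / binWidth);               // only ~145 of the 1024 bins

function analyse() {
  analyser.getByteFrequencyData(spectrum);
  // HPS only ever uses the first ~145 bins; the remaining ~880 are wasted.
  requestAnimationFrame(analyse);
}
analyse();
```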

To get more control, I have also tried using a ScriptProcessorNode to gather a buffer that I then analyse with the FFT found in DSP.js, but that approach seems much worse performance-wise than using the AnalyserNode. Does anybody have a suggestion on how to solve this issue? More about my setup can be seen in my development blog.
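
For reference, the ScriptProcessorNode variant looks roughly like this (again a simplified sketch; the FFT(bufferSize, sampleRate), forward() and spectrum calls are DSP.js's interface as I understand it, and `source` is the microphone source node):

```javascript
// Rough sketch of the ScriptProcessorNode + DSP.js approach (assumes dsp.js is loaded).
var bufferSize = 2048;
var processor = audioContext.createScriptProcessor(bufferSize, 1, 1);
var fft = new FFT(bufferSize, audioContext.sampleRate);  // DSP.js FFT

processor.onaudioprocess = function (e) {
  var input = e.inputBuffer.getChannelData(0);
  fft.forward(input);           // runs on the main thread every ~43 ms at 48 kHz
  var spectrum = fft.spectrum;  // bufferSize / 2 magnitude bins, same ~23 Hz width
  // ...run HPS on spectrum...
};

source.connect(processor);
processor.connect(audioContext.destination);  // keeps the node processing
```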


Solution

  • You could maybe try implementing the FFT code in asm.js. I suspect you'd see pretty significant performance improvements in both Chrome and Firefox, since this is exactly the kind of thing asm.js is really good at.

    Ultimately, I think you're going to have to profile this. Is it better to implement the FFT yourself with only the bins you want, or should you use an AnalyserNode at a super high resolution and just throw away what you don't need? The answer will be in the measurements. (There's a rough sketch of the latter option at the end of this answer.)

    That being said, even a pretty sub-optimal FFT implementation should still be plenty fast for real-time analysis of a single input. I'd be kind of surprised if you ran into any show-stopper performance problems.
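
    As a rough, untested sketch of that second option (the exact maximum fftSize depends on the browser), you'd just slice off the bins below ~3.4 kHz and feed only those to HPS:

    ```javascript
    // Untested sketch: high-resolution AnalyserNode, keep only the speech bins.
    var analyser = audioContext.createAnalyser();
    analyser.fftSize = 2048;                        // or larger, if the browser allows it
    var full = new Float32Array(analyser.frequencyBinCount);

    var binWidth = audioContext.sampleRate / analyser.fftSize;
    var maxSpeechBin = Math.ceil(3400 / binWidth);  // bins covering 0..~3.4 kHz

    function getSpeechSpectrum() {
      analyser.getFloatFrequencyData(full);
      return full.subarray(0, maxSpeechBin);        // hand only these bins to HPS
    }
    ```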