Tags: java, signal-processing, tarsosdsp

Why does PitchDetection work better with whistling?


I am playing around with the UtterAsterisk example program that comes with TarsosDSP. The goal of this program is to show horizontal bars that indicate the note a user should make. A vertical bar moves from left to right to indicate to the user the correct timing of when to perform which notes. The user gets points depending on if the user made the correct note for the correct duration of time.

Link to screenshot of application: https://0110.be/files/photos/392/UtterAsterisk.png

There are 3 sections in this program:

  1. select audio input
  2. select detection algorithm
  3. visual representation of expected notes vs actual notes produced: A little black square is made every X milliseconds that represents the note made by the user. In the title of this section (in the latest version of the program), it says "whistling works best".

I am wondering why this code works best with whistling.

As background information, I am trying to make a quick prototype for a similar program, but where the user would produce non-whistling, non-vocal (no speech) sounds (like animal sounds) and would need to be matched for correctness.

I have tried whistling the notes indicated on the program and it does work pretty nicely (except for the fact that I'm terrible at whistling!).

I have tried selecting different detection algorithms, but the note that the sound makes doesn't always register in the 3rd section when I make non-whistling sounds.

I have a feeling that whistling creates a single note, whereas making a quacking sound (like a duck) actually contains harmonics (I hope I got this right: several tones mixed together to produce one sound).

Lines 151 and 152: https://github.com/JorenSix/TarsosDSP/blob/master/src/examples/be/tarsos/dsp/example/UtterAsterisk.java

// add a processor to handle pitch detection events
dispatcher.addAudioProcessor(new PitchProcessor(algo, sampleRate, bufferSize, this));

The PitchProcessor, I believe, will only handle a single peak, as it returns a PitchDetectionResult, which contains only a single frequency (line 59): https://github.com/JorenSix/TarsosDSP/blob/master/src/core/be/tarsos/dsp/pitch/PitchDetectionResult.java
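To illustrate why a detector of this kind reports exactly one frequency per buffer, here is a minimal, self-contained autocorrelation pitch estimator in plain Java (my own sketch, not the TarsosDSP implementation; the class and method names are made up for this example). It picks the single lag at which the signal best matches a delayed copy of itself, so by construction only one pitch can come out:

```java
public class AutocorrelationPitch {

    // Estimate a single fundamental frequency (Hz) by finding the lag,
    // between roughly 50 Hz and 2000 Hz, where the signal correlates
    // best with a delayed copy of itself.
    public static double estimatePitch(double[] samples, double sampleRate) {
        int minLag = (int) (sampleRate / 2000.0); // ignore pitches above 2000 Hz
        int maxLag = (int) (sampleRate / 50.0);   // ignore pitches below 50 Hz
        int bestLag = minLag;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int lag = minLag; lag <= maxLag; lag++) {
            double score = 0;
            for (int i = 0; i + lag < samples.length; i++) {
                score += samples[i] * samples[i + lag];
            }
            if (score > bestScore) {
                bestScore = score;
                bestLag = lag;
            }
        }
        // Exactly one lag wins, so exactly one frequency is returned.
        return sampleRate / bestLag;
    }

    public static void main(String[] args) {
        // A pure 441 Hz sine, roughly what a clean whistle looks like.
        double sampleRate = 44100;
        double[] samples = new double[16384];
        for (int i = 0; i < samples.length; i++) {
            samples[i] = Math.sin(2 * Math.PI * 441 * i / sampleRate);
        }
        System.out.println(estimatePitch(samples, sampleRate)); // prints 441.0
    }
}
```

A sound with several simultaneous tones (a chord, a quack) still yields one "best" lag, which may correspond to none of the tones actually present; that is consistent with the odd readings in the 3rd section.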

Unfortunately, I am mostly a beginner in the field of digital signal processing and could use some help understanding why whistling works better in this particular application. If my intuition turns out to be right (whistling = single note), how could one do the same basic thing this program does, but compare a user-made animal sound against a recording for a match?

Thank you for your input!


Solution

  • It seems likely that the answer is right here, in your own question:

    where the user would produce non-whistling, non-vocal (no speech) sounds (like animal sounds) and would need to be matched for correctness.

    It seems likely that those "sounds" are the result of multiple tones, whereas whistling (human whistling) is likely to produce a single tone.

    For a comparison, test the difference between the sound of a single note (one key) played on a piano and a chord (multiple notes played at once) on the same piano.

    Another option is using a telephone to produce a keypad tone (e.g., press 7) vs whistling. The telephone produces DTMF (Dual-Tone Multi-Frequency) sounds: each key press plays two fixed tones at the same time.
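    To make the single-tone vs multi-tone difference concrete, here is a small, self-contained sketch in plain Java (my own code, not the TarsosDSP API) that computes a magnitude spectrum and counts the prominent peaks. A whistle-like pure tone produces one peak; DTMF digit 7 (852 Hz + 1209 Hz, per the standard keypad layout) produces two, which is exactly the situation that confuses a single-pitch detector:

```java
public class TonePeaks {

    // Hann window to limit spectral leakage for tones between bins.
    public static double[] hann(double[] x) {
        double[] w = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            w[i] = x[i] * 0.5 * (1 - Math.cos(2 * Math.PI * i / (x.length - 1)));
        }
        return w;
    }

    // Naive O(n^2) DFT magnitude spectrum (fine for a demo; use an FFT in practice).
    public static double[] magnitudeSpectrum(double[] x) {
        int n = x.length;
        double[] mag = new double[n / 2];
        for (int k = 0; k < n / 2; k++) {
            double re = 0, im = 0;
            for (int i = 0; i < n; i++) {
                double a = 2 * Math.PI * k * i / n;
                re += x[i] * Math.cos(a);
                im -= x[i] * Math.sin(a);
            }
            mag[k] = Math.hypot(re, im);
        }
        return mag;
    }

    // Count local maxima that rise above a fraction of the strongest bin.
    public static int countPeaks(double[] mag, double relThreshold) {
        double max = 0;
        for (double m : mag) max = Math.max(max, m);
        int peaks = 0;
        for (int k = 1; k < mag.length - 1; k++) {
            if (mag[k] > relThreshold * max && mag[k] >= mag[k - 1] && mag[k] > mag[k + 1]) {
                peaks++;
            }
        }
        return peaks;
    }

    public static void main(String[] args) {
        double sampleRate = 8192;
        int n = 2048;
        double[] whistle = new double[n]; // one pure tone, like a clean whistle
        double[] dtmf7 = new double[n];   // two tones at once: DTMF digit 7
        for (int i = 0; i < n; i++) {
            double t = i / sampleRate;
            whistle[i] = Math.sin(2 * Math.PI * 1000 * t);
            dtmf7[i] = Math.sin(2 * Math.PI * 852 * t) + Math.sin(2 * Math.PI * 1209 * t);
        }
        System.out.println(countPeaks(magnitudeSpectrum(hann(whistle)), 0.5)); // prints 1
        System.out.println(countPeaks(magnitudeSpectrum(hann(dtmf7)), 0.5));   // prints 2
    }
}
```

    For matching arbitrary animal sounds, counting or comparing spectral peaks like this, rather than forcing the signal through a single-pitch detector, is one possible starting point.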