python audio speech-recognition microphone pitch

How to detect pitch using mic as source?


How could I detect pitch using the mic as a source (and have it printed)? I've seen some sources that do pitch detection on a wav file, but I am wondering if there is a way to do the same with live microphone input.

Here's the base I'm working with:

import speech_recognition as sr
r = sr.Recognizer()
mic = sr.Microphone()
with mic as source:
    r.adjust_for_ambient_noise(source, duration=0.3)
    audio = r.listen(source)
    transcript = r.recognize_google(audio)
    print(transcript)

Edit: Specifically, I want to do a rough detection of male vs. female voices.


Solution

  • aubio has good pitch detection methods and Python bindings. Here's how you could use it:

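    import aubio
    import numpy as np
    
    # `audio` is assumed to be a 1-D NumPy array of mono samples (floats in [-1, 1])
    samplerate = 44100
    tolerance = 0.8
    downsample = 1               # 1 keeps the original sample rate
    win_s = 4096 // downsample   # fft size
    hop_s = 512  // downsample   # hop size
    
    pitch_o = aubio.pitch("yin", win_s, hop_s, samplerate)
    pitch_o.set_unit("Hz")
    pitch_o.set_tolerance(tolerance)
    
    # split into hop-sized frames; the last frame may be short and is skipped below
    signal_win = np.array_split(audio, np.arange(hop_s, len(audio), hop_s))
    
    pitch_profile = []
    for frame in signal_win[:-1]:
        # aubio expects float32 input of exactly hop_s samples
        pitch_hz = pitch_o(frame.astype(np.float32))[0]
        if pitch_hz > 0:         # 0 means no pitch was found in this frame
            pitch_profile.append(pitch_hz)
    
    if pitch_profile:
        pitch_array = np.array(pitch_profile)
        Q25, Q50, Q75 = np.quantile(pitch_array, [0.25, 0.50, 0.75])
        IQR = Q75 - Q25
        median = np.median(pitch_array)
        pitch_min = pitch_array.min()
        pitch_max = pitch_array.max()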
    

    Obviously you'd need to get the audio into an array format first. Also note that the code above computes statistics over the whole pitch profile instead of reporting a single pitch value. The reason is that even a 0.3 s recording is much longer than the window normally used for one pitch estimate (4096 samples is about 93 ms at 44.1 kHz), so you get many estimates per recording and need to summarise them.
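To make the "array format" step concrete, here is a minimal sketch of two helpers that could sit on either side of the aubio code. The names `pcm16_to_float32` and `guess_voice_range` are hypothetical and not part of aubio or speech_recognition. The 155 Hz and 165 Hz cut-offs are assumptions, based on the commonly quoted fundamental-frequency ranges of roughly 85 to 155 Hz for adult male speech and 165 to 255 Hz for adult female speech. The conversion also assumes the recorder returns little-endian 16-bit mono PCM. Treat the result as a rough hint, not a reliable classifier.

```python
import numpy as np


def pcm16_to_float32(raw_bytes):
    """Convert 16-bit mono PCM bytes (assumed little-endian) to float32 in [-1, 1)."""
    samples = np.frombuffer(raw_bytes, dtype="<i2")
    # Scale by 2**15 so the int16 range maps onto [-1, 1)
    return samples.astype(np.float32) / 32768.0


def guess_voice_range(median_hz, male_max=155.0, female_min=165.0):
    """Label a median pitch using assumed cut-offs (hypothetical defaults)."""
    if median_hz <= male_max:
        return "male-range"
    if median_hz >= female_min:
        return "female-range"
    # Pitches between the two cut-offs are left undecided
    return "ambiguous"
```

With the recorder from the question, something like `audio = pcm16_to_float32(audio.get_raw_data(convert_rate=44100, convert_width=2))` should produce an array the aubio loop can consume, and `guess_voice_range(float(median))` can then be printed once the statistics have been computed. Using the median rather than the mean keeps occasional octave errors from the pitch tracker from skewing the result.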

    Other examples: