ios swift avaudioengine pitch-detection

Is it feasible to use AVAudioEngine to detect pitch in real time?


I'm trying to write a music app in which pitch detection is the core feature. I've seen solutions to this problem, as well as apps on the App Store. However, most of them are pretty dated, and I'd like to do this in Swift. I've been looking at AVAudioEngine as a way to do this, but I find the documentation lacking, or maybe I haven't been looking hard enough.

What I have found is that I can tap the inputNode bus like this:

self.audioEngine = AVAudioEngine()
self.audioInputNode = self.audioEngine.inputNode
self.audioInputNode.installTap(onBus: 0, bufferSize: 256, format: self.audioInputNode.outputFormat(forBus: 0)) { buffer, time in
      self.analyzeBuffer(buffer)
}

The bus is tapped 2-3 times per second and the buffer contains more than 16000 floats for each tap. Are these amplitude samples from the microphone?

The docs at least claim it's the output from the node: "The buffer parameter is a buffer of audio captured from the output of an AVAudioNode."
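As a rough sanity check, the reported numbers are consistent with the buffer holding plain PCM amplitude samples. The sketch below assumes a 44.1 kHz input format and 16384 frames per tap; both values are assumptions for illustration (the post only says "more than 16000 floats"), and the bufferSize passed to the tap is only a request that the system may override.

```swift
// Hypothetical values: the post only reports "more than 16000 floats" per tap.
let sampleRate = 44_100.0        // samples per second (assumed)
let framesPerBuffer = 16_384.0   // frames delivered per tap (assumed)

// Each buffer covers framesPerBuffer / sampleRate seconds of audio,
// so the tap fires once per that interval.
let secondsPerBuffer = framesPerBuffer / sampleRate
let tapsPerSecond = 1.0 / secondsPerBuffer

print(secondsPerBuffer, tapsPerSecond)
```

Under those assumptions the tap fires between two and three times per second, which matches the observation above.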

Is it possible to use AVAudioEngine to detect pitch in real time or should I go about this another way?


Solution

  • I realize that Hellium3 is really giving me information about what pitch is and whether it's a good idea to do these things in Swift.

    My question was originally about whether tapping the PCM bus is the way to obtain input signals from the microphone.

    Since asking this question I've done exactly that: I take the data obtained by tapping the PCM bus and analyse it in buffer windows.
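
    A minimal sketch of getting from a tap buffer to an array of samples might look like this. The helper name `monoSamples` is hypothetical, and the sketch assumes the tap delivers Float32 data and that only the first channel is of interest.

    ```swift
    import AVFoundation

    // Hypothetical helper: copies channel 0 of a tap buffer into a Swift array.
    // Returns an empty array if the buffer does not hold Float32 samples.
    func monoSamples(from buffer: AVAudioPCMBuffer) -> [Float] {
        guard let channels = buffer.floatChannelData else { return [] }
        // frameLength is the number of valid frames, which may be less than the capacity.
        let frameCount = Int(buffer.frameLength)
        return Array(UnsafeBufferPointer(start: channels[0], count: frameCount))
    }
    ```

    The resulting array can then be handed to whatever analysis runs on each tap callback.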

    It works really well; it was my lack of understanding of what a PCM bus, a buffer and a sampling frequency are that made me ask the question in the first place.

    Knowing those three makes it easier to see that tapping the bus is the right approach.
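
    To make the relationship concrete: a buffer is just an array of amplitude samples, the sampling frequency says how much time each sample covers, and an analysis window is a fixed-size slice of that array. The sketch below is one way to cut a buffer into overlapping windows; the function name, the window size of 2048, the hop of 1024 and the 44.1 kHz rate are all assumptions chosen for illustration.

    ```swift
    // Splits samples into fixed-size analysis windows that advance by `hop` samples.
    // A trailing remainder shorter than `size` is dropped.
    func analysisWindows(_ samples: [Float], size: Int, hop: Int) -> [[Float]] {
        guard size > 0, hop > 0, samples.count >= size else { return [] }
        return stride(from: 0, through: samples.count - size, by: hop).map {
            Array(samples[$0 ..< $0 + size])
        }
    }

    // Hypothetical numbers: a 16384-sample tap buffer cut into
    // 2048-sample windows with 50% overlap.
    let tapBuffer = [Float](repeating: 0, count: 16_384)
    let windows = analysisWindows(tapBuffer, size: 2_048, hop: 1_024)

    // Duration of one window at an assumed 44.1 kHz sampling frequency.
    let windowSeconds = 2_048.0 / 44_100.0
    print(windows.count, windowSeconds)
    ```

    Shorter windows give faster updates but contain fewer periods of a low note, so the window size is a trade-off between latency and the lowest pitch that can be resolved.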

    Edit: By request, here is my (deprecated) implementation of the PitchDetector.

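    import Accelerate  // provides vDSP_conv
    
    // Pitch and PitchManager are types from the rest of the app and are not shown here.
    class PitchDetector {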
      var samplingFrequency: Float
      var harmonicConstant: Float
    
      init(harmonicConstant: Float, samplingFrequency: Float) {
        self.harmonicConstant = harmonicConstant
        self.samplingFrequency = samplingFrequency
      }
    
      //------------------------------------------------------------------------------
      // MARK: Signal processing
      //------------------------------------------------------------------------------
    
      func detectPitch(_ samples: [Float]) -> Pitch? {
        let snac = self.snac(samples)
        let (lags, peaks) = self.findKeyMaxima(snac)
        let (τBest, clarity) = self.findBestPeak(lags, peaks: peaks)
        if τBest > 0 {
          let frequency = self.samplingFrequency / τBest
          if PitchManager.sharedManager.inManageableRange(frequency) {
            return Pitch(measuredFrequency: frequency, clarity: clarity)
          }
        }
    
        return nil
      }
    
      // Returns the Special Normalisation of the AutoCorrelation (SNAC) function for various lags, with values between -1 and 1
      private func snac(_ samples: [Float]) -> [Float] {
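        // Clamp τMax so the loop never indexes past the end of the arrays.
        let τMax = max(1, min(samples.count, Int(self.samplingFrequency / PitchManager.sharedManager.noteFrequencies.first!) + 1))
        var snac = [Float](repeating: 0.0, count: samples.count)
        let acf = self.acf(samples)
        let norm = self.m(samples)
        for τ in 1 ..< τMax {
          // acf.count / 2 is the index of lag 0; lags whose normaliser is zero are left at 0.
          snac[τ] = norm[τ] > 0 ? 2 * acf[τ + acf.count / 2] / norm[τ] : 0
        }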
    
        return snac
      }
    
      // Auto correlation function
      private func acf(_ x: [Float]) -> [Float] {
        let resultSize = 2 * x.count - 1
        var result = [Float](repeating: 0, count: resultSize)
        let xPad = repeatElement(Float(0.0), count: x.count - 1)
        let xPadded = xPad + x + xPad
        vDSP_conv(xPadded, 1, x, 1, &result, 1, vDSP_Length(resultSize), vDSP_Length(x.count))
    
        return result
      }
    
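      // Normalisation term m(τ): the sum of squares of the samples that overlap at lag τ
      private func m(_ samples: [Float]) -> [Float] {
        var sum: Float = 0.0
        for i in 0 ..< samples.count {
          sum += 2.0 * samples[i] * samples[i]
        }
        var m = [Float](repeating: 0.0, count: samples.count)
        m[0] = sum
        for i in 1 ..< samples.count {
          // Moving from lag i - 1 to lag i drops samples[i - 1] from one end and samples[count - i] from the other.
          m[i] = m[i - 1] - samples[i - 1] * samples[i - 1] - samples[samples.count - i] * samples[samples.count - i]
        }
        return m
      }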
    
      /**
       * Finds the indices of all key maximum points in data
       */
      private func findKeyMaxima(_ data: [Float]) -> (lags: [Float], peaks: [Float]) {
        var keyMaximaLags: [Float] = []
        var keyMaximaPeaks: [Float] = []
        var newPeakIncoming = false
        var currentBestPeak: Float = 0.0
        var currentBestτ = -1
        // Stop one short of the end because the loop body reads data[τ + 1].
        for τ in 0 ..< max(0, data.count - 1) {
          newPeakIncoming = newPeakIncoming || ((data[τ] < 0) && (data[τ + 1] > 0))
          if newPeakIncoming {
            if data[τ] > currentBestPeak {
              currentBestPeak = data[τ]
              currentBestτ = τ
            }
            let zeroCrossing = (data[τ] > 0) && (data[τ + 1] < 0)
            if zeroCrossing {
              let (τEst, peakEst) = self.approximateTruePeak(currentBestτ, data: data)
              keyMaximaLags.append(τEst)
              keyMaximaPeaks.append(peakEst)
              newPeakIncoming = false
              currentBestPeak = 0.0
              currentBestτ = -1
            }
          }
        }
    
        if keyMaximaLags.count <= 1 {
          let unwantedPeakOfLowPitchTone = (keyMaximaLags.count == 1 && data[Int(keyMaximaLags[0])] < data.max()!)
          if unwantedPeakOfLowPitchTone {
            keyMaximaLags.removeAll()
            keyMaximaPeaks.removeAll()
          }
          let (τEst, peakEst) = self.approximateTruePeak(data.firstIndex(of: data.max()!)!, data: data)
          keyMaximaLags.append(τEst)
          keyMaximaPeaks.append(peakEst)
        }
    
        return (lags: keyMaximaLags, peaks: keyMaximaPeaks)
      }
    
      /**
       * Approximates the true peak according to https://www.dsprelated.com/freebooks/sasp/Quadratic_Interpolation_Spectral_Peaks.html
       */
      private func approximateTruePeak(_ τ: Int, data: [Float]) -> (τEst: Float, peakEst: Float) {
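        // At the array edges one neighbour is missing, so skip the interpolation.
        guard τ > 0, τ < data.count - 1 else { return (Float(τ), data[τ]) }
        let α = data[τ - 1]
        let β = data[τ]
        let γ = data[τ + 1]
        let denominator = α - 2.0 * β + γ
        // Three collinear points have no parabolic vertex; keep the sampled point.
        let p = denominator == 0 ? 0 : 0.5 * ((α - γ) / denominator)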
        let peakEst = min(1.0, β - 0.25 * (α - γ) * p)
        let τEst = Float(τ) + p
    
        return (τEst, peakEst)
      }
    
      private func findBestPeak(_ lags: [Float], peaks: [Float]) -> (τBest: Float, clarity: Float) {
        let threshold: Float = self.harmonicConstant * peaks.max()!
        for i in 0 ..< peaks.count {
          if peaks[i] > threshold {
            return (τBest: lags[i], clarity: peaks[i])
          }
        }
    
        return (τBest: lags[0], clarity: peaks[0])
      }
    }
    

    All credit to Philip McLeod whose research is used in my implementation above. http://www.cs.otago.ac.nz/research/publications/oucs-2008-03.pdf
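
    For readers who want to try the idea without the app-specific types, here is a simplified, self-contained sketch. It is not the implementation above: it computes the SNAC sums directly instead of using vDSP_conv, and it replaces the key-maxima selection with a much cruder rule (take the highest value after the curve first goes negative). The function names, the 2048-sample window, the lag limit of 150 and the 44.1 kHz rate are assumptions for illustration.

    ```swift
    import Foundation

    // Naive O(N * maxLag) SNAC: n(τ) = 2 r(τ) / m(τ), computed directly from the sums.
    func naiveSNAC(_ x: [Float], maxLag: Int) -> [Float] {
        let lagCount = max(0, min(maxLag, x.count))
        var n = [Float](repeating: 0, count: lagCount)
        for lag in 0 ..< lagCount {
            var r: Float = 0   // autocorrelation at this lag
            var m: Float = 0   // sum of squares of the overlapping samples
            for j in 0 ..< x.count - lag {
                r += x[j] * x[j + lag]
                m += x[j] * x[j] + x[j + lag] * x[j + lag]
            }
            n[lag] = m > 0 ? 2 * r / m : 0
        }
        return n
    }

    // Quadratic interpolation through three neighbouring points. Returns the
    // vertex offset (in samples, relative to the middle point) and its height.
    func parabolicPeak(_ a: Float, _ b: Float, _ c: Float) -> (offset: Float, height: Float) {
        let denominator = a - 2 * b + c
        guard denominator != 0 else { return (0, b) }
        let p = 0.5 * (a - c) / denominator
        return (p, b - 0.25 * (a - c) * p)
    }

    // Simplified peak picking (not the key-maxima selection used above):
    // take the highest SNAC value after the curve first goes negative.
    func estimateFrequency(_ x: [Float], sampleRate: Float, maxLag: Int) -> Float? {
        let n = naiveSNAC(x, maxLag: maxLag)
        guard let start = n.firstIndex(where: { $0 < 0 }) else { return nil }
        var best = start
        for lag in start ..< n.count where n[lag] > n[best] { best = lag }
        guard best > 0, best < n.count - 1, n[best] > 0 else { return nil }
        let (offset, _) = parabolicPeak(n[best - 1], n[best], n[best + 1])
        // The period in samples is the refined lag; frequency is its reciprocal.
        return sampleRate / (Float(best) + offset)
    }

    // Synthetic 440 Hz sine at an assumed 44.1 kHz sampling frequency.
    let sampleRate: Float = 44_100
    let tone: [Float] = (0 ..< 2_048).map { sin(2 * Float.pi * 440 * Float($0) / sampleRate) }
    if let frequency = estimateFrequency(tone, sampleRate: sampleRate, maxLag: 150) {
        print(frequency)   // should come out close to 440
    }
    ```

    The lag limit of 150 keeps only the first period of the tone in range, which is why the crude peak rule is enough here; real input with harmonics needs the threshold-based peak choice from the implementation above.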