Tags: python, audio, compare, fft, audio-processing

Python: Compare two audio files which may have noise


For a project, I am recording audio clips (WAV files) at several locations near a stage. Using these recordings, I need to check whether the source audio, i.e. the audio from the stage, is clearly audible at those nearby locations.

To be more precise, I have microphones placed near a stage, and I have audio clips from both the stage and these nearby places. How can I check whether the sound from the stage reaches a nearby location, or in other words, how can I tell whether the sound from the stage is disturbing the nearby places?


Solution

  • Sounds like an interesting project. Your question could tap into vast fields like perception and convolutional neural networks, so here is a nuts-and-bolts approach. First, make sure your audio files are aligned in time. Then feed a window of audio samples (say 2^12 = 4096, or more, preferably a power of 2, which FFT implementations handle most efficiently) into an FFT call (Fast Fourier Transform, an efficient algorithm for computing the Discrete Fourier Transform). This gives you an array of frequency bins, each with a magnitude (ignore the phase). Compare this FFT array between your stage mic and each of the surrounding mic files. Then slide the window of samples forward in time and repeat until you have visited the full set of samples. You may also want to try this with various widths of the sampling window.

    Also try various ways to compare the FFT arrays of the two mic signals. The frequency bins with the greatest magnitudes should be given greater weight in this comparison, since you want to avoid letting noise in low-magnitude bins muddy the waters. Do this by squaring the bin magnitudes, which accentuates the dominant frequencies and attenuates the quieter ones. For simplicity at the start, use a sine wave as your audio signal (search for a mobile app called Frequency Sound Generator). You will get a simpler FFT array; the goal is just to see that the one frequency from your source audio appears in the FFT output.

    To do the above, the only library call you really need is the FFT (which computes the Discrete Fourier Transform, DFT), to transform your audio from the time domain into the frequency domain. However, if you do not have the luxury of time to write your own code around the DFT calls, these Python repos may speed up your project:

    Librosa - Python library for audio and music analysis

    https://librosa.github.io/
    https://github.com/librosa/librosa

    Madmom - Python audio and music signal processing library

    https://madmom.readthedocs.io/en/latest/modules/audio/cepstrogram.html?highlight=mfcc
    https://madmom.readthedocs.io
    https://github.com/CPJKU/madmom

    However, I suggest you avoid the above libs and just roll your own by making DFT calls. Doing things the hard way teaches you MUCH more and may give you the courage to solve yet harder problems. YMMV
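The roll-your-own approach described above could be sketched roughly as follows, using only NumPy. This is a minimal sketch under stated assumptions, not a definitive implementation. The function names `estimate_lag` and `spectral_similarity` are hypothetical. Using cross-correlation for time alignment, a Hann window on each frame, and cosine similarity as the comparison score are all choices made here for illustration; the answer only says to align the files and to "try various ways" of comparing the FFT arrays. The test signal is a synthetic 440 Hz sine wave, in the spirit of the sine-wave suggestion.

```python
import numpy as np


def estimate_lag(ref, sig):
    """Estimate how many samples `sig` is delayed relative to `ref`.

    Assumption: alignment is done with FFT-based cross-correlation.
    A negative result means `sig` is ahead of `ref`.
    """
    # Zero-pad to a power of 2 that holds the full linear correlation.
    nfft = 1 << (len(ref) + len(sig) - 2).bit_length()
    spectrum = np.fft.rfft(sig, nfft) * np.conj(np.fft.rfft(ref, nfft))
    corr = np.fft.irfft(spectrum, nfft)
    lag = int(np.argmax(corr))
    # Indices past the end of `sig` wrap around and represent negative lags.
    return lag - nfft if lag >= len(sig) else lag


def spectral_similarity(stage, nearby, window_size=4096, hop=2048):
    """Average per-window similarity of two time-aligned signals.

    Each window is transformed with an FFT, the bin magnitudes are squared
    so that dominant frequencies outweigh low-level noise, and the two
    squared-magnitude arrays are compared with cosine similarity
    (an assumed metric; other comparisons are possible).
    """
    n = min(len(stage), len(nearby))
    window = np.hanning(window_size)  # assumed taper to reduce leakage
    scores = []
    for start in range(0, n - window_size + 1, hop):
        a = np.abs(np.fft.rfft(stage[start:start + window_size] * window)) ** 2
        b = np.abs(np.fft.rfft(nearby[start:start + window_size] * window)) ** 2
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        if denom > 0:  # skip windows where either signal is silent
            scores.append(float(np.dot(a, b) / denom))
    return float(np.mean(scores)) if scores else 0.0


# Synthetic demo: a 440 Hz tone stands in for the stage recording.
rate = 44100
t = np.arange(rate) / rate
rng = np.random.default_rng(0)
tone = np.sin(2 * np.pi * 440 * t)
nearby = 0.3 * tone + 0.05 * rng.standard_normal(rate)  # quieter tone plus noise
unrelated = 0.05 * rng.standard_normal(rate)            # noise only

print("tone vs nearby:   ", spectral_similarity(tone, nearby))
print("tone vs unrelated:", spectral_similarity(tone, unrelated))
```

With real recordings, one would first load both WAV files at the same sample rate, call `estimate_lag` on them, trim the delayed signal by that many samples, and only then call `spectral_similarity`. The score ranges from 0 to 1, and any threshold for "the stage is audible here" would have to be calibrated on your own recordings.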