algorithm audio audio-processing source-separation

Drum sound recognition algorithms


I am thinking of trying to make a program that will automatically generate drum tabs from an audio file containing only the drumming.

I have thought of using an FFT to get the average spectral peaks during a xxxx ms interval and then comparing them to a table containing all the drum parts (snare, toms, bass drum and so on) of that specific drum kit and sound gear.

But I have a feeling that it won't be that easy. Do you have any suggestions on which methods I could use to solve my issue?

// Eric


Solution

  • It isn't easy for anything except a trivial signal. Almost all western 'classical' and commercial music features coincident drum sounds.

    1: Superposition: The original sources add together in the frequency domain in much the same way as they do in the time domain. Each FFT bin contains contributions from all instruments currently being played (and those which are undamped and still decaying, or resonating sympathetically). Unpicking the various sources is hard, and certainly not a matter of simple comparison with a library of spectra.

    2: The FFT by its definition windows audio in the time domain and yields the magnitude and phase of the basis function in each bin over that window period. The best you could say is that content appeared in the bin corresponding to a drum sound at some point within the window period. If you were to compute a 1024-point FFT, the window duration would be 23ms at 44.1kHz. To put this into a musical perspective, 16th notes at 120bpm are 125ms apart, and 64th notes are 31.3ms apart. You might get away with smaller FFTs, at the cost of coarser frequency resolution.

    3: Percussion instrument signals tend to look a lot like noise - at least at the point where the instrument is hit. That is to say, there will be energy spread across the spectrum and no obviously dominant frequencies. After impact, tuned percussion starts to look more 'tonal'.
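
Points 1 and 2 can be checked numerically. The following is a minimal sketch in plain Python, not a recommended implementation: the helper names, the tiny naive DFT, and the two synthetic "drum" signals (a decaying tone and a decaying noise burst at a made-up 8 kHz sample rate) are all assumptions chosen for illustration. It computes the FFT window duration and note spacing, taking the beat as a quarter note, and shows that the complex spectra of two sources add while their magnitudes do not, which is why a mix cannot simply be matched against stored magnitude spectra.

```python
import cmath
import math
import random


def fft_window_ms(n_fft, sample_rate):
    """Duration in milliseconds of the audio covered by one n_fft-point FFT."""
    return 1000.0 * n_fft / sample_rate


def note_spacing_ms(bpm, notes_per_beat):
    """Time between successive notes, with the beat taken as a quarter note."""
    return 60000.0 / bpm / notes_per_beat


def dft(x):
    """Naive O(N^2) DFT; slow, but enough for a 64-sample demonstration."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * b * k / n) for k in range(n))
            for b in range(n)]


# Point 2: time resolution of the FFT versus the note grid.
window_ms = fft_window_ms(1024, 44100)      # about 23.2 ms
sixteenth_ms = note_spacing_ms(120, 4)      # 125 ms
sixty_fourth_ms = note_spacing_ms(120, 16)  # 31.25 ms

# Point 1: superposition, using two hypothetical stand-ins for drum sounds.
random.seed(0)
N = 64
SAMPLE_RATE = 8000  # assumed rate for the toy signals only
tone = [math.sin(2 * math.pi * 250 * k / SAMPLE_RATE) * math.exp(-k / 20)
        for k in range(N)]
noise = [random.uniform(-1, 1) * math.exp(-k / 10) for k in range(N)]
mix = [a + b for a, b in zip(tone, noise)]

tone_spec, noise_spec, mix_spec = dft(tone), dft(noise), dft(mix)

# The complex spectra add bin by bin (the DFT is linear) ...
spectra_add = all(abs(m - (a + b)) < 1e-9
                  for m, a, b in zip(mix_spec, tone_spec, noise_spec))
# ... but the magnitudes do not, because the phases differ in each bin.
magnitudes_add = all(abs(abs(m) - (abs(a) + abs(b))) < 1e-9
                     for m, a, b in zip(mix_spec, tone_spec, noise_spec))

print(f"1024-point window at 44.1 kHz: {window_ms:.1f} ms")
print(f"16th notes at 120 bpm: {sixteenth_ms} ms, 64th notes: {sixty_fourth_ms} ms")
print(f"spectra add: {spectra_add}, magnitudes add: {magnitudes_add}")
```

Halving the FFT size halves the window duration but also halves the number of frequency bins, so time resolution is bought with frequency resolution.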

    You probably need to look at a time-domain approach to accurately detect the onset point (onset detection). From there you could look at time or frequency domain characteristics of the signal to try and deduce the instrument in question. There's probably also a lot you could do with a priori knowledge of the genre of music being played, allowing you to predict the patterns that are likely to be present.
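
One simple time-domain approach to onset detection is to track short-term energy and flag the frames where it jumps sharply. The sketch below is only an illustration under stated assumptions: the function names, frame length, energy ratio, noise floor and minimum gap are hypothetical values picked for a synthetic test signal (three decaying noise bursts at a made-up 8 kHz sample rate), and real recordings would need tuning and probably a more robust detection function.

```python
import math
import random


def frame_energies(samples, frame_len):
    """Mean squared amplitude of consecutive, non-overlapping frames."""
    return [sum(s * s for s in samples[i:i + frame_len]) / frame_len
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def detect_onsets(samples, sample_rate, frame_len=64, ratio=4.0,
                  floor=1e-4, min_gap_s=0.03):
    """Return onset times in seconds.

    A frame counts as an onset when its energy exceeds `ratio` times the
    previous frame's energy (clamped to `floor` so silence does not trigger
    on tiny values) and it is at least `min_gap_s` after the last onset.
    """
    energies = frame_energies(samples, frame_len)
    onsets = []
    for i in range(1, len(energies)):
        rising = energies[i] > ratio * max(energies[i - 1], floor)
        t = i * frame_len / sample_rate
        if rising and (not onsets or t - onsets[-1] >= min_gap_s):
            onsets.append(t)
    return onsets


# Synthetic test signal: one second of silence with three decaying noise
# bursts standing in for drum hits (assumed values, not real drum data).
random.seed(1)
SAMPLE_RATE = 8000
signal = [0.0] * SAMPLE_RATE
hit_times = [0.1, 0.35, 0.6]
for hit in hit_times:
    start = int(hit * SAMPLE_RATE)
    for k in range(1600):  # each burst lasts 0.2 s
        signal[start + k] += random.uniform(-1, 1) * math.exp(-k / 400)

onsets = detect_onsets(signal, SAMPLE_RATE)
print(onsets)
```

The detected times are quantised to the frame length (8 ms here), so a shorter frame gives finer timing. A segment of audio starting at each onset could then be passed to whatever time or frequency domain features are used to guess the instrument.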

    This is a particular case of the more generalised audio source separation problem. There's been lots of academic activity in this area, and consequently a lot of published papers describing approaches. Search terms to look for include source separation, music information retrieval, and audio feature detection.