I'm currently working on a program that analyses a wav file of a solo musician playing an instrument and detects the notes within it. To do this it performs an FFT and then looks at the data produced. The goal is to (at some point) produce the sheet music by writing a midi file.
I just wanted to get a few opinions on what might be difficult about it, whether anyones tried it before, maybe a few things it would be good to research. At the moment my biggest struggle is that not all notes are purely one frequency and I cannot yet detect chords; just single notes. Also there has to be a pause between the notes I am detecting so I know for sure one has ended and the other started. Any comments on this would also be very welcome!
This is the code I use when A new frame comes in from the signal. it looks for the frequency that is most dominant in the sample:
//Get frequency vector for power match
double[] frequencyVectorDoubleArray = Accord.Audio.Tools.GetFrequencyVector(waveSignal.Length, waveSignal.SampleRate);
powerSpectrumDoubleArray[0] = 0; // zero DC
double[,] frequencyPowerDoubleArray = new double[powerSpectrumDoubleArray.Length, 2];
for (int i = 0; i < powerSpectrumDoubleArray.Length; i++)
{
if (frequencyVectorDoubleArray[i] > 15.00)
{
frequencyPowerDoubleArray[i, 0] = frequencyVectorDoubleArray[i];
frequencyPowerDoubleArray[i, 1] = powerSpectrumDoubleArray[i];
}
}
//Method for finding the highest frequency in a sample of frequency domain data
//But I want to filter out stuff
pulsePowerDouble = lowestPowerAcceptedDouble;//0;//lowestPowerAccepted;
int frequencyIndexAtPulseInt = 0;
int oldFrequencyIndexAtPulse = 0;
for (int j = 0; j < frequencyPowerDoubleArray.Length / 2; j++)
{
if (frequencyPowerDoubleArray[j, 1] > pulsePowerDouble)
{
oldPulsePowerDouble = pulsePowerDouble;
pulsePowerDouble = frequencyPowerDoubleArray[j, 1];
oldFrequencyIndexAtPulse = frequencyIndexAtPulseInt;
frequencyIndexAtPulseInt = j;
}
}
foundFreq = frequencyPowerDoubleArray[frequencyIndexAtPulseInt, 0];
1) There is a lot (several decades worth) of research literature on frequency estimation and pitch estimation (which are two different subjects).
2) Peak FFT frequency is not the same as the musical pitch. Some solo musical instruments can produces well over a dozen frequency peaks for just one note, let alone a chord, and with none of the largest peaks anywhere near the musical pitch. For some common instruments, the peaks might not even be mathematically exact harmonics.
3) Using the peak bin of a short unwindowed FFT isn't a great frequency estimator.
4) Note onset detection might require some sophisticated pattern matching, depending on the instrument.