I am currently working on a music visualization LED strip display (similar to this) but got stuck at the process of extracting audio frequency / magnitude from the sample data.
The entire process takes up quite a few steps, and I am not sure at which point I am failing exactly, so please bear with me. I start with reading audio from a TargetDataLine
configured with AudioFormat.
AudioSystem.getTargetDataLine(new AudioFormat(8000.0f, 16, 1, true, true));
Next, I read the data as described in this tutorial. Only difference is that I am using a smaller buffer, effectively reading 64 bytes at a time.
Here is an example data read in this way from a 500Hz test tone.
final byte[] buffer = { // 32 16-bit samples
30, 111, 43, -19, 50, -74, 49, -54,
41, 70, 26, 121, 7, -90, -13, -89,
-31, -118, -44, 17, -51, 75, -50, 60,
-42, -62, -27, -115, -8, 91, 12, 85,
30, 106, 43, -34, 50, -93, 49, -76,
41, 52, 26, 108, 7, -93, -13, -84,
-31, -104, -44, 38, -51, 98, -50, 84,
-42, -42, -27, -98, -8, 99, 12, 83
};
Next, I transform the sampled bytes into an array of double
values in range [-1, 1]. I took inspiration from the aforementioned tutorial and this code.
final double[] samples = new double[buffer.length];
int sampleIndex = 0;
int byteIndex = 0;
while (byteIndex < buffer.length) {
int low = buffer[byteIndex];
++byteIndex;
int high = buffer[byteIndex];
++byteIndex;
int sampleIntValue = ((high << 8) + (low & 0x00ff));
double maxSampleValue = 32768;
samples[2 * sampleIndex] = ((double) sampleIntValue) / maxSampleValue; // Transforming to [-1, 1]
samples[(2 * sampleIndex) + 1] = 0; // Imaginary part
++sampleIndex;
}
Next, I perform a Fast Fourier Transform on the data using the edu.emory.mathcs:JTransforms:2.4 library.
final DoubleFFT_1D fft = new DoubleFFT_1D(samples.length / 2);
fft.complexForward(samples);
Then, I calculate frequencies and magnitudes as described in this code.
final float sampleRate = 8000f;
final Map<Double, Double> frequenciesToMagnitudes = IntStream.range(0, samples.length / 2)
.boxed()
.collect(Collectors.toMap(
index -> 2 * ((double) index / (double) samples.length) * sampleRate,
index -> Math.log10(
Math.sqrt(
Math.pow(samples[2 * index], 2) // real part
+ Math.pow(samples[(2 * index) + 1], 2) // imaginary part
)
)
));
I find the maximum magnitude so I can scale the displayed values accordingly.
final double maximumMagnitude = frequenciesToMagnitudes.values().stream().max(Double::compare).orElse(0d);
And finally, I display the results (taller block represents a brighter LED).
final char[] magnitudeDisplay = {'▁', '▂', '▃', '▄', '▅', '▆', '▇', '█'};
frequenciesToMagnitudes.entrySet().stream().sorted(Map.Entry.comparingByKey()).forEach(frequencyToMagnitude -> {
final double magnitudePercentage = frequencyToMagnitude.getValue() / maximumMagnitude;
final int characterToDisplayIndex = Math.min(magnitudeDisplay.length - 1, Math.max(0, (int) (magnitudePercentage * magnitudeDisplay.length)));
final char characterToDisplay = magnitudeDisplay[characterToDisplayIndex];
System.out.print(characterToDisplay);
});
System.out.println();
From all this, I would expect to see two distinct spikes (my frequency + alias region), but instead there is 8.
▁▁▅▁▁▁█▁▁▁▄▁▁▁▅▁▁▁▅▁▁▁▄▁▁▁█▁▁▁▅▁
The number of spikes changes depending on frequency of audio playback (sometimes it's two, sometimes it's four), and the lower the frequency the more spikes I see.
My question is: how do I extract frequency / magnitude pairs from audio data?
After a bit of digging I have found the OpenIMAJ library which does exactly what I need. Here are the differences in my code and their code.
int high = buffer[byteIndex];
++byteIndex;
int low = buffer[byteIndex];
++byteIndex;
final int fftSize = (int) Math.pow(2, 32 - Integer.numberOfLeadingZeros(numberOfSamples - 1));
samples[2 * sampleIndex] = (float) sampleIntValue * Integer.MAX_VALUE / Short.MAX_VALUE;
for (int i = 0; i < samples.length; i += 2) samples[i] /= fftSize;
final Map<Float, Float> frequenciesToMagnitudes = IntStream.range(0, samples.length / 4)
.boxed()
.collect(Collectors.toMap(
index -> (((float) (index * 2)) * (sampleRate / (float) numberOfSamples)) / 2,
index -> {
final float realPart = samples[index * 2];
final float imaginaryPart = samples[index * 2 + 1];
return (float) Math.sqrt(realPart * realPart + imaginaryPart * imaginaryPart);
}
));
After all the changes, the output appears to be much more reasonable:
▁▁▂▁▁▁█▁▁▁▂▁▁▁▂▁
I am still not entirely sure if the frequency calculation is correct, though.