I'm performing cross-correlation between a shorter clip of audio (44100 * 14 samples) and a much longer clip of audio (44100 * 60 * 6 samples). From what I understand, I can't window the FFT because of this. When testing out kiss_fftr and kiss_fftri, I found that the inverse operation returned largely noise (but it was still rhythmically similar to the input). I've confirmed that my input audio is correct and the corruption happens solely within this function:
static std::vector<std::vector<float>> do_fft(std::vector<std::vector<float>> song, std::vector<std::vector<float>> loop)
{
    loop[0].resize(kiss_fftr_next_fast_size_real(loop[0].size())); // TODO: resize this to song size instead of loop size when done testing
    loop[1].resize(loop[0].size()); // TODO: make this dynamic
    std::vector<std::vector<kiss_fft_cpx>> fft_loop;
    std::vector<std::vector<float>> output;
    for (int chan = 0; chan < loop.size(); chan++)
    {
        fft_loop.push_back(std::vector<kiss_fft_cpx>());
        fft_loop[chan].resize(loop[chan].size());
        output.push_back(std::vector<float>());
        output[chan].resize(loop[chan].size()); // TODO: resize this to song size instead of loop size when done testing
    }
    kiss_fftr_cfg cfg_loop = kiss_fftr_alloc(loop[0].size(), 0, NULL, NULL);
    kiss_fftr(cfg_loop, &loop[0][0], &fft_loop[0][0]);
    kiss_fft_free(cfg_loop);
    kiss_fftr_cfg cfgi_loop = kiss_fftr_alloc(fft_loop[0].size(), 1, NULL, NULL);
    kiss_fftri(cfgi_loop, &fft_loop[0][0], &output[0][0]);
    kiss_fft_free(cfgi_loop);
    return output;
}
Here's what the output looks like compared to the input: [image]
Enlarged to show detail: [image]
If you're wondering about memory, the program is 64-bit and only uses a few gigabytes of RAM (nothing major :P)
Different FFT libraries use different scaling factors, and/or distribute scaling factors differently between their FFT and IFFT implementations.
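kiss_fft applies no scaling in either direction, so a forward FFT followed by an inverse FFT returns the original input multiplied by the FFT length. To get back the original time-domain vector (to within numeric or rounding error), you have to divide by that length yourself, either on the spectrum between the fft/ifft pair or on the output after the inverse.
In your case, that's a fairly large scale factor because your data is long: at least 617400 (44100 * 14) for the loop you are testing with.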
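To make the scaling concrete, here is a minimal sketch that uses only the standard library. The functions `naive_dft`, `naive_idft` and `scale_by_length` are hypothetical helpers written for this illustration, not part of kiss_fft; the naive O(N^2) transforms merely stand in for `kiss_fftr` and `kiss_fftri` because they follow the same unnormalized convention.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Unnormalized forward DFT: no 1/N factor, the same convention kiss_fft uses.
// Deliberately naive (O(N^2)); an illustrative stand-in, not a replacement.
std::vector<std::complex<double>> naive_dft(const std::vector<double>& x)
{
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    std::vector<std::complex<double>> X(n);
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t t = 0; t < n; ++t)
            X[k] += x[t] * std::polar(1.0, -2.0 * pi * double(k * t) / double(n));
    return X;
}

// Unnormalized inverse DFT: again no 1/N factor, so the result is N times
// the original signal. Only the real part is returned.
std::vector<double> naive_idft(const std::vector<std::complex<double>>& X)
{
    const std::size_t n = X.size();
    const double pi = std::acos(-1.0);
    std::vector<double> x(n);
    for (std::size_t t = 0; t < n; ++t) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t k = 0; k < n; ++k)
            sum += X[k] * std::polar(1.0, 2.0 * pi * double(k * t) / double(n));
        x[t] = sum.real();
    }
    return x;
}

// Divide every sample by the transform length to undo the combined scaling.
std::vector<double> scale_by_length(std::vector<double> x)
{
    assert(!x.empty());
    const double inv_n = 1.0 / static_cast<double>(x.size());
    for (double& v : x)
        v *= inv_n;
    return x;
}
```

Passing a signal through `naive_dft` and then `naive_idft` gives back each sample multiplied by the length; `scale_by_length` restores the original values. In the question's function, the equivalent step would presumably be dividing every element of `output[0]` by `loop[0].size()` after the call to `kiss_fftri`.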