Tags: python, fft, waveform, cross-correlation

Does it make sense to use cross-correlation on arrays of timestamps?


I want to find the offset between two arrays of timestamps. They could represent, let's say, the onset of beeps in two audio tracks.

Note: There may be extra or missing onsets in either track.

I found some information about cross-correlation (e.g. https://dsp.stackexchange.com/questions/736/how-do-i-implement-cross-correlation-to-prove-two-audio-files-are-similar) which looked promising.

I assumed that each audio track is 10 seconds in duration, and represented the beep onsets as the peaks of a "square wave" with a sample rate of 44.1 kHz:

import numpy as np

rfft = np.fft.rfft
irfft = np.fft.irfft

track_1 = np.array([..., 5.2, 5.5, 7.0, ...])
# The onset in track_2 at 8.0 is "extra," it has no
# corresponding onset in track_1
track_2 = np.array([..., 7.2, 7.45, 8.0, 9.0, ...])
frequency = 44100
num_samples = 10 * frequency
wave_1 = np.zeros(num_samples)
wave_1[(track_1 * frequency).astype(int)] = 1
wave_2 = np.zeros(num_samples)
wave_2[(track_2 * frequency).astype(int)] = 1
# Circular cross-correlation via the FFT correlation theorem
xcor = irfft(rfft(wave_1) * np.conj(rfft(wave_2)))
offset = xcor.argmax()  # peak index = lag in samples

This approach isn't particularly fast, but I was able to get fairly consistent results even with quite low sample rates. However... I have no idea if this is a good idea! Is there a better way to find this offset than cross-correlation?
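
For reference, because the FFT-based correlation is circular, peak indices past the halfway point correspond to negative lags. A minimal sketch of converting the argmax into a signed offset in seconds, using xcor, num_samples and frequency from above:

lag = xcor.argmax()
if lag > num_samples // 2:
    lag -= num_samples  # indices in the upper half wrap around to negative lags
offset_seconds = lag / frequency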

Edit: added note about missing and extra onsets.


Solution

  • If track_1 and track_2 are timestamps of beeps and both catch all of the beeps, then there's no need to build a waveform and do cross-correlation. Just find the average delay between the two arrays of beep timestamps:

    import numpy as np
    
    frequency = 44100
    actual_time_delay = 0.020
    timestamp_noise = 0.001
    
    # Assumes track_1 and track_2 are timestamps (in samples) of the same
    # beeps, with a delay and noise added
    track_1 = np.array([1, 2, 10000])
    track_2 = (np.array([1, 2, 10000]) + actual_time_delay * frequency
               + frequency * np.random.uniform(-timestamp_noise,
                                               timestamp_noise,
                                               len(track_1)))
    
    calculated_time_delay = np.mean(track_2 - track_1) / frequency
    print('Calculated time delay (s):', calculated_time_delay)
    

    This yields approximately 0.020 s depending on the random errors introduced and is about as fast as it gets.

    EDIT: If extra or missing timestamps need to be handled, the code can be modified as follows. Essentially, if the error on each timestamp is bounded, compute the delay between every pair of values from the two tracks, quantize those delays to the noise bound so that delays from genuinely matching beeps cluster into the same bin, take the statistical mode to identify that bin, and then average the original (unquantized) delays within it.

    import numpy as np
    from scipy.stats import mode
    
    frequency = 44100
    actual_time_delay = 0.020
    
    # Assumes track_1 and track_2 are timestamps (in samples) of beeps, with a
    # delay, noise added, and extra/missing beeps
    timestamp_noise = 0.001
    timestamp_noise_threshold = 0.002
    track_1 = np.array([1, 2, 5, 10000, 100000, 200000])
    track_2 = np.array([1, 2, 44, 10000, 30000]) + actual_time_delay * frequency
    track_2 = track_2 + frequency * np.random.uniform(-timestamp_noise,
                                                      timestamp_noise,
                                                      len(track_2))
    
    # All pairwise delays between the two tracks, converted to seconds
    deltas = []
    for t1 in track_1:
        for t2 in track_2:
            deltas.append(t2 - t1)
    sorted_deltas = np.sort(deltas) / frequency
    
    # Quantize the delays to the noise threshold so that delays from genuinely
    # matching beeps fall into the same bin (np.round also handles negatives)
    truncated_deltas = (np.round(sorted_deltas / timestamp_noise_threshold)
                        * timestamp_noise_threshold)
    
    # Modal bin (np.atleast_1d copes with scipy versions where mode() returns
    # a scalar), then the mean of the raw delays inside that bin
    truncated_time_delay = min(np.atleast_1d(mode(truncated_deltas).mode), key=abs)
    calculated_time_delay = np.mean(sorted_deltas[truncated_deltas == truncated_time_delay])
    
    print('Calculated time delay (s):', calculated_time_delay)
    

    Obviously, if too many beeps are missing or too many extra beeps exist, the code begins to fail at some point, but in general it should work well and be far faster than generating a whole waveform and performing the correlation.
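
    As an aside, the nested Python loop over all pairs can be vectorized; here is a minimal sketch of the same pairwise-delay computation using np.subtract.outer, assuming the same track_1 and track_2 as above:

    # Vectorized equivalent of the double loop:
    # outer[i, j] = track_2[i] - track_1[j], flattened and sorted
    deltas = np.subtract.outer(track_2, track_1).ravel()
    sorted_deltas = np.sort(deltas) / frequency

    The clustering and mode steps above then apply unchanged.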