I want to find the offset between two arrays of timestamps. They could represent, let's say, the onset of beeps in two audio tracks.
Note: There may be extra or missing onsets in either track.
I found some information about cross-correlation (e.g. https://dsp.stackexchange.com/questions/736/how-do-i-implement-cross-correlation-to-prove-two-audio-files-are-similar) which looked promising.
I assumed that each audio track is 10 seconds in duration, and represented the beep onsets as the peaks of a "square wave" with a sample rate of 44.1 kHz:
import numpy as np
rfft = np.fft.rfft
irfft = np.fft.irfft
track_1 = np.array([..., 5.2, 5.5, 7.0, ...])
# The onset in track_2 at 8.0 is "extra"; it has no
# corresponding onset in track_1
track_2 = np.array([..., 7.2, 7.45, 8.0, 9.0, ...])
frequency = 44100  # sample rate, Hz
num_samples = 10 * frequency  # 10 seconds per track
# Build the waves with a single 1 at each onset sample
wave_1 = np.zeros(num_samples)
wave_1[(track_1 * frequency).astype(int)] = 1
wave_2 = np.zeros(num_samples)
wave_2[(track_2 * frequency).astype(int)] = 1
# Circular cross-correlation via the FFT
xcor = irfft(rfft(wave_1) * np.conj(rfft(wave_2)))
offset = xcor.argmax()  # lag of the correlation peak, in samples
This approach isn't particularly fast, but I was able to get fairly consistent results even with quite low sample rates. However... I have no idea if this is a good idea! Is there a better way to find this offset than cross-correlation?
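One wrinkle: since the FFT computes a circular correlation, an argmax in the second half of xcor should presumably be interpreted as a negative lag. A minimal sketch of the conversion I have in mind, reusing xcor, num_samples, and frequency from above:
lag = xcor.argmax()
if lag > num_samples // 2:
    lag -= num_samples  # peaks past the midpoint wrap around to negative lags
offset_seconds = lag / frequency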
Edit: added note about missing and extra onsets.
If track_1 and track_2 are timestamps of beeps and both catch all of the beeps, then there's no need to build a waveform and do cross-correlation. Just find the average delay between the two arrays of beep timestamps:
import numpy as np
frequency = 44100  # sample rate, Hz
actual_time_delay = 0.020  # seconds
timestamp_noise = 0.001    # seconds of jitter on each timestamp
# Assumes track_1 and track_2 are timestamps of beeps (in samples),
# with a delay and noise added to track_2
track_1 = np.array([1, 2, 10000])
track_2 = (np.array([1, 2, 10000]) + actual_time_delay * frequency
           + frequency * np.random.uniform(-timestamp_noise, timestamp_noise, len(track_1)))
calculated_time_delay = np.mean(track_2 - track_1) / frequency
print('Calculated time delay (s):', calculated_time_delay)
This yields approximately 0.020 s, depending on the random errors introduced, and is about as fast as it gets.
EDIT: If extra or missing timestamps need to be handled, the code can be modified as follows. Essentially, if the timestamp noise is bounded, compute the delays between all pairs of timestamps, quantize them to that bound, and take the statistical mode: delays between genuinely matching beeps cluster into the same bin, while delays between unrelated beeps scatter. The original (unquantized) delays in the winning bin are then averaged.
import numpy as np
from scipy.stats import mode
frequency = 44100  # sample rate, Hz
actual_time_delay = 0.020  # seconds
# Assumes track_1 and track_2 are timestamps of beeps (in samples), with a delay,
# noise added, and extra/missing beeps
timestamp_noise = 0.001            # seconds of jitter on each timestamp
timestamp_noise_threshold = 0.002  # bin width for clustering the delays, seconds
track_1 = np.array([1, 2, 5, 10000, 100000, 200000])
track_2 = np.array([1, 2, 44, 10000, 30000]) + actual_time_delay * frequency
track_2 = track_2 + frequency * np.random.uniform(-timestamp_noise, timestamp_noise, len(track_2))
# Delay between every pair of timestamps, one track against the other
deltas = []
for t1 in track_1:
    for t2 in track_2:
        deltas.append(t2 - t1)
sorted_deltas = np.sort(deltas) / frequency  # convert to seconds
# Quantize each delay to the nearest multiple of the noise threshold, so delays
# from genuinely matching beeps fall into the same bin (np.round also handles
# negative delays correctly)
truncated_deltas = np.round(sorted_deltas / timestamp_noise_threshold) * timestamp_noise_threshold
# The most common bin is the true delay; np.atleast_1d keeps this working across
# SciPy versions, where mode() returns either a scalar or an array
truncated_time_delay = min(np.atleast_1d(mode(truncated_deltas).mode), key=abs)
# Average the raw delays that landed in the winning bin
calculated_time_delay = np.mean(sorted_deltas[truncated_deltas == truncated_time_delay])
print('Calculated time delay (s):', calculated_time_delay)
Obviously, if too many beeps are missing or too many extra beeps exist, the code begins to fail at some point, but in general it should work well and be way faster than generating a whole waveform and performing correlation.
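As an aside, the nested loop over all pairs can be vectorized with NumPy if the tracks contain many onsets; a minimal sketch, assuming the same track_1, track_2, and frequency as above:
import numpy as np
# All pairwise delays t2 - t1 in one shot; same values as the nested loop produces
deltas = np.subtract.outer(track_2, track_1).ravel()
sorted_deltas = np.sort(deltas) / frequency  # seconds, as before
The rest of the code is unchanged, since the sorted array is identical either way.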