python numpy correlation cross-correlation

Cross correlation / similarity of signals - calculate time lag


I have two signals that I want to compare for similarity. One is shorter (in time) than the other. If I use correlation to find the point of highest similarity, the highest value appears at a position where I wouldn't expect it.

Could anyone give me a hint whether I am just thinking about this the wrong way, or whether correlation is the wrong tool for this kind of problem?

My setup:

import numpy
import matplotlib.pyplot as plt

signal_a = numpy.array([10, 20, 10, 30, 20, 10, 28, 22, 10])
signal_b = numpy.array([28, 22])
correlations = numpy.correlate(signal_a, signal_b, mode = "full")

print(correlations)
plt.plot(correlations)

This plots the correlations and prints the correlations array.

The highest correlation for [28, 22] is found at the position [..., 30, 20, ...]. I understand the formula and why the value there is 1280. But I am actually looking for [..., 28, 22, ...], since in this case it matches Signal B exactly.
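
The two competing window scores can be checked by hand. This is a minimal sketch (the names `window_30_20` and `window_28_22` are only illustrative) showing that the unnormalized dot product used by correlation rewards larger values rather than an exact match:

```python
import numpy as np

signal_b = np.array([28, 22])
window_30_20 = np.array([30, 20])  # where the correlation peaks
window_28_22 = np.array([28, 22])  # the exact match for signal_b

# Each correlation value is the dot product of signal_b with one window
score_30_20 = int(np.dot(window_30_20, signal_b))  # 30*28 + 20*22 = 1280
score_28_22 = int(np.dot(window_28_22, signal_b))  # 28*28 + 22*22 = 1268
print(score_30_20, score_28_22)
```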

Is correlation the right thing to do? I have found many sources where correlation is used to detect similarity. Shouldn't identical values be more similar than any others?


Solution

  • One possible solution to your problem is the mean squared error (MSE). Given two signals a and b of the same dimensions, the MSE is the average of the element-wise squared differences between a and b. Sliding b along a and computing the MSE for each window gives 0 at an exact match, so the best match is where the MSE is smallest. The code would look as follows:

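    import numpy as np
    import matplotlib.pyplot as plt
    
    a = np.array([10, 20, 10, 30, 20, 10, 28, 22, 10])
    b = np.array([28, 22])
    # One MSE value per position where b fits entirely inside a
    mse = np.empty(len(a) - len(b) + 1)
    
    for i in range(mse.size):
        # Mean of the squared differences between b and the window of a starting at i
        mse[i] = np.square(np.subtract(a[i:i+len(b)], b)).mean()
    
    print(mse.argmin())  # index of the best-matching window (6 for this data)
    plt.plot(mse)
    plt.show()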
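
The same computation can also be written without an explicit Python loop. This is a minimal sketch, assuming NumPy 1.20 or newer (where `numpy.lib.stride_tricks.sliding_window_view` is available); the helper name `sliding_mse` is hypothetical and not part of NumPy:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sliding_mse(a, b):
    """Return the MSE between b and every window of a that has the length of b."""
    # windows has shape (len(a) - len(b) + 1, len(b)); row i is a[i:i+len(b)]
    windows = sliding_window_view(a, len(b))
    # b broadcasts across all rows; average the squared differences per window
    return np.mean((windows - b) ** 2, axis=1)


a = np.array([10, 20, 10, 30, 20, 10, 28, 22, 10])
b = np.array([28, 22])
mse = sliding_mse(a, b)
best = int(mse.argmin())  # start index of the best-matching window
print(best, a[best:best + len(b)])
```

For short signals like these the loop is just as good; the windowed version mainly helps when a is long.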