I have two signals that I want to compare in terms of similarity. One is shorter (in time) than the other. If I use correlation to find the highest similarity, it tells me that the highest value is at a position where I wouldn't expect it.
Could anyone give me a hint whether I am just thinking about this wrong, or is correlation the wrong tool for this kind of problem?
My setup:
import numpy
import matplotlib.pyplot as plt

signal_a = numpy.array([10, 20, 10, 30, 20, 10, 28, 22, 10])
signal_b = numpy.array([28, 22])

# Cross-correlate the short signal against the long one
correlations = numpy.correlate(signal_a, signal_b, mode="full")
print(correlations)

plt.plot(correlations)
plt.show()
This prints the correlations array and plots it:
[ 220  720  780  940 1280  780  896 1268  836  280]
The highest correlation with [28, 22] is calculated at the position of [..., 30, 20, ...]. I understand the formula and why the value there is 1280. But I am actually looking for [..., 28, 22, ...], as that is exactly (in this case) the pattern I am searching for (signal_b).
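To make explicit why the raw correlation peaks there, here is the sliding dot product written out by hand (a small sketch using the same arrays): correlation rewards large amplitudes, not equality.

```python
import numpy as np

signal_a = np.array([10, 20, 10, 30, 20, 10, 28, 22, 10])
signal_b = np.array([28, 22])

# Dot product of signal_b with each same-length window of signal_a
for i in range(len(signal_a) - len(signal_b) + 1):
    window = signal_a[i:i + len(signal_b)]
    print(i, window, np.dot(window, signal_b))

# The window [30, 20] scores 30*28 + 20*22 = 1280, beating the exact
# match [28, 22], which scores 28*28 + 22*22 = 1268.
```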
Is correlation the right thing to use here? I have found so many sources where correlation is used to detect similarity. Shouldn't identical values be more similar than any other ones?
One possible solution to your problem is the Mean Squared Error (MSE). Given two signals a and b of the same dimensions, the MSE is the average of the element-wise squared differences between a and b. Since your b is shorter than a, you slide b along a, compute the MSE at each offset, and take the offset with the *smallest* error as the best match. The code would look as follows:
import numpy as np
import matplotlib.pyplot as plt

a = np.array([10, 20, 10, 30, 20, 10, 28, 22, 10])
b = np.array([28, 22])

# MSE between b and every same-length window of a
mse = np.empty(len(a) - len(b) + 1)
for i in range(mse.size):
    mse[i] = np.square(a[i:i + len(b)] - b).mean()

print(mse.argmin())  # -> 6, the position of [28, 22] in a
plt.plot(mse)
plt.show()
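As a follow-up (not part of the answer above): if you would rather keep a correlation-style measure, dividing each dot product by the window's norm (i.e. cosine similarity) removes the amplitude bias, so the exact match wins — a sketch:

```python
import numpy as np

a = np.array([10, 20, 10, 30, 20, 10, 28, 22, 10], dtype=float)
b = np.array([28, 22], dtype=float)

b_unit = b / np.linalg.norm(b)
scores = np.empty(len(a) - len(b) + 1)
for i in range(scores.size):
    window = a[i:i + len(b)]
    # Dividing by the window norm turns the dot product into cosine
    # similarity: [28, 22] itself scores a perfect 1.0, while the
    # larger-amplitude window [30, 20] scores slightly below 1.
    scores[i] = np.dot(window, b_unit) / np.linalg.norm(window)

print(scores.argmax())  # -> 6, the position of [28, 22] in a
```

This only compares the shape of each window, so it behaves well here; for data where the mean level also matters, MSE as above is the simpler choice.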