I am coding in Python and correlating a row of a pandas DataFrame (index 2500) with a sinusoidal function that I defined (sine_modulation). When I print the value I obtain by using
row_correlation(saved_data_DAQ.iloc[2500].values, sine_modulation(time_measurement,modulation_frequency_axion))
where row_correlation(f,g) is just defined as np.corrcoef(f, g)[0, 1]
I obtain 0.23. However, if I plot both functions I can visually see an extremely high degree of correlation (see image). This is expected because the blue curve is just random white noise (from a Gaussian distribution) plus a constant times the sine modulation itself (blue = noise + C*red, where C = 0.002).
I would like to know why the correlation computed by this function is so low, but more importantly, do you have any idea or suggestion on how to compute a correlation that better reflects the high degree of correlation between my two functions?
You can also see a zoomed-in view below.
NOTE that it may well be that the correlation is right and it really is 0.23. In that case my question would be the following: what other quantity could I compute to show whether my noise has an oscillating component or not? I saw the word "synchronization" in the comments; maybe this is the right quantity to compute?
I put together a short example to show where the low R values come from. Let's consider a pure positive sine:
import numpy as np
N = 2500 # number of samples
t = np.linspace(0,1, N) # time axis from 0 to 1 second
Fs = N/t[-1] # sampling rate (approximate, since linspace includes both endpoints)
sine = (np.sin(4*np.pi*t-np.pi/2)+1)/2 # positive sine wave, 2 periods, values in [0, 1]
Since you did not include your code, I assumed your noise looks something like this:
noise = abs(np.random.normal(0,0.1,len(t))) # rectified (non-negative) Gaussian noise
Finally, let's define the coefficient you are multiplying the sine wave with. Let's set it as going from 0.001 to 1 in a linear space with 100 samples:
C = np.linspace(0.001, 1, 100) # pure sine coefficient
If we loop through those values and generate the noisy signal with sineWithNoise = c*sine + noise, we get the following results:
To find the actual value of c, look at the xlabel of the third subplot (the rightmost axes).
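The drop in R for small c can also be checked numerically. A minimal sketch, assuming the noise is independent of the sine: in that case the expected Pearson coefficient is r = c*std(sine) / sqrt((c*std(sine))^2 + std(noise)^2), so it depends only on the ratio of the scaled sine amplitude to the noise spread. The fixed seed, the helper names pearson_r and expected_r, and the four values of c below are chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (illustrative) so the run is repeatable

N = 2500
t = np.linspace(0, 1, N)
sine = (np.sin(4 * np.pi * t - np.pi / 2) + 1) / 2  # positive sine wave
noise = np.abs(rng.normal(0, 0.1, N))               # same noise model as assumed above


def pearson_r(f, g):
    """Pearson correlation coefficient, as in the question's row_correlation."""
    return np.corrcoef(f, g)[0, 1]


def expected_r(c, signal_std, noise_std):
    """Expected r for c*signal + noise, assuming noise is independent of signal."""
    return c * signal_std / np.sqrt((c * signal_std) ** 2 + noise_std ** 2)


results = {}
for c in (0.001, 0.05, 0.2, 1.0):  # illustrative coefficients
    sine_with_noise = c * sine + noise
    measured = pearson_r(sine, sine_with_noise)
    predicted = expected_r(c, sine.std(), noise.std())
    results[c] = (measured, predicted)
    print(f"c = {c:5.3f}  measured r = {measured:.3f}  predicted r = {predicted:.3f}")
```

For small c the measured value scatters around the prediction, because a finite sample of noise is never exactly uncorrelated with the sine, but the trend should be the same: R only approaches 1 once c*std(sine) is large compared with std(noise).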
Most importantly, I think you need to see the scatter plot, as the calculation of the correlation coefficient relies on comparing the two signals against each other (source for image):
and not comparing the two signals in time against each other (source for image):
To use cross-correlation, you can use:
import matplotlib.pyplot as plt
from scipy.signal import correlate, correlation_lags
sineWithNoise = 0.2*sine + noise # noisy signal for c = 0.2
xcorr = correlate(sine, sineWithNoise) # full cross-correlation
lags = correlation_lags(N,N)/Fs # get lags in seconds
plt.figure()
plt.plot(lags, xcorr)
plt.grid()
plt.xlabel("Lags (~s)") # xlabel
plt.ylabel("Cross-correlation") # ylabel
plt.axvline(0) # perfect scenario peak without any shift
plt.show()
This gives the following results:
To see how well synchronised they are, check whether the maximum indeed occurs at (or very near) zero lag:
idxMax = np.argmax(xcorr) # get arg of maximum
print(lags[idxMax]) # print corresponding lag
# 0.0008, almost zero
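If a value on the same scale as R is wanted, the cross-correlation can be normalised. A minimal, self-contained sketch, assuming both signals have the same length: subtract each mean and divide by N times the two standard deviations, so that the zero-lag value equals the Pearson coefficient from np.corrcoef. The seed and the variable names are illustrative only.

```python
import numpy as np
from scipy.signal import correlate, correlation_lags

rng = np.random.default_rng(1)  # fixed seed (illustrative)

N = 2500
t = np.linspace(0, 1, N)
Fs = N / t[-1]  # approximate sampling rate, as above
sine = (np.sin(4 * np.pi * t - np.pi / 2) + 1) / 2
sine_with_noise = 0.2 * sine + np.abs(rng.normal(0, 0.1, N))

# Remove the means so constant offsets do not dominate the result
a = sine - sine.mean()
b = sine_with_noise - sine_with_noise.mean()

xcorr = correlate(a, b, mode="full")
xcorr_norm = xcorr / (N * a.std() * b.std())  # zero-lag value equals Pearson r

sample_lags = correlation_lags(len(a), len(b), mode="full")  # lags in samples
lags = sample_lags / Fs                                      # lags in seconds

zero_lag_value = xcorr_norm[np.flatnonzero(sample_lags == 0)[0]]
peak_lag = lags[np.argmax(xcorr_norm)]

print("normalised cross-correlation at zero lag:", zero_lag_value)
print("Pearson r from np.corrcoef:", np.corrcoef(sine, sine_with_noise)[0, 1])
print("lag of the maximum (s):", peak_lag)
```

With this normalisation, both the height of the peak and its position can be read off directly: a peak close to zero lag suggests the two signals are in phase, and its height indicates how much of the noisy signal follows the sine.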
Hope this helps you