After I tried every solution I found online, I must ask here.
I want to reproduce the behavior of MATLAB's corr function:
I have 2 matrices A and B.
A's shape: (200, 30000)
B's shape: (200, 1)
In MATLAB, corr(A, B) returns a matrix of size (30000, 1).
When I use numpy.corrcoef (or dask for better performance), I get a (30001, 30001) matrix, which is extremely large and also the wrong answer.
I tried the argument rowvar=False, as some answers suggested, but that didn't work either.
I even tried scipy.spatial.distance.cdist(np.transpose(traces), np.transpose(my_trace), metric='correlation'), which did return a matrix of shape (30000, 1) as expected, but the values were different from the result in MATLAB.
I am desperate for a solution for this problem, please help.
MATLAB's corr by default calculates the correlation between the columns of A and B, while NumPy's corrcoef calculates the correlation between the rows of an array (if you pass it two arrays, it appears to do the same with the arrays stacked vertically). If you do not care about performance and want an easy way to do it, you can stack the two arrays horizontally, calculate the correlation matrix, and pick out the elements you need:
correlation = np.corrcoef(np.hstack((B,A)),rowvar=False)[0,1:]
But if you care more about performance than simple code, you would need to implement the corr function yourself. (Please comment and I will add it if that is what you are looking for)
UPDATE: If you would like to implement corr yourself to avoid the extra computation and memory usage, you can calculate the correlation from its formula by first standardizing the arrays (subtracting the column mean and dividing by the column standard deviation) and then multiplying them:
A = (A - A.mean(axis=0))/A.std(axis=0)
B = (B - B.mean(axis=0))/B.std(axis=0)
correlation = (np.dot(B.T, A)/B.shape[0])[0]
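The three lines above can be wrapped into a reusable function. The following is only a minimal sketch: the name corr_columns and the random test data are made up for illustration, and it assumes no column is constant (a constant column has zero norm and yields NaN). It never builds the full (30001, 30001) matrix, only a (p, q) result.

```python
import numpy as np

def corr_columns(A, B):
    """Pearson correlation between each column of A and each column of B.

    A has shape (n, p) and B has shape (n, q); the result has shape (p, q).
    corr_columns is a hypothetical helper name, not a NumPy function.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    if A.shape[0] != B.shape[0]:
        raise ValueError("A and B must have the same number of rows")
    # Center each column around its mean.
    A_c = A - A.mean(axis=0)
    B_c = B - B.mean(axis=0)
    # Euclidean norm of each centered column.
    A_norm = np.sqrt((A_c ** 2).sum(axis=0))
    B_norm = np.sqrt((B_c ** 2).sum(axis=0))
    # Dot products of centered columns, scaled by the product of their norms.
    return (A_c.T @ B_c) / np.outer(A_norm, B_norm)

# Illustrative data with the same layout as in the question, but smaller.
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 300))
B = rng.standard_normal((200, 1))
result = corr_columns(A, B)
print(result.shape)
```

Dividing by the column norms instead of by the standard deviation and the row count gives the same value, since the factors of n cancel.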
Output of sample code:
A = np.array([1,2,2,2]).reshape(4,1)
B = np.arange(20).reshape(4,5)
Python: np.corrcoef(np.hstack((A,B)),rowvar=False)[0,1:]
[0.77459667 0.77459667 0.77459667 0.77459667 0.77459667]
Matlab: corr(A,B)
0.7746 0.7746 0.7746 0.7746 0.7746
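As for the cdist attempt mentioned in the question: SciPy's 'correlation' metric is a correlation distance, defined as 1 minus the Pearson correlation, so that is one likely reason the values differed from MATLAB. The sketch below uses made-up random data (not the asker's traces) to check that subtracting the distance from 1 recovers the corrcoef result.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Illustrative random data; the shapes mirror the question but are smaller.
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 50))
B = rng.standard_normal((200, 1))

# cdist compares rows, so the arrays are transposed to compare columns.
dist = cdist(A.T, B.T, metric="correlation")  # shape (50, 1)

# The correlation distance is 1 - r, so invert it to get r.
corr_from_cdist = 1.0 - dist

# Reference values from the hstack approach shown earlier.
reference = np.corrcoef(np.hstack((B, A)), rowvar=False)[0, 1:]
matches = np.allclose(corr_from_cdist[:, 0], reference)
print(matches)
```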