Ideally, it should return values between -1 and 1 for every cell, except for the cells where the column name and the row name are the same; those should be 1.
I tried replacing the NaN values with 0 before calling corr(). That returns proper values, but they are inaccurate for the purpose of the program.
# df
            MovieA    MovieB    MovieC    MovieD  MovieE
Angee     0.000000       NaN -0.500000  0.500000     NaN
Anirvesh  1.166667 -0.333333 -0.833333       NaN     NaN
Jay       1.166667 -0.333333       NaN -0.833333     NaN
Karthik   0.000000 -1.500000       NaN       NaN     1.5
Naman          NaN  0.250000       NaN -0.250000     NaN
# df.T.corr()
             Angee  Anirvesh       Jay   Karthik     Naman
Angee          1.0       1.0      -1.0       NaN       NaN
Anirvesh       1.0       1.0       1.0       1.0       NaN
Jay           -1.0       1.0       1.0       1.0       1.0
Karthik        NaN       1.0       1.0       1.0       NaN
Naman          NaN       NaN       1.0       NaN       1.0
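The NaNs are correct: they are returned when the correlation cannot be computed because of the missing values. This happens when two rows don't have at least two values in common, that is, at least two columns where both are non-NaN. In your data every other pair shares exactly two values, and a Pearson correlation over two points can only be 1 or -1, which is why you see nothing in between.
Filling the NaNs before the computation indeed doesn't make sense, as this adds fake data points that are then used to compute the correlation.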
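To check which pairs have too few common values, you can count the shared ratings directly. This is only a sketch: the frame is retyped by hand from the question, and the names rated and overlap are made up for the illustration.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Data retyped from the question (rows are users, columns are movies)
df = pd.DataFrame(
    {
        "MovieA": [0.0, 1.166667, 1.166667, 0.0, nan],
        "MovieB": [nan, -0.333333, -0.333333, -1.5, 0.25],
        "MovieC": [-0.5, -0.833333, nan, nan, nan],
        "MovieD": [0.5, nan, -0.833333, nan, -0.25],
        "MovieE": [nan, nan, nan, 1.5, nan],
    },
    index=["Angee", "Anirvesh", "Jay", "Karthik", "Naman"],
)

# 1 where a rating exists, 0 where it is NaN
rated = df.notna().astype(int)

# overlap.loc[a, b] = number of movies rated by both a and b
overlap = rated @ rated.T

corr = df.T.corr()

print(overlap)
# Compare the cells with fewer than two shared ratings to the NaN cells
print(np.array_equal((overlap < 2).to_numpy(), corr.isna().to_numpy()))
```

For this data, the cells with fewer than two shared ratings should line up with the NaN cells of df.T.corr().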
What you could do is fillna with 0 after the computation, if you really don't want NaNs:
out = df.T.corr().fillna(0)
Output:
             Angee  Anirvesh       Jay   Karthik     Naman
Angee          1.0       1.0      -1.0       0.0       0.0
Anirvesh       1.0       1.0       1.0       1.0       0.0
Jay           -1.0       1.0       1.0       1.0       1.0
Karthik        0.0       1.0       1.0       1.0       0.0
Naman          0.0       0.0       1.0       0.0       1.0
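To see the difference between the two orders of operations, here is a small comparison sketch. Again the frame is retyped by hand from the question, and the names before and after are just for illustration.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Data retyped from the question (rows are users, columns are movies)
df = pd.DataFrame(
    {
        "MovieA": [0.0, 1.166667, 1.166667, 0.0, nan],
        "MovieB": [nan, -0.333333, -0.333333, -1.5, 0.25],
        "MovieC": [-0.5, -0.833333, nan, nan, nan],
        "MovieD": [0.5, nan, -0.833333, nan, -0.25],
        "MovieE": [nan, nan, nan, 1.5, nan],
    },
    index=["Angee", "Anirvesh", "Jay", "Karthik", "Naman"],
)

# Fill first: the zeros are treated as real ratings in every correlation
before = df.fillna(0).T.corr()

# Fill last: only the cells that could not be computed become 0
after = df.T.corr().fillna(0)

print(before.round(3))
print(after)
```

Filling afterwards leaves every computed correlation untouched, while filling first changes them because the zeros take part in the computation.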