I was going through the documentation to understand the coefficient of determination, and from it I understood that the coefficient of determination is nothing but R x R (the correlation coefficient squared).
So I took the housing price dataset from kaggle.com and tried it out for a better understanding. This is my code.
First, I computed the correlation coefficient:
import pandas as pd
test_data=pd.read_csv(r'\house_price\test.csv')
_d=test_data.loc[:,['MSSubClass','LotFrontage']]
_d.fillna(0,inplace=True)
_d.corr()
Next, I computed the coefficient of determination like this:
from sklearn.metrics import r2_score
r2_score(_d['MSSubClass'],_d['LotFrontage'])
for which I got the value -0.9413195412943647.
Ideally, shouldn't it be 0.060531252961, since -0.246031 x -0.246031 = 0.060531252961?
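Following the docs (https://scikit-learn.org/stable/modules/model_evaluation.html#r2-score), r2_score takes y_true as its first argument and y_pred as its second, and computes:
R2 = 1 - sum((y_true - y_pred)^2) / sum((y_true - mean(y_true))^2)
whereas the df.corr method (with Pearson correlation) computes:
r = (n*sum(x*y) - sum(x)*sum(y)) / sqrt((n*sum(x^2) - sum(x)^2) * (n*sum(y^2) - sum(y)^2))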
So let's build an example:
x y
1 1
1 0
0 0
1 1
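correlation: (4*(1+0+0+1) - 3*2) / sqrt((4*3 - 3^2) * (4*2 - 2^2)) = (8 - 6) / sqrt(3*4) = 2/sqrt(12) ≈ 0.577, so R x R ≈ 0.333
whereas R2 (with x as y_true and y as y_pred) is: 1 - ((0)^2+(1)^2+(0)^2+(0)^2) / ((1-0.75)^2+(1-0.75)^2+(0-0.75)^2+(1-0.75)^2) = 1 - 1/0.75 ≈ -0.333
The two numbers differ because r2_score measures how well the second column predicts the first one as it stands, not how strongly the two columns are linearly related. R2 only equals R x R when the predictions come from a least-squares straight-line fit.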
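If you want to check the arithmetic yourself, here is a minimal sketch in plain Python. The helpers `pearson_r`, `r2` and `fit_line` are names I made up for illustration (they are not pandas or scikit-learn functions); they just spell out the Pearson formula, the R2 definition from the linked docs, and an ordinary least-squares line. It assumes x plays the role of y_true and y the role of y_pred, as in your r2_score call.

```python
from math import sqrt

x = [1, 1, 0, 1]  # used as y_true
y = [1, 0, 0, 1]  # used as y_pred

def pearson_r(a, b):
    # Pearson correlation written with raw sums.
    n = len(a)
    sum_a, sum_b = sum(a), sum(b)
    sum_ab = sum(p * q for p, q in zip(a, b))
    sum_a2 = sum(p * p for p in a)
    sum_b2 = sum(q * q for q in b)
    denom = sqrt((n * sum_a2 - sum_a ** 2) * (n * sum_b2 - sum_b ** 2))
    return (n * sum_ab - sum_a * sum_b) / denom

def r2(y_true, y_pred):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def fit_line(target, feature):
    # Least-squares straight line predicting target from feature.
    n = len(target)
    sum_t, sum_f = sum(target), sum(feature)
    sum_tf = sum(t * f for t, f in zip(target, feature))
    sum_f2 = sum(f * f for f in feature)
    slope = (n * sum_tf - sum_t * sum_f) / (n * sum_f2 - sum_f ** 2)
    intercept = sum_t / n - slope * sum_f / n
    return [slope * f + intercept for f in feature]

r = pearson_r(x, y)
raw_r2 = r2(x, y)                  # y used directly as the "prediction"
fitted_r2 = r2(x, fit_line(x, y))  # prediction from a fitted line

print(r, r * r)    # about 0.577 and 0.333
print(raw_r2)      # about -0.333
print(fitted_r2)   # about 0.333, same as r * r
```

So on your data, r2_score(_d['MSSubClass'], _d['LotFrontage']) is answering "how good is LotFrontage as a direct prediction of MSSubClass", which is a different question from the squared correlation.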
Hope that helps