scikit-learn pytorch

Discrepancy between sklearn's r2_score() and PyTorch's MSELoss()?


I may be missing something very basic, but I have started to notice slight discrepancies between the R2 score returned by sklearn's r2_score() function and the R2 score I calculate from PyTorch's MSELoss() (dividing the MSE by statistics.variance() of the actual values).

The R2 score returned by the sklearn method is consistently (slightly) lower than the one returned via MSELoss().

Here is some basic code to reproduce the difference:

from sklearn.metrics import r2_score
from torch.nn import MSELoss
import statistics 
import random
import torch
import numpy as np 

actuals = random.sample(range(1, 50), 40)

preds = []

for value in actuals:
    pred = value * 0.70
    preds.append(pred)

loss = MSELoss()

mse = loss(torch.tensor(preds), torch.tensor(actuals))

r2 = 1 - mse / statistics.variance(actuals)

score = r2_score(actuals, preds)

print(f'R2 Score using (PyTorch) MSELoss: {r2}')
print(f'R2 Score using (sklearn) r2_score: {score}')

Example output:

R2 Score using (PyTorch) MSELoss: 0.6261289715766907
R2 Score using (sklearn) r2_score: 0.6165425269729996

I suspected this could be related to the fact that MSELoss() takes tensors as input while sklearn does not, but I don't see why or how that would matter.

Versions:


Solution

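  • This is due to Bessel's correction. statistics.variance() returns the sample variance, which divides by n - 1, whereas MSELoss() averages the squared errors over n, and r2_score() computes 1 - SS_res / SS_tot, where the n cancels. Dividing the MSE by the sample variance therefore scales the error ratio by (n - 1) / n and inflates R2 slightly. You can achieve the same result as sklearn by using statistics.pvariance() (population variance, which divides by n) instead of statistics.variance().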

    from sklearn.metrics import r2_score
    from torch.nn import MSELoss
    import statistics
    import random
    import torch

    actuals = random.sample(range(1, 50), 40)

    # Predictions are the actual values scaled by 0.70
    preds = [value * 0.70 for value in actuals]

    loss = MSELoss()

    # MSELoss averages the squared errors over all n elements
    mse = loss(torch.tensor(preds), torch.tensor(actuals))

    # variance() divides by n - 1 (Bessel's correction); pvariance() divides by n
    r2_sample = 1 - mse / statistics.variance(actuals)
    r2_population = 1 - mse / statistics.pvariance(actuals)

    score = r2_score(actuals, preds)

    print(f'R2 Score using (PyTorch) MSELoss (sample): {r2_sample}')
    print(f'R2 Score using (PyTorch) MSELoss (population): {r2_population}')
    print(f'R2 Score using (sklearn) r2_score: {score}')
    

    Output:

    > R2 Score using (PyTorch) MSELoss (sample): 0.6582530736923218
    > R2 Score using (PyTorch) MSELoss (population): 0.6494903564453125
    > R2 Score using (sklearn) r2_score: 0.6494903644913157
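
To see the (n - 1) / n factor without PyTorch or sklearn, here is a minimal sketch using only the standard library. The helper names r2_from_mse and r2_reference are made up for this illustration, and r2_reference is a hand-written 1 - SS_res / SS_tot that is assumed to match what r2_score() computes for a single output with default settings.

```python
import random
import statistics


def r2_from_mse(actuals, preds, variance_fn):
    """Return 1 - MSE / variance, using the variance estimator passed in."""
    n = len(actuals)
    # The mean squared error divides the residual sum of squares by n.
    mse = sum((a - p) ** 2 for a, p in zip(actuals, preds)) / n
    return 1 - mse / variance_fn(actuals)


def r2_reference(actuals, preds):
    """Hypothetical reference: R2 as 1 - SS_res / SS_tot."""
    mean = statistics.mean(actuals)
    ss_res = sum((a - p) ** 2 for a, p in zip(actuals, preds))
    ss_tot = sum((a - mean) ** 2 for a in actuals)
    return 1 - ss_res / ss_tot


random.seed(0)  # fixed seed only so repeated runs use the same data
actuals = random.sample(range(1, 50), 40)
preds = [value * 0.70 for value in actuals]
n = len(actuals)

r2_sample = r2_from_mse(actuals, preds, statistics.variance)       # divides by n - 1
r2_population = r2_from_mse(actuals, preds, statistics.pvariance)  # divides by n
r2_ref = r2_reference(actuals, preds)

print(f'R2 with sample variance:     {r2_sample}')
print(f'R2 with population variance: {r2_population}')
print(f'R2 as 1 - SS_res / SS_tot:   {r2_ref}')
# The unexplained fraction shrinks by exactly (n - 1) / n with the sample variance.
print(f'(1 - r2_sample) / (1 - r2_ref) = {(1 - r2_sample) / (1 - r2_ref)}')
print(f'(n - 1) / n                    = {(n - 1) / n}')
```

The population-variance version agrees with the reference up to floating-point rounding, while the sample-variance version is always a little higher whenever the predictions are not perfect. The small difference in the last digits of the PyTorch output above is most likely a precision effect: torch.tensor() stores Python floats as float32 by default, whereas sklearn computes in float64.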