Why does h2o.r2() not match manually computed R^2?

I'm using h2o.r2(), but it gives me a very different value from the R^2 I compute manually. It doesn't always happen; for simple linear models, for example, the two values seem to agree.

Am I using it incorrectly, or is this a bug?

library(tidyverse)
# fit AML, get leader...
fit_aml_H2O = function(X_df, Y_vec) {
  library(h2o); h2o.init()
  
  df = cbind(X_df, Y_vec)
  colnames(df)[[ncol(df)]]='Y'
  df = as.h2o(df)
  aml <- h2o.automl(x=colnames(X_df), y='Y', training_frame=df, nfolds=0,
                    max_models = 15, max_runtime_secs=90)
  leader = h2o.get_best_model(aml)
  cat('R^2: ', h2o.r2(leader, train=TRUE), '\n')
  return(leader)
}

# manually compute R^2 (verified to work)
R2 = function(Y_pred, Y_true) {
  MSE = mean((as.numeric(Y_true)-as.numeric(Y_pred))**2)
  R2 = 1-MSE/var(Y_true)
  return(R2)
}

data(iris)

X_df = iris %>% select(-Petal.Length)
Y = iris %>% select(Petal.Length)
model = fit_aml_H2O(X_df, Y)

X_df = as.h2o(X_df)
Y_pred = h2o.predict(model, newdata=X_df)$predict
(R2_train = R2(Y_pred[,1], Y[,1]))
cat('R^2 (manually computed): ', R2_train, '\n')
cat('R^2 (reported by H2O): ', h2o.r2(model,train=T), '\n')
cat('difference between manual R^2 & H2O reported R^2: ',
    abs(R2_train-h2o.r2(model,train=T)), '\n')
stopifnot(all.equal(h2o.r2(model,train=T), R2_train))

Solution

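The bug is in the manual R² computation, not in h2o.r2(). Y_true is a plain R numeric vector while Y_pred is an H2O column, so in the subtraction only the first value of the R vector is used, and it gets broadcast across the whole H2O column.

To see what's happening I tried the following. Adding Y_pred back to the difference should recover Y_true, but instead it returns 1.4 (the first element of Y_true) in every row: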

> Y_true <- Y[,1]
> Y_pred <- Y_pred[,1]
> (as.numeric(Y_true)-as.numeric(Y_pred))
       predict
1 -0.085572158
2 -0.090655887
3  0.090492763
4  0.127857837
5  0.005002167
6 -0.335313060

[150 rows x 1 column] 
> (as.numeric(Y_true)-as.numeric(Y_pred))+as.numeric(Y_pred) 
  predict
1     1.4
2     1.4
3     1.4
4     1.4
5     1.4
6     1.4

[150 rows x 1 column] 
> Y_true
  [1] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 1.5 1.6 1.4 1.1 1.2 1.5 1.3 1.4 1.7 1.5 1.7 1.5 1.0 1.7 1.9 1.6 1.6 1.5 1.4 1.6 1.6 1.5 1.5 1.4 1.5 1.2 1.3 1.4
 [39] 1.3 1.5 1.3 1.3 1.3 1.6 1.9 1.4 1.6 1.4 1.5 1.4 4.7 4.5 4.9 4.0 4.6 4.5 4.7 3.3 4.6 3.9 3.5 4.2 4.0 4.7 3.6 4.4 4.5 4.1 4.5 3.9 4.8 4.0 4.9 4.7 4.3 4.4
 [77] 4.8 5.0 4.5 3.5 3.8 3.7 3.9 5.1 4.5 4.5 4.7 4.4 4.1 4.0 4.4 4.6 4.0 3.3 4.2 4.2 4.2 4.3 3.0 4.1 6.0 5.1 5.9 5.6 5.8 6.6 4.5 6.3 5.8 6.1 5.1 5.3 5.5 5.0
[115] 5.1 5.3 5.5 6.7 6.9 5.0 5.7 4.9 6.7 4.9 5.7 6.0 4.8 4.9 5.6 5.8 6.1 6.4 5.6 5.1 5.6 6.1 5.6 5.5 4.8 5.4 5.6 5.1 5.1 5.9 5.7 5.2 5.0 5.2 5.4 5.1
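
The effect in the session above can be imitated in base R without H2O. This is only a sketch: y_pred here is a made-up set of predictions (the true values plus small alternating noise), not the AutoML output, and indexing with y_true[1] stands in for the broadcasting that happens when the R vector meets the H2O column.

```r
# Base-R imitation of the broadcasting effect; no H2O needed.
y_true <- iris$Petal.Length

# Hypothetical predictions: true values plus small alternating noise.
y_pred <- y_true + rep(c(-0.1, 0.1), length.out = length(y_true))

# Intended element-wise computation.
mse_elementwise <- mean((y_true - y_pred)^2)

# What effectively happened: only the first true value is used everywhere.
mse_broadcast <- mean((y_true[1] - y_pred)^2)

r2_elementwise <- 1 - mse_elementwise / var(y_true)
r2_broadcast   <- 1 - mse_broadcast / var(y_true)

cat('R^2 (element-wise): ', r2_elementwise, '\n')
cat('R^2 (first value broadcast): ', r2_broadcast, '\n')
```

With these made-up predictions the element-wise R² is close to 1, while the broadcast version is negative, which is the kind of large discrepancy described in the question.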

You can either convert both operands to plain R vectors by changing

MSE = mean((as.numeric(Y_true)-as.numeric(Y_pred))**2)

to

MSE = mean((as.numeric(as.vector(Y_true))-as.numeric(as.vector(Y_pred)))**2)

Or convert both operands to H2O by changing

MSE = mean((as.numeric(Y_true)-as.numeric(Y_pred))**2)

to

MSE = mean((as.numeric(as.h2o(Y_true))-as.numeric(as.h2o(Y_pred)))**2)
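
Putting the first variant into the helper from the question might look like the sketch below. R2_safe is a hypothetical name, and the length check is an extra safeguard that was not in the original function. It assumes each input is either a plain R vector or a single H2O column.

```r
# Sketch of a more defensive R2(): coerce both inputs to plain R numeric
# vectors first, so the arithmetic never mixes an H2O column with an R vector.
R2_safe <- function(Y_pred, Y_true) {  # hypothetical name
  # as.vector() pulls an H2O column into R (when h2o is loaded);
  # on an ordinary R vector it changes nothing here.
  y_pred <- as.numeric(as.vector(Y_pred))
  y_true <- as.numeric(as.vector(Y_true))

  # Extra safeguard: both vectors must line up row by row.
  stopifnot(length(y_pred) == length(y_true))

  mse <- mean((y_true - y_pred)^2)
  1 - mse / var(y_true)
}
```

In the question's script it would be called the same way as before, e.g. R2_safe(Y_pred[,1], Y[,1]).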

NOTE: Even with this fix, the manually calculated R² and H2O's R² may still differ slightly; in my run the difference was 0.0005075975. I believe this is caused by the train/test split that AutoML uses internally when nfolds=0 (e.g. for early stopping), so the reported training metric is not computed on exactly the same rows as the manual one.
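
One more thing that could contribute a small gap, independent of H2O: the manual formula divides the squared errors by n (via mean) but the variance by n - 1 (via var), so it is not quite the textbook 1 - SS_res/SS_tot. I have not checked which convention H2O uses, so treat this as a possible contributor rather than the explanation. A base-R sketch with made-up predictions:

```r
y_true <- iris$Petal.Length
# Hypothetical predictions: true values plus small alternating noise.
y_pred <- y_true + rep(c(-0.1, 0.1), length.out = length(y_true))
n <- length(y_true)

ss_res <- sum((y_true - y_pred)^2)          # residual sum of squares
ss_tot <- sum((y_true - mean(y_true))^2)    # total sum of squares

# Textbook definition: same denominator in numerator and denominator.
r2_textbook <- 1 - ss_res / ss_tot

# Formula from the question: mean() divides by n, var() divides by n - 1.
r2_question <- 1 - mean((y_true - y_pred)^2) / var(y_true)

# The two differ by a factor of (n - 1) / n on the residual term.
print(all.equal(r2_question, 1 - (n - 1) / n * ss_res / ss_tot))
cat('difference: ', r2_question - r2_textbook, '\n')
```

The gap shrinks as n grows and as the fit improves, so for a well-fitting model on 150 rows it is small.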

NOTE 2: The as.numeric calls are probably unnecessary, but I kept them in case you need them for your data.