r · deep-learning · neural-network · nnet

Tuned nnet RMSE is 10x bigger than the linear model's RMSE on the training data


I am working with the Boston Housing data set. The linear model is straightforward:

library(e1071) # for tune.nnet
library(MASS) # for the Boston housing data set
library(Metrics) # to calculate RMSE
library(nnet)

df <- MASS::Boston
train <- df[1:100, ]
test <- df[101:505, ]
Boston_lm <- lm(medv ~ ., data = train)
lm_rmse <- Metrics::rmse(actual = train$medv, predicted = Boston_lm$fitted.values)
# training RMSE = 2.037201

However, the tuned nnet model returns a training RMSE (computed from the same fitted values) that is more than 10x higher than the linear model's:

Boston_tune_nnet <- e1071::tune.nnet(x = train[, 1:ncol(train)-1], y = train$medv, size = 1)
nnet_tune_rmse <- Metrics::rmse(actual = train$medv, predicted = Boston_tune_nnet$best.model$fitted.values)
# training RMSE = 22.11024
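
Both RMSEs above are computed from the training fitted values. For reference, a test-set check on the linear model would look like this (lm_test_pred and lm_test_rmse are illustrative names, not part of the original code):

# Illustrative sketch: test-set RMSE for the linear model
lm_test_pred <- predict(Boston_lm, newdata = test)
lm_test_rmse <- Metrics::rmse(actual = test$medv, predicted = lm_test_pred)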

What's the correct way to build a tuned neuralnet model in this situation?


Solution

  • You are performing a linear regression, yet nnet by default uses a logistic output activation (linout = FALSE) and, with size = 1, adds a hidden layer with a single unit. A logistic output confines the fitted values to (0, 1), while medv ranges from roughly 5 to 50, which is why the RMSE blows up to ~22.

    Using linout = TRUE (i.e., a linear output) together with size = 0 and skip = TRUE (skip-layer connections from the inputs straight to the output, with no hidden layer) reproduces the linear model, so we should get the same results:

    library(nnet)
    Boston_tune_nnet <- e1071::tune.nnet(x = train[, -ncol(train)],
                                         y = train$medv, size = 0, linout = TRUE,
                                         skip = TRUE)
    (nnet_tune_rmse <- Metrics::rmse(actual = train$medv,
                                     predicted = Boston_tune_nnet$best.model$fitted.values))
    #[1] 2.037201
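
    You can confirm the squeeze described above by refitting with the question's defaults and checking the range of the fitted values (Boston_default is an illustrative name):

    # Illustrative: refit with the default logistic output and size = 1
    Boston_default <- e1071::tune.nnet(x = train[, -ncol(train)],
                                       y = train$medv, size = 1)
    range(Boston_default$best.model$fitted.values)
    # both endpoints fall inside (0, 1), far below medv's 5-50 range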
    

    This matches the linear model's training RMSE, which is what you are looking for.

    Also note that train[, 1:ncol(train) - 1] is incorrect: : binds more tightly than -, so it selects columns 0:13 rather than 1:13 (the zero index is silently dropped, so here it happens to pick the right columns only by accident). You should write train[, 1:(ncol(train) - 1)] or, equivalently, train[, -ncol(train)], as shown in the check below.
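
    A quick console check makes the precedence problem visible:

    ncol(train)           # 14
    1:ncol(train) - 1     # 0 1 2 ... 13  (: is evaluated first, then - 1)
    1:(ncol(train) - 1)   # 1 2 ... 13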