r mlr3 nnet

Multiple runs and interaction terms in mlr3 regr.nnet task


I am trying to port a few didactic examples from the packages nnet, neuralnet and ranger to mlr3. I like how mlr3 handles fitted models, e.g. model evaluation, feature importance or hyperparameter optimization, but I still have a few problems with model specification.

In the example below, I use an artificial data set, constructed from common density functions and some random noise:

library("dplyr", warn.conflicts = FALSE)
library("nnet")
library("mlr3")
library("mlr3learners")
library("ggplot2")
set.seed(123)

## observation data, functional pattern + some random noise
x <- 1:20
obs <- data.frame(
  x = rep(x, 3),
  f = factor(rep(c("A", "B", "C"), each = 20)),
  y = c(3 * dnorm(x, 10, 3), 5 * dlnorm(x, 2, 0.5), dexp(20-x, .5))
      + rnorm(60, sd = 0.02)
)

It can be fitted with a small nnet with 3 hidden neurons. I run a rather large number of trials and then extract the best one. This is of course a brute-force method that can lead to overfitting, but that is an intended part of the example.

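## fit 100 networks from random starting weights and keep the one
## with the smallest final value of the fitting criterion
nns <- lapply(1:100,
         \(i) nnet(y ~ x * f, data = obs, size = 3, maxit = 500, trace = FALSE))
nn <- nns[[which.min(sapply(nns, \(m) m$value))]]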
nn
#> a 5-3-1 network with 22 weights
#> inputs: x fB fC x:fB x:fC 
#> output(s): y 
#> options were -

The resulting network is 5-3-1 as expected, with one input for x, two dummy variables (= 3 - 1 levels) for the factor f, and two for the interactions. Now I want to repeat this with mlr3 and the regr.nnet learner, which calls the same underlying nnet function.
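The input counts can be checked without fitting anything. The following is a minimal sketch, assuming (as nnet's formula interface does) that the inputs are the columns of the design matrix minus the intercept. The helper n_weights() is hypothetical and only covers a single hidden layer without skip connections:

```r
## same layout as the observation data (y omitted; only the design matters here)
x <- 1:20
obs_design <- data.frame(
  x = rep(x, 3),
  f = factor(rep(c("A", "B", "C"), each = 20))
)

## inputs with interactions (x * f) and without (x + f), intercept dropped
inputs_inter <- setdiff(colnames(model.matrix(~ x * f, data = obs_design)), "(Intercept)")
inputs_add   <- setdiff(colnames(model.matrix(~ x + f, data = obs_design)), "(Intercept)")

## hypothetical helper: weights of a p-size-1 network, including bias terms
n_weights <- function(p, size = 3) (p + 1) * size + (size + 1)

inputs_inter
n_weights(length(inputs_inter))
inputs_add
n_weights(length(inputs_add))
```

With 5 inputs this gives the 22 weights reported above, and with 3 inputs it gives 16.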

## workaround: resolve "unsupported feature types: integer"
obs$x <- as.double(obs$x)

## create task and train model
task    <- as_task_regr(obs, target = "y")
learner <- lrn("regr.nnet", size = 3, maxit = 500, trace = FALSE)
learner$train(task)
print(learner$model)
#> a 3-3-1 network with 16 weights
#> inputs: fB fC x 
#> output(s): y 
#> options were - linear output units

We see that the resulting model has only 3 inputs, i.e. none for the interactions. Comparing both models with the data shows that the first one fits better, because it is based on multiple runs and considers interactions:

pred_grid <- expand.grid(
  x = seq(0, 20, length.out=100),
  f = c("A", "B", "C"))

pred1 <-
  pred_grid |>
  mutate(y = predict(nn, newdata = pred_grid), method = "nnet basic")

pred2 <-
  pred_grid |>
  mutate(y = predict(learner, newdata = pred_grid), method = "nnet mlr")

ggplot(obs, aes(x, y)) + geom_point() +
  geom_line(data = rbind(pred1, pred2), mapping = aes(x, y, color = method)) +
  facet_wrap(~f)

My questions

  1. How can interaction terms be included, and
  2. is there a "best practice" option to train a network multiple times, i.e. without an outer loop? As mlr3+ uses R6 classes, it may be necessary to clone the objects.

Besides this, I wonder if the workaround to convert the x variable to double can be avoided. Furthermore, it would be great if the neuralnet package could also be used as a learner in mlr3.

Created on 2023-03-05 with reprex v2.0.2


Solution

  • Answer 1: (How can interaction terms be included and)

    The problem you are facing is that the formula parameter of nnet is not exposed as a hyperparameter. We create the formula for fitting automatically in this line: https://github.com/mlr-org/mlr3learners/blob/856d1d016957c63929b8c8811aad4ad87bd7043a/R/LearnerClassifNnet.R#L71. As your example shows, however, it is important to be able to modify this formula.

    For that reason I have created a pull request here which addresses this issue.
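    Independently of that pull request, one possible workaround is to expand the formula yourself and pass the interaction columns to the task as ordinary numeric features. This is only a sketch under that assumption, not the learner's own mechanism. The names X, obs_inter, task_inter and learner_inter are made up for illustration, and the mlr3 part only runs if the packages are installed:

    ```r
    set.seed(123)
    x <- 1:20
    obs <- data.frame(
      x = rep(x, 3),
      f = factor(rep(c("A", "B", "C"), each = 20)),
      y = c(3 * dnorm(x, 10, 3), 5 * dlnorm(x, 2, 0.5), dexp(20 - x, .5)) +
        rnorm(60, sd = 0.02)
    )

    ## expand y ~ x * f into numeric columns and drop the intercept
    X <- model.matrix(y ~ x * f, data = obs)[, -1]

    ## data.frame() turns "x:fB" into the syntactic name "x.fB"
    obs_inter <- data.frame(y = obs$y, X)
    names(obs_inter)

    ## assumption: mlr3 and mlr3learners are available
    if (requireNamespace("mlr3", quietly = TRUE) &&
        requireNamespace("mlr3learners", quietly = TRUE)) {
      library(mlr3)
      library(mlr3learners)
      task_inter    <- as_task_regr(obs_inter, target = "y")
      learner_inter <- lrn("regr.nnet", size = 3, maxit = 500, trace = FALSE)
      learner_inter$train(task_inter)
      print(learner_inter$model)
    }
    ```

    All expanded columns are of type double, so the integer conversion from the question is not needed here. New data has to be expanded with the same formula before prediction. The modelmatrix PipeOp in mlr3pipelines may offer a similar expansion inside a pipeline.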

    Answer 2: (Is there a "best practice" option to train a network multiple times, i.e. without an outer loop. As mlr3+ uses R6 classes, it may be necessary to clone the objects.)

    One way to achieve this would indeed be to clone the learner and use the benchmark() function.

    library(mlr3learners)
    #> Loading required package: mlr3
    
    learner = lrn("regr.nnet")
    
    lgr::get_logger("mlr3")$set_threshold("warn")
    
    set.seed(123)
    x = 1:20
    
    obs = data.frame(
      x = rep(x, 3),
      f = factor(rep(c("a", "b", "c"), each = 20)),
      y = c(3 * dnorm(x, 10, 3), 5 * dlnorm(x, 2, 0.5), dexp(20 - x, .5)) + rnorm(60, sd = 0.02)
    )
    
    nrow(obs)
    #> [1] 60
    
    x_test = seq(0, 20, length.out = 100)
    ## one row per (x, f) combination; x varies fastest, so y is filled level by level
    test = expand.grid(
      x = x_test,
      f = c("a", "b", "c")
    )
    test$y = c(3 * dnorm(x_test, 10, 3), 5 * dlnorm(x_test, 2, 0.5), dexp(20 - x_test, .5)) +
      rnorm(300, sd = 0.02)
    
    dat = rbind(obs, test)
    
    task = as_task_regr(dat, target = "y")
    ## rows 1-60 are the observations (train), rows 61-360 the grid (test)
    resampling = rsmp("custom")
    resampling$instantiate(task, train_sets = list(1:60), test_sets = list(61:360))
    
    ## list of 100 independent copies of the learner
    learners = replicate(100, learner$clone(deep = TRUE), simplify = FALSE)
    
    design = benchmark_grid(
      tasks = task,
      learners = learners,
      resamplings = resampling
    )
    
    
    bmr = benchmark(design)
    #> LOG OUTPUT ...
    

    Created on 2023-03-07 with reprex v2.0.2

    Which approach is recommended, however, also depends on what exactly you want to do with the results.
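    If the goal is simply to keep the best of several fits, as in the question, a plain loop over cloned learners may be enough. This is a minimal sketch, assuming that "best" means the smallest final value of nnet's fitting criterion on the training data (the same in-sample selection as in the question, so it can overfit). The names base_learner, fits, values and best are made up for illustration:

    ```r
    library(mlr3)
    library(mlr3learners)

    set.seed(123)
    x <- 1:20
    obs <- data.frame(
      x = as.double(rep(x, 3)),
      f = factor(rep(c("A", "B", "C"), each = 20)),
      y = c(3 * dnorm(x, 10, 3), 5 * dlnorm(x, 2, 0.5), dexp(20 - x, .5)) +
        rnorm(60, sd = 0.02)
    )

    task <- as_task_regr(obs, target = "y")
    base_learner <- lrn("regr.nnet", size = 3, maxit = 500, trace = FALSE)

    ## train 20 independent clones; each run starts from new random weights
    fits <- lapply(1:20, function(i) base_learner$clone(deep = TRUE)$train(task))

    ## the fitted nnet object stores the final criterion value in $value
    values <- vapply(fits, function(l) l$model$value, numeric(1))
    best <- fits[[which.min(values)]]
    print(best$model)
    ```

    The selected learner can then be used like any other trained mlr3 learner, e.g. with best$predict_newdata(). Selecting by a held-out score instead would call for the benchmark() route above.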

    Answer 3: (Besides this, I wonder if the workaround to convert the x variable to double can be avoided.)

    I don't understand this question. Can you elaborate?

    Answer 4: (Furthermore, it would be great if the neuralnet package could also be used as a learner in mlr3.)

    You can create an issue here: https://github.com/mlr-org/mlr3extralearners

    Note, however, that we will soon properly support the torch package through mlr3torch, which will then be the preferred way to train neural networks with mlr3.