
Troubleshooting 'row names discarded' warning in parallel simulation output in R using the 'future' package


I'm running a simulation using the future and future.apply packages in R where I need to execute multiple iterations of a function in parallel and bind the results together. When I use more than one iteration (n_iter), I encounter a warning message indicating that row names found from a short variable are being discarded.

In data.frame(..., check.names = FALSE) : row names were found from a short variable and have been discarded

Here's my minimal example that reproduces the warning:


# install.packages("future")
library(future)
# install.packages("future.apply")
library(future.apply)
# future provides plan(); future.apply provides future_lapply()

# parallel::detectCores()
# check how many cores you have

options(parallelly.fork.enable = TRUE)
# You have to set this every time you start a new R session


# data generating function
data_generating_function <- function(n, mean, sd) {
  x <- rnorm(n, mean, sd)
  y <- rnorm(n, mean, sd) + 1 * x
  
  return(data.frame(x, y))
}

# data_generating_function(10, 0, 1)


# one simulation
one_simulation <- function(n, mean, sd) {
  data <- data_generating_function(n, mean, sd)
  
  model <- lm(y ~ x, data = data)
  
  p_value <- summary(model)$coefficients[2, 4]
  
  return(p_value)
}

# one_simulation(10, 0, 1)

# grid with simulation parameters
sim_grid <- expand.grid(
  n = c(10, 100, 1000),
  mean = c(0, 1, 2),
  sd = c(1, 2, 3)
)

# number of iterations
# n_iter <- 1 # no problems when running only one iteration
n_iter <- 10

plan(multicore, workers = 3)
# choose the number of workers here

#this part produces the warnings:
res_simulation <- do.call("rbind", lapply(seq_len(nrow(sim_grid)), function(rowindex)
{
  print(rowindex)
  cbind(sim_grid[rowindex, ], do.call(
    "rbind",
    future_lapply(seq_len(n_iter), function(iter)
      
    {
      
      one_simulation(
        n = sim_grid$n[rowindex],
        mean = sim_grid$mean[rowindex],
        sd = sim_grid$sd[rowindex]
      )
      
    }, future.seed = 12457854 + rowindex # set a new seed for each row; otherwise every row gives the same results
    )
  ))
}))

When I run just one iteration (setting n_iter to 1), the warning message does not appear, and the results are as expected. However, when I increase n_iter to 10 for multiple iterations, then the warning arises.

I suspect this has something to do with the rbind function. I believe I have to drop the row names at some point, but I can't figure out where. Any ideas?


Solution

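  • The warning comes from cbind(), not rbind(): it appears when you call cbind(sim_grid[rowindex, ], do.call("rbind", ...)). sim_grid[rowindex, ] is a one-row data frame that carries a row name, while the rbind() result has n_iter rows, so the data frame method recycles the single row and discards its row name with a warning. With n_iter = 1 both pieces have one row, which is why no warning appears. When several things happen in one expression, it helps to break it into small parts when troubleshooting. Below I run just rowindex 1: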

    ## sanity check
      rowindex <- 1
      print(rowindex)
     
      cbind( # warning when trying to combine the two lines below occurs here
        sim_grid[rowindex, ],  # running this line only, no warning
        do.call("rbind", future_lapply(seq_len(n_iter), function(iter) # running this line only, no warning
        {
          one_simulation(
            n = sim_grid$n[rowindex],
            mean = sim_grid$mean[rowindex],
            sd = sim_grid$sd[rowindex]
          )
        }, future.seed = 12457854 + rowindex # set a new seed for each row; otherwise every row gives the same results
        )
        )
      )
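
The same behaviour can be reproduced without any parallel code, which may help confirm that future_lapply is not involved. This is a minimal sketch in base R: `grid`, `one_row`, `p_values` and the helper `collect_warnings` are hypothetical stand-ins, not objects from the question.

```r
# Hypothetical stand-ins for the objects in the question.
grid <- expand.grid(n = c(10, 100), sd = c(1, 2))
one_row <- grid[1, ]                                  # 1-row data frame, keeps its row name
p_values <- do.call("rbind", list(0.12, 0.47, 0.03))  # 3 x 1 matrix, as with n_iter = 3

# Helper (hypothetical name): evaluate an expression and return its warning messages.
collect_warnings <- function(expr) {
  messages <- character(0)
  withCallingHandlers(
    expr,
    warning = function(w) {
      messages <<- c(messages, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  messages
}

# 1 row combined with 3 rows: the short piece is recycled and its row name is dropped.
warn_many <- collect_warnings(cbind(one_row, p_values))

# 1 row combined with 1 row: nothing to recycle, so no warning (the n_iter = 1 case).
warn_one <- collect_warnings(cbind(one_row, p_values[1, , drop = FALSE]))

print(warn_many)
print(length(warn_one))
```

Only the row counts of the two pieces matter here, which matches the observation that a single iteration runs cleanly.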
    

    This is a warning rather than an error. To get rid of it, add row.names = NULL to the cbind() call like so:

     ## same check, with row.names = NULL added
      rowindex <- 1
      print(rowindex)
      
      cbind(
        sim_grid[rowindex, ],
        do.call("rbind", future_lapply(seq_len(n_iter), function(iter)
        {
          one_simulation(
            n = sim_grid$n[rowindex],
            mean = sim_grid$mean[rowindex],
            sd = sim_grid$sd[rowindex]
          )
        }, future.seed = 12457854 + rowindex # set a new seed for each row; otherwise every row gives the same results
        )
        ),
        row.names = NULL
      )
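
As a possible variation on the row.names = NULL fix, the pieces can be combined with data.frame() directly, which also lets you give the result column an explicit name. This is only a sketch: `grid`, `p_list` and the column name `p_value` are assumptions for illustration, and `p_list` stands in for the list that future_lapply would return.

```r
# Hypothetical stand-ins: a small grid and a list of p-values,
# shaped like the output of future_lapply() in the question.
grid <- expand.grid(n = c(10, 100), sd = c(1, 2))
rowindex <- 1
p_list <- list(0.12, 0.47, 0.03)

# The one-row grid slice is recycled to match the three p-values.
# row.names = NULL asks for fresh automatic row names, so nothing is discarded.
res <- data.frame(
  grid[rowindex, ],
  p_value = unlist(p_list),
  row.names = NULL
)

print(res)
```

Naming the column up front keeps the final rbind() across grid rows tidy, since every piece then has the same column names.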
    

    See more about the warning in this related question:

    cbind warnings : row names were found from a short variable and have been discarded