I'm running a simulation using the future and future.apply packages in R, where I need to execute multiple iterations of a function in parallel and bind the results together. When I use more than one iteration (n_iter), I get a warning saying that row names found from a short variable are being discarded:
In data.frame(..., check.names = FALSE) : row names were found from a short variable and have been discarded
Here's my minimal example that reproduces the warning:
# install.packages("future")
library(future)
# install.packages("future.apply")
library(future.apply)
# you need both packages in order to use "future"
# parallel::detectCores()
# check how many cores you have
options(parallelly.fork.enable = TRUE)
# You have to set this every time you start a new R session
# data generating function
data_generating_function <- function(n, mean, sd) {
  x <- rnorm(n, mean, sd)
  y <- rnorm(n, mean, sd) + 1 * x
  return(data.frame(x, y))
}
# data_generating_function(10, 0, 1)
# one simulation
one_simulation <- function(n, mean, sd) {
  data <- data_generating_function(n, mean, sd)
  model <- lm(y ~ x, data = data)
  p_value <- summary(model)$coefficients[2, 4]
  return(p_value)
}
# one_simulation(10, 0, 1)
# grid with simulation parameters
sim_grid <- expand.grid(
  n = c(10, 100, 1000),
  mean = c(0, 1, 2),
  sd = c(1, 2, 3)
)
# number of iterations
# n_iter <- 1 # no problems when running only one iteration
n_iter <- 10
plan(multicore, workers = 3)
# choose the number of workers here
# this part produces the warnings:
res_simulation <- do.call("rbind", lapply(seq_len(nrow(sim_grid)), function(rowindex) {
  print(rowindex)
  cbind(sim_grid[rowindex, ], do.call(
    "rbind",
    future_lapply(seq_len(n_iter), function(iter) {
      one_simulation(
        n = sim_grid$n[rowindex],
        mean = sim_grid$mean[rowindex],
        sd = sim_grid$sd[rowindex]
      )
    }, future.seed = 12457854 + rowindex # !!! you have to set a new seed for each row, otherwise you will have the same results for each row !!!
    )
  ))
}))
When I run just one iteration (setting n_iter to 1), the warning does not appear and the results are as expected. However, when I increase n_iter to 10, the warning arises.
I suspect this has something to do with the rbind function. I believe I have to drop the row names at some point, but I can't figure out where. Any ideas?
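The warning you're getting comes from cbind, not rbind: it is raised when you call cbind(sim_grid..., do.call(rbind, ...)). sim_grid[rowindex, ] is a one-row data frame that carries its own row name, while the rbind result has n_iter rows. The single row gets recycled, but its row name cannot be applied to n_iter rows, so it is discarded with a warning. With n_iter set to 1 the lengths match, which is why you see no warning in that case.
When several things happen in one nested expression, it's best to break the code up into small parts when troubleshooting. See below, where I run just rowindex 1: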
## sanity check
rowindex <- 1
print(rowindex)
cbind( # the warning occurs here, when the two arguments below are combined
  sim_grid[rowindex, ], # running this line on its own: no warning
  do.call("rbind", future_lapply(seq_len(n_iter), function(iter) { # running this call on its own: no warning
    one_simulation(
      n = sim_grid$n[rowindex],
      mean = sim_grid$mean[rowindex],
      sd = sim_grid$sd[rowindex]
    )
  }, future.seed = 12457854 + rowindex # !!! you have to set a new seed for each row, otherwise you will have the same results for each row !!!
  )
  )
)
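The same behaviour can be reproduced without any parallel code, which helps confirm that future_lapply is not involved. The following is a minimal sketch in base R only; grid, one_row, stacked and the helper warning_of are hypothetical names introduced for illustration, standing in for sim_grid, sim_grid[rowindex, ] and the stacked p-values.

```r
# Hypothetical stand-ins for sim_grid and the stacked simulation results
grid <- expand.grid(n = c(10, 100), mean = c(0, 1))
one_row <- grid[2, ]                               # one-row data frame that keeps its row name
stacked <- do.call("rbind", list(0.1, 0.2, 0.3))   # 3 x 1 matrix, like the rbind of p-values

# Hypothetical helper: return the warning message raised by an expression, or NA if none
warning_of <- function(expr) {
  tryCatch({ expr; NA_character_ }, warning = function(w) conditionMessage(w))
}

# One row combined with three rows: the short data frame's row name cannot be reused
msg_many <- warning_of(cbind(one_row, stacked))

# One row combined with one row: the lengths match, as with n_iter <- 1
msg_one <- warning_of(cbind(one_row, stacked[1, , drop = FALSE]))

print(msg_many)
print(msg_one)
```

If this sketch behaves as assumed, only the first cbind call reports a warning, matching the difference between n_iter = 10 and n_iter = 1 in the question.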
To get rid of this warning, add row.names = NULL to the cbind call, like so:
## sanity check
rowindex <- 1
print(rowindex)
cbind(
  sim_grid[rowindex, ],
  do.call("rbind", future_lapply(seq_len(n_iter), function(iter) {
    one_simulation(
      n = sim_grid$n[rowindex],
      mean = sim_grid$mean[rowindex],
      sd = sim_grid$sd[rowindex]
    )
  }, future.seed = 12457854 + rowindex # !!! you have to set a new seed for each row, otherwise you will have the same results for each row !!!
  )
  ),
  row.names = NULL # do not take row names from the one-row data frame
)
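To see what row.names = NULL does to the result, here is a small base R sketch that does not need the future packages. The names grid, one_row, stacked and the column name p_value are assumptions made for illustration and are not part of the original code.

```r
# Hypothetical stand-ins for sim_grid[rowindex, ] and the stacked p-values
grid <- expand.grid(n = c(10, 100), mean = c(0, 1))
one_row <- grid[2, ]
stacked <- do.call("rbind", list(0.1, 0.2, 0.3))
colnames(stacked) <- "p_value"   # hypothetical column name for readability

# With row.names = NULL, the row name of the short data frame is not picked up,
# so there is nothing to discard; tryCatch returns NULL if a warning is raised anyway
fixed <- tryCatch(
  cbind(one_row, stacked, row.names = NULL),
  warning = function(w) NULL
)

print(fixed)
print(rownames(fixed))
```

The parameter values from the single row are recycled across all result rows, and the combined data frame gets fresh automatic row names. In the full simulation, the same argument would go into the cbind call inside the lapply over sim_grid rows.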
See more about the warning here:
cbind warnings : row names were found from a short variable and have been discarded