I am generating several reports with R Markdown. If I render them one by one, everything is fine. If I use %do%, it is also fine. If I use %dopar%, there are 3 possible outcomes:
How can I fix this?
Here is the code that works in 100% of cases (it uses %do%):
library(tidyverse)
library(parallel)
library(doParallel)
OutputFolder <- "c:\\temp\\test\\out"
result_foldername <- "Now"
ServersInDB <- c("server1.ru", "server2.ru")
cores <- detectCores(logical = FALSE)
cl <- parallel::makeCluster(cores - 1) # leave one core free so the computer is not overloaded
registerDoParallel(cl)
render_all_obj <- function(MachineName, OutputFolder, result_foldername) {
  library(rmarkdown)
  render(input = "c:\\temp\\test\\proj\\Report.RMD",
         output_file = paste0(MachineName, ".html"),
         output_dir = file.path(OutputFolder, result_foldername),
         params = list(MachineName = MachineName))
}
foreach(MachineName = ServersInDB) %do% {
  render_all_obj(MachineName, OutputFolder, result_foldername)
}
parallel::stopCluster(cl)
Here is the code that fails (it uses %dopar% instead of %do%):
library(tidyverse)
library(parallel)
library(doParallel)
OutputFolder <- "c:\\temp\\test\\out"
result_foldername <- "Now"
ServersInDB <- c("server1.ru", "server2.ru")
cores <- detectCores(logical = FALSE)
cl <- parallel::makeCluster(cores[1] - 1) # leave one core free so the computer is not overloaded
registerDoParallel(cl)
render_all_obj <- function(MachineName, OutputFolder, result_foldername) {
  library(rmarkdown)
  render(input = "c:\\temp\\test\\proj\\Report.RMD",
         output_file = paste0(MachineName, ".html"),
         output_dir = file.path(OutputFolder, result_foldername),
         params = list(MachineName = MachineName))
}
foreach(MachineName = ServersInDB) %dopar% {
  render_all_obj(MachineName, OutputFolder, result_foldername)
}
parallel::stopCluster(cl)
Here is my Rmd file:
---
output:
  html_document:
    toc: true
    dev: 'svg'
    number_sections: true
    toc_depth: 2
    toc_float: true
    theme: cerulean
    toc_collapsed: true
    self_contained: true
    mathjax: NULL
params:
  MachineName: "ServerName" #name of server to analyze
---
```{r , echo=FALSE, include=FALSE, results='hide'}
MachineName <- params$MachineName
```
---
title: "My report is about: `r MachineName`"
---
The problem was the intermediate file Report.knit.md. By default, rmarkdown::render() creates it in the directory of the file passed as the input parameter, which is the same directory for all parallel processes. As a result, all processes try to create, read, and write the same file at the same time.
The workaround was to pass the intermediates_dir parameter a unique temporary directory for every process.
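A small sketch of why the collision happens, assuming (as described above) that the default intermediate file name depends only on the input file name. Every worker derives the same path next to the input, whereas `tempfile()` hands each call its own directory. `default_intermediate()` is a hypothetical helper that only mimics the naming; it is not an rmarkdown function.

```r
# Hypothetical helper: mimics the default intermediate path described above
# (input directory + input base name + ".knit.md"). Not part of rmarkdown.
default_intermediate <- function(input) {
  file.path(dirname(input),
            paste0(tools::file_path_sans_ext(basename(input)), ".knit.md"))
}

input   <- "c:/temp/test/proj/Report.RMD"
servers <- c("server1.ru", "server2.ru")

# The default path ignores MachineName, so all workers share one file
shared <- vapply(servers, function(s) default_intermediate(input), character(1))

# With intermediates_dir = tempfile(), each call gets its own directory
private <- vapply(servers, function(s) tempfile(), character(1))

print(unname(shared))   # the same path twice
print(unname(private))  # two different paths
```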
Working solution:
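library(future)
library(doFuture)
library(magrittr) # for %>%
registerDoFuture() # only needed if foreach is also used; furrr relies on plan() below
workers <- max(1, parallel::detectCores(logical = FALSE) - 1, na.rm = TRUE)
plan(multisession, workers = workers)
ServersInDB <- c("server1.ru", "server2.ru")
render_all_obj <- function(MachineName) {
  OutputFolder <- "c:/temp/test/out"
  result_foldername <- "Now"
  library(rmarkdown)
  # Unique scratch directory per call, so parallel renders never share Report.knit.md
  tf <- tempfile()
  dir.create(tf)
  # unlink() only removes directories with recursive = TRUE; on.exit also cleans up on error
  on.exit(unlink(tf, recursive = TRUE), add = TRUE)
  render(input = "c:/temp/test/proj/Report.RMD",
         output_file = paste0(MachineName, ".html"),
         intermediates_dir = tf,
         output_dir = file.path(OutputFolder, result_foldername),
         params = list(MachineName = MachineName))
}
ServersInDB %>% furrr::future_map(render_all_obj)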
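The same intermediates_dir fix should also apply to the original foreach/%dopar% setup from the question. This is a self-contained sketch under some assumptions: it writes a throwaway parameterized Rmd into `tempdir()` in place of `Report.RMD`, uses a hard-coded 2-worker cluster, and needs foreach, doParallel, rmarkdown, and pandoc installed. `render_one()` is a hypothetical name.

```r
library(foreach)
library(doParallel)

# Assumption: a minimal parameterized Rmd stands in for Report.RMD
proj_dir <- file.path(tempdir(), "proj")
out_dir  <- file.path(tempdir(), "out", "Now")
dir.create(proj_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(out_dir,  recursive = TRUE, showWarnings = FALSE)
input <- file.path(proj_dir, "Report.Rmd")
writeLines(c("---",
             "output: html_document",
             "params:",
             "  MachineName: \"ServerName\"",
             "---",
             "",
             "Report for `r params$MachineName`."),
           input)

render_one <- function(MachineName, input, out_dir) {
  tf <- tempfile("knit_")                            # private scratch dir for this call
  dir.create(tf)
  on.exit(unlink(tf, recursive = TRUE), add = TRUE)  # removed even if render() fails
  rmarkdown::render(input = input,
                    output_file = paste0(MachineName, ".html"),
                    output_dir = out_dir,
                    intermediates_dir = tf,          # Report.knit.md is written here
                    params = list(MachineName = MachineName),
                    envir = new.env(),               # keep chunk variables out of globalenv
                    quiet = TRUE)                    # returns the output file path
}

ServersInDB <- c("server1.ru", "server2.ru")
cl <- parallel::makeCluster(2)
registerDoParallel(cl)
# foreach exports render_one, input and out_dir to the workers automatically
outputs <- foreach(MachineName = ServersInDB, .combine = c) %dopar% {
  render_one(MachineName, input, out_dir)
}
parallel::stopCluster(cl)
print(basename(outputs))
```

`envir = new.env()` is an extra precaution rather than part of the original fix; the essential change is still the per-call intermediates_dir.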