I'm working with a few thousand gigabyte-sized JSON files. Rather than manipulating them in my local workspace, I want to push them to child R processes, so that the gc() problems disappear when the child R session closes. Perhaps I can also handle two or three files asynchronously, which would let me take advantage of multiple processors.
But I can't get even a simple example to work.
myFunction <- function(dataPath, fileId) {
  Sys.sleep(10)
  paste0(dataPath, fileId)
}
dataPath <- "./"
fileId <- "file01"
filePath <- myFunction(dataPath, fileId)
filePath
filePath <- callr::r(function(dataPath, fileId) myFunction(dataPath, fileId), args = list(dataPath, fileId))
filePath
myFunction(), executed in the global environment, works fine.
callr::r() does not find myFunction() in the global environment, even though it shows up in ls() and in the object list window:
myFunction(dataPath, fileId):
! could not find function "myFunction"
Backtrace:
Subprocess backtrace:
I tried another formulation:
filePath <- callr::r(myFunction(dataPath, fileId), args = list(dataPath, fileId))
filePath
myFunction() does execute in the child R process but fails on return:
Error in eval(substitute(expr), data, enclos = parent.frame()) : no("func") || is.function(func) is not TRUE
callr::r sets up a new session with nothing in it except what you pass in the call. In particular, functions defined in the global environment are not copied there unless you do it explicitly.
So this should work:
filePath <- callr::r(
  function(dataPath, fileId, fn) fn(dataPath, fileId),
  args = list(dataPath, fileId, myFunction)
)
The idea is to pass myFunction as an argument named fn to the anonymous function that callr::r executes. callr::r will serialize it and pass it to the new process.
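For the asynchronous part of the question, the same trick of passing the function as an argument should also work with callr::r_bg, which starts the child session in the background and returns a process object straight away. The following is only a rough sketch: the three file IDs are made up for illustration, and the one-second sleep stands in for the real work on each file.

```r
# Stand-in for the real per-file work (shortened sleep for the example).
myFunction <- function(dataPath, fileId) {
  Sys.sleep(1)
  paste0(dataPath, fileId)
}

dataPath <- "./"
fileIds <- c("file01", "file02", "file03")  # hypothetical file IDs

# Start one background R session per file. Each r_bg() call returns
# immediately, so the three children run at the same time.
jobs <- lapply(fileIds, function(fileId) {
  callr::r_bg(
    function(dataPath, fileId, fn) fn(dataPath, fileId),
    args = list(dataPath, fileId, myFunction)
  )
})

# Block until each child has finished, then collect its return value.
filePaths <- vapply(jobs, function(job) {
  job$wait()
  job$get_result()
}, character(1))
filePaths
```

With a few thousand files you would probably not want to start them all at once. Launching them in batches of two or three, and waiting for a batch to finish before starting the next, keeps the number of simultaneous child sessions under control.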