
R snowfall: environment issues


I am trying to get my head around the Snowfall library and its usage.

Having written a simulation that makes use of environments, I encountered the following issue. If I source a file to load functions within parallel mode, the function seems to use a different environment than when I declare the function within parallel mode directly.

To make things a little clearer, let's consider the following two scripts:

q_func.R declares the function

foo.bar <- function(x, envname) assign("val", x, envir = get(envname))
# assigns the value x to the variable "val" in the environment envname

q_snowfall.R main function that uses snowfall

library(snowfall)
SnowFunc <- function(envname) {
    # load the functions

    # Option 1 not working
    source("q_func.R")
    # Option 2 working...
    # foo.bar <- function(x, envname) assign("val", x, envir = get(envname))


    # create the new environment
    assign(envname, new.env())

    # use the function as declared in q_func.R 
    # to assign random numbers to the new env
    foo.bar(x = rnorm(1), envname = envname)

    # return the environment including the random values
    return(get("val", envir = get(envname)))
}

sfInit(parallel = TRUE, cpus = 2)
# create environment 'a' and 'b' that each will get a new variable 
# called 'val' that gets assigned a random value

envs <- c("a", "b")
result <- sfClusterApplyLB(envs, SnowFunc)
sfStop()

If I execute the script "q_snowfall.R" I get the error

Error in checkForRemoteErrors(val) : 
  2 nodes produced errors; first error: object 'a' not found

However, if I use the second option (declaring the function within the SnowFunc function), the error disappears.

Do you know how Snowfall handles the different environments? Or do you even have a solution for the issue? (Note that 'q_func.R' actually contains some 100 lines of code, so I would prefer to keep it in a separate file; "keep option 2" is therefore not a solution!)

Thank you very much!

Edit: If I change all get(envname) calls to get(envname, envir = globalenv()), it seems to work. But this looks more like a workaround than a very snowfall-like solution.


Solution

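  • I think the issue is not with snowfall but with the fact that you're passing the environment by name (as a character string). You don't need to change all occurrences of get, and having it look in the global environment may indeed be unsafe.

    By default, source() evaluates the file in the global environment, so the sourced foo.bar searches its own frame and then the global environment, and never sees the environment that SnowFunc created in its local frame. Declaring foo.bar inside SnowFunc works because SnowFunc's frame is then the function's enclosing environment. It is sufficient to change the get call in foo.bar to look in parent.frame() instead (i.e., the environment from which foo.bar was called). The following worked on my machine.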

    new q_func.R

    # Look up envname in the caller's frame (where SnowFunc created it)
    foo.bar <- function(x, envname) {
        assign("val", x, envir = get(envname, envir = parent.frame()))
    }
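
    The lookup difference can be reproduced without a cluster. The following is a minimal base-R sketch, not the original code: foo_global, foo_caller and run_sim are hypothetical names, and foo_global merely stands in for a function that was loaded with a default source() call.

    ```r
    # Stand-in for a sourced function: its enclosing environment is the
    # environment it was defined in, not the frame of whoever calls it.
    foo_global <- function(x, envname) assign("val", x, envir = get(envname))

    # Variant that resolves the name starting from the caller's frame.
    foo_caller <- function(x, envname) {
        caller <- parent.frame()
        assign("val", x, envir = get(envname, envir = caller))
    }

    # Hypothetical stand-in for SnowFunc: creates the environment locally.
    run_sim <- function(f, envname) {
        assign(envname, new.env())        # lives in run_sim's own frame
        f(x = 42, envname = envname)
        get("val", envir = get(envname))
    }

    # foo_global cannot find "a", because "a" only exists inside run_sim.
    failed_global <- inherits(
        tryCatch(run_sim(foo_global, "a"), error = function(e) e),
        "error"
    )

    # foo_caller finds "a" in the frame it was called from.
    res_caller <- run_sim(foo_caller, "a")
    ```

    Here failed_global records whether the first call raised an error, and res_caller holds the value stored through the caller-aware version.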
    

    (not so) new q_snowfall.R

    library(snowfall)
    SnowFunc <- function(envname) {
    
        assign(envname, new.env())
        foo.bar(x = rnorm(1), envname = envname)
    
        return(get("val", envir = get(envname)))
    }
    
    source("q_func.R")
    sfInit(parallel = TRUE, cpus = 2)
    sfExport("foo.bar")
    
    envs <- c("a", "b")
    result <- sfClusterApplyLB(envs, SnowFunc)
    sfStop()
    

    Note also that I sourced q_func.R on the master before starting the cluster and used sfExport to export foo.bar to each node.
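
    Since the root of the problem is passing the environment by name, another option is to pass the environment object itself, which removes the name lookup entirely. This is only a sketch under assumptions: foo_env and sim_func are hypothetical replacements for foo.bar and SnowFunc, and lapply is used in place of the cluster call so the logic can be tried sequentially.

    ```r
    # Hypothetical variant of foo.bar: takes an environment, not its name.
    foo_env <- function(x, env) assign("val", x, envir = env)

    # Hypothetical variant of SnowFunc: the label is kept only to mirror
    # the original interface; the environment is handed over directly.
    sim_func <- function(label) {
        env <- new.env()
        foo_env(x = rnorm(1), env = env)
        get("val", envir = env, inherits = FALSE)
    }

    set.seed(1)
    result <- lapply(c("a", "b"), sim_func)
    ```

    With this design no get(envname) call is needed, so it no longer matters in which environment the helper function was defined.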