r, parallel-processing, snowfall

R Snowfall - Difficulty in implementing functions that call other functions


I am trying to teach myself how to use the snowfall package, and I have run into a problem when the function I parallelize calls a second function (this is a simplified version of what I ultimately want to implement).

I currently have:

library(snowfall)
f1 <- function(n) { return(n - 1) }
f2 <- function(n) { return(f1(n)^2) }
# initialize cluster
sfInit(parallel = TRUE, cpus = 4)
# parallel computing
result <- sfLapply(1:10, f2)
# stop cluster
sfStop()

but I receive the error message:

Error in checkForRemoteErrors(val) :
  4 nodes produced errors; first error: could not find function "f1"

However, if I then run the sequential lapply(1:10, f2), I get the expected output:

lapply(1:10, f2)
[[1]]
[1] 0

[[2]]
[1] 1

[[3]]
[1] 4

[[4]]
[1] 9

[[5]]
[1] 16

[[6]]
[1] 25

[[7]]
[1] 36

[[8]]
[1] 49

[[9]]
[1] 64

[[10]]
[1] 81

I ultimately want to use snowfall to implement parallelized search procedures for multidimensional minimization problems, so I will definitely need to be able to call other functions from the main parallelized function.

Can anyone help with this?


Solution

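  • The workers are separate R processes that do not share your global environment. sfLapply sends f2 to them, but f1 is only defined in the master session, so the workers cannot find it. You need to export the f1 function to the workers using the sfExport function between sfInit and sfLapply: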

    sfExport('f1')
    

    This is the snowfall equivalent of the snow function clusterExport.

    To export multiple variables, you can either use multiple arguments or the list argument:

    sfExport('f1', 'x')
    sfExport(list=c('f1', 'x'))
    

    To export all variables in your global environment, use sfExportAll:

    sfExportAll()
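
Putting this together with the code from the question, a minimal sketch of the corrected script could look like the following. It assumes snowfall is installed and that a local socket cluster can be started; cpus = 2 is an arbitrary choice for illustration, not a recommendation.

```r
library(snowfall)

f1 <- function(n) n - 1      # helper that f2 depends on
f2 <- function(n) f1(n)^2    # function handed to sfLapply

# Start the workers (cpus = 2 is an arbitrary value for this sketch)
sfInit(parallel = TRUE, cpus = 2)

# Copy f1 into each worker's global environment so f2 can find it there
sfExport("f1")

result <- sfLapply(1:10, f2)

# Shut the workers down again
sfStop()

# Should match the sequential lapply(1:10, f2) from the question
print(unlist(result))
```

The export has to happen after sfInit, because before that call there are no workers to receive f1. If f2 later calls more helpers, each of them needs to be exported as well, or you can fall back to sfExportAll().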