[SOLVED] parallelizing function poly2nb {spdep}

The documentation for spdep::poly2nb contains the following entry under Arguments:

foundInBox: default NULL using R code, possibly parallelised if a snow cluster is available, otherwise a list of length (n-1) with integer vectors of candidate neighbours (j > i), or NULL if all candidates were (j < i) (as created by the poly_findInBoxGEOS function in rgeos for clean polygons)

I have interpreted the part "possibly parallelised if a snow cluster is available" to mean that the function will run in parallel if this argument is NULL (the default) and a snow cluster is registered. I have tried it like this:

cl <- parallel::makeCluster(7)
doParallel::registerDoParallel(cl)

spdep::poly2nb(squamate_dist) # squamate_dist is a large SpatialPolygonsDataFrame

Task Manager doesn't show any parallel activity. What is the correct way to run this function in parallel? Also, is there a way to parallelize it while supplying a list to the foundInBox argument?

Solution

The spdep package (version 1.1-8) uses functions such as spdep::set.mcOption to configure parallel computation. See the example in ?spdep::set.mcOption for how this is done.
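For orientation, the option accessors can be inspected and changed directly. This is a minimal sketch, assuming spdep 1.1-x, where each set.*Option() function returns the previous value (so the old settings can be restored) and set.coresOption() expects an integer. The comments describing what each option means reflect my reading of ?spdep::set.mcOption, not a guarantee of how every spdep function uses them.

```r
library(spdep)

# Current settings: whether spdep forks (mclapply) or uses a registered
# cluster, how many cores to use, and which cluster (if any) is registered
get.mcOption()
get.coresOption()
get.ClusterOption()   # NULL when no cluster has been registered

# Each setter returns the old value, so it can be put back afterwards
old_cores <- set.coresOption(2L)   # integer required, hence 2L
old_mc <- set.mcOption(FALSE)      # FALSE: use a snow-style cluster, not forking
stopifnot(get.coresOption() == 2L, !get.mcOption())

# Restore the previous settings
invisible(set.coresOption(old_cores))
invisible(set.mcOption(old_mc))
```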

I can't confirm that this works for spdep::poly2nb, but it worked for me with spdep::skater and spdep::nbcosts.

Inside a function, I use it like this:

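library(spdep) # provides set.coresOption(), set.mcOption(), set.ClusterOption()

function_using_spdep <- function(...) {
  nc <- 4L # number of cores; set.coresOption() expects an integer
  cores_opt <- set.coresOption(nc) # each setter returns the previous value
  mc_opt <- set.mcOption(FALSE)    # FALSE: use a snow-style cluster instead of forking
  cl <- parallel::makeCluster(get.coresOption())
  set.ClusterOption(cl)            # register the cluster with spdep
  on.exit({
    # restore the previous spdep settings and shut down the workers
    set.coresOption(cores_opt)
    set.mcOption(mc_opt)
    set.ClusterOption(NULL)
    parallel::stopCluster(cl)
  })

  # do spdep stuff
}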
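To check whether poly2nb actually picks up a registered cluster in your spdep version, one option is to time the same call with and without one. This is a sketch under a few assumptions: the sf and spData packages are installed (spData ships the small columbus shapefile used in the spdep examples), and `with_spdep_cluster` is a hypothetical helper that wraps the same setup/teardown pattern as above. With only 49 polygons any difference will be dominated by cluster start-up time (which the timing includes), so substitute a large layer such as `squamate_dist` for a meaningful comparison.

```r
library(spdep)
library(sf)

# Hypothetical helper: run fun() with an spdep cluster registered,
# restoring the previous spdep options afterwards
with_spdep_cluster <- function(fun, nc = 2L) {
  cores_opt <- set.coresOption(nc)
  mc_opt <- set.mcOption(FALSE)
  cl <- parallel::makeCluster(nc)
  set.ClusterOption(cl)
  on.exit({
    set.coresOption(cores_opt)
    set.mcOption(mc_opt)
    set.ClusterOption(NULL)
    parallel::stopCluster(cl)
  })
  fun()
}

columbus <- st_read(system.file("shapes/columbus.shp", package = "spData"),
                    quiet = TRUE)

# Serial run: no cluster registered
t_serial <- system.time(nb_serial <- poly2nb(columbus))

# Same call with a cluster registered (includes cluster start-up time)
t_cluster <- system.time(
  nb_cluster <- with_spdep_cluster(function() poly2nb(columbus))
)

# The neighbour lists should be the same either way
print(all.equal(nb_serial, nb_cluster, check.attributes = FALSE))
print(rbind(serial = t_serial, cluster = t_cluster)[, "elapsed"])
```

If the elapsed times stay about the same on a large layer, and Task Manager shows only one busy R process, that suggests poly2nb is not using the cluster in that spdep version.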