Tags: r, parallel-processing, polygon, spatial, spdep

parallelizing function poly2nb {spdep}


The documentation for spdep::poly2nb contains the following entry under Arguments:

foundInBox: default NULL using R code, possibly parallelised if a snow cluster is available, otherwise a list of length (n-1) with integer vectors of candidate neighbours (j > i), or NULL if all candidates were (j < i) (as created by the poly_findInBoxGEOS function in rgeos for clean polygons)

I have interpreted the phrase "possibly parallelised if a snow cluster is available" (bold in the documentation) to mean that the function will be parallelized if this argument is NULL (the default) and a snow cluster is registered. I have tried doing it like this:

cl <- parallel::makeCluster(7)
doParallel::registerDoParallel(cl)

spdep::poly2nb(squamate_dist) # squamate_dist is a large SpatialPolygonsDataFrame

Looking at Task Manager doesn't show any parallelization. What is the correct way to run this function in parallel? Also, is there a way to parallelize it while supplying a list to argument foundInBox?


Solution

  • The spdep package (1.1-8) uses its own option functions, such as spdep::set.mcOption, to set up parallel computations. See the example in ?spdep::set.mcOption for how this is done.

    I can't confirm that this works for spdep::poly2nb, but it worked for me when using spdep::skater or spdep::nbcosts.

    In a function I use it like this:

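    function_using_spdep <- function(...) {
      nc <- 4L # number of cores; must be an integer
      cores_opt <- spdep::set.coresOption(nc) # returns the previous value
      mc_opt <- spdep::set.mcOption(FALSE)    # FALSE: use a snow cluster, not forking
      cl <- parallel::makeCluster(spdep::get.coresOption())
      spdep::set.ClusterOption(cl)            # hand the cluster to spdep
      # restore the previous options and stop the cluster when the function exits
      on.exit({
        spdep::set.coresOption(cores_opt)
        spdep::set.mcOption(mc_opt)
        spdep::set.ClusterOption(NULL)
        parallel::stopCluster(cl)
      }, add = TRUE)

      # do spdep stuff
    }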
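
For poly2nb specifically, the same pattern could be applied roughly as sketched below. This is only an illustration under assumptions: the helper name poly2nb_with_cluster and the 10 x 10 test grid are made up for the example, the sf package is assumed to be installed, and a recent spdep that accepts sf/sfc input is assumed. Whether poly2nb actually sends work to the cluster depends on the spdep version and is not confirmed here; the result should be the same either way.

```r
library(spdep) # assumes spdep (and sf, used for the toy data) are installed

# Hypothetical helper, not part of spdep: sets spdep's parallel options,
# calls poly2nb(), and restores the previous state on exit.
poly2nb_with_cluster <- function(polys, n_cores = 2L, ...) {
  old_cores <- set.coresOption(n_cores) # n_cores must be an integer
  old_mc <- set.mcOption(FALSE)         # FALSE selects the snow-cluster route
  cl <- parallel::makeCluster(get.coresOption())
  set.ClusterOption(cl)
  on.exit({
    set.ClusterOption(NULL)
    parallel::stopCluster(cl)
    set.mcOption(old_mc)
    set.coresOption(old_cores)
  }, add = TRUE)

  poly2nb(polys, ...)
}

# Toy data: a 10 x 10 grid of square polygons (stand-in for squamate_dist)
grid <- sf::st_make_grid(
  sf::st_as_sfc(sf::st_bbox(c(xmin = 0, ymin = 0, xmax = 10, ymax = 10))),
  n = c(10, 10)
)

nb <- poly2nb_with_cluster(grid, n_cores = 2L)
```

Note that doParallel::registerDoParallel only registers a backend for foreach; spdep reads the cluster from its own option, which is why the helper calls set.ClusterOption instead.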