rparallel-processingterradoparallel

Calculating NDVI in parallel using `lapp()` within `foreach()`


I am attempting to calculate the normalized difference vegetation index (NDVI) for a large directory of National Aerial Imagery Program (NAIP) imagery. I want to run it in parallel to speed up the calculation but I need to calculate it in a memory safe way. Reading other StackOverflow questions related to this topic, here are the morsels of wisdom I took away:

  1. On windows machines, foreach() allows for looping in parallel
  2. lapp() is the memory safe way to do raster algebra on multiple bands
  3. spatRasters are non-serializable and need to be passed through wrap() to be run in parallel

In the function to calculate NDVI I have tried two approaches: First, I tried reading in the raster and passing it to wrap(), but it just produces a NULL value, but doesn't throw an error.

Second I removed the wrap() and got this error:

Error in x@ptr$nrow() : external pointer is not valid

Here is what I have attempted so far.

##Loading Necessary Packages##
library(terra)
library(doParallel)
library(foreach)

##Creating Fake NAIP tiles##
inpath<-file.path(tempdir(), "NAIP")

if(!dir.exists(inpath)){
  dir.create(inpath)
}

NAIP_1<-rast(xmin=45.5,xmax=45.55, ymin=-122.5, ymax=-122.45, nrows=50, ncols=50, crs="EPSG:4326", nlyrs=4)
NAIP_2<-rast(xmin=45.5,xmax=45.55, ymin=-122.45, ymax=-122.4, nrows=50, ncols=50, crs="EPSG:4326", nlyrs=4)

values(NAIP_1)<-sample(c(1:255), 10000, replace=TRUE)
values(NAIP_2)<-sample(c(1:255), 10000, replace=TRUE)

writeRaster(NAIP_1, filename = file.path(inpath, "FAKE_NAIP1.tif"))
writeRaster(NAIP_2, filename = file.path(inpath, "FAKE_NAIP2.tif"))

tmp<-list.files(inpath, pattern="*.tif$", full.names = TRUE)

##Creating output directory##
outpath<-file.path(tempdir(), "NDVI")

if(!dir.exists(outpath)){
  dir.create(outpath)
}

##Creating the NDVI function##
NDVI_calc<- function(X, outpath) {
  r<-terra::rast(X)
#r<-terra::wrap(terra::rast(X)) #the other options I tried
  f<-basename(X)
  o<-paste(outpath, f, sep="/")
  if(!file.exists(o)){
    NDVI_int<-function(n,r){
      ndvi<-as.integer(10000*(n-r)/ (n+r))
      return(ndvi)
    }
    terra::lapp(r[[c(4,3)]], fun=NDVI_int, filename=o,  wopt=list(datatype="INT4S"), overwrite=TRUE)
  }
}

##Setting up clusters for parallel processing##
ncores<-detectCores()-2
cl<-makeCluster(ncores)
registerDoParallel(cl)

##Running the function in parallel##
foreach(i = tmp, outpath=outpath) %dopar% {
  NDVI_calc(i, outpath=outpath)
}

##Close the cluster connections
stopCluster(cl) 

Solution

  • You can make it work like this:

    NDVI_calc <- function(X, outpath) {
      f <- basename(X)
      out <- paste(outpath, f, sep="/")
      if (!file.exists(out)){
        r <- terra::rast(X)
        NDVI_int <- function(n, r){
           ndvi <- as.integer(10000*(n-r) / (n+r))
           return(ndvi)
        }
        terra::lapp(r[[c(4,3)]], fun=NDVI_int, filename=out,  wopt=list(datatype="INT4S"), overwrite=TRUE)
      } 
      return(out)
    }
    
    ncores<-detectCores()-2
    cl <-makeCluster(ncores)
    registerDoParallel(cl)
    
    out <- foreach(i = tmp) %dopar% {
      NDVI_calc(i, outpath=outpath)
    }
    
    stopCluster(cl) 
    
    str(out)
    #List of 2
    # $ : chr "Rtmpw9tcyb/NDVI/FAKE_NAIP1.tif"
    # $ : chr "Rtmpw9tcyb/NDVI/FAKE_NAIP2.tif"
    

    That is, do not pass "outpath" to foreach and return the filenames, not the SpatRasters.