I used to process a LiDAR catalog with the following code (using the LAScatalog processing engine from the great lidR package):
library(lidR)
library(parallel)  # detectCores()
library(stringr)   # str_pad()
lasdir <- "D:\\LAS\\"
output <- "D:\\LAS\\PRODUCTS\\"
epsg = "+init=epsg:25829"
res = 1
no_cores <- detectCores()
cat <- lascatalog(lasdir = lasdir,
                  outputdir = output,
                  pattern = '*COL.laz$|*COL.LAZ$',
                  catname = "Catalog",
                  clipcat = FALSE, clipcatbuf = FALSE, clipbuf = 1000, clipcatshape = clipcatshape,
                  cat_chunk_buffer = 20,
                  cores = no_cores, progress = TRUE,
                  laz_compression = TRUE, epsg = epsg,
                  retilecatalog = FALSE, tile_chunk_buffer = 10,
                  tile_chunk_size = 1000,
                  filterask = FALSE,
                  filter = "-keep_first -drop_z_below 2")
DEM_output <- paste0(output, "DEM_", str_pad(res, 3, "left", pad = "0"), "/")
opt_output_files(cat) <- paste0(DEM_output, "{ORIGINALFILENAME}")  # set file paths
DEM <- grid_terrain(cat, res = res, algorithm = knnidw(k = 5, p = 2))
The library has since been updated, and now the cores parameter seems to have no effect: the process still runs, but no longer in parallel. A message states: "Option no longer supported. See ?lidR-parallelism".
How can I process a catalog in parallel now?
Since lidR 2.1.0 (July 2019) the opt_cores() function has been deprecated. See the changelog.
The strategy used to process the tiles in parallel must now be declared explicitly by the user, which is how it should have been designed from the beginning! For users, restoring the former behavior requires only one change.
In versions < 2.1.0 the following was correct:
library(lidR)
ctg <- catalog("folder/")
opt_cores(ctg) <- 4L
hmean <- grid_metrics(ctg, mean(Z))
In versions >= 2.1.0 this must be explicitly declared with the future package:
library(lidR)
library(future)
plan(multisession, workers = 4L)  # same 4 workers as opt_cores(ctg) <- 4L
ctg <- catalog("folder/")
hmean <- grid_metrics(ctg, mean(Z))
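Applied to the workflow in the question, the removed cores argument is replaced by a future plan, and the remaining settings map onto lidR's opt_*() engine options. The sketch below assumes lidR >= 3.0 (where readLAScatalog(), opt_chunk_buffer(), opt_filter() and opt_output_files() exist) and that the tiles contain ground-classified points. The wrapper process_dem() and its arguments are hypothetical, and the custom lascatalog() function from the question is not reproduced. Calls are namespace-qualified so the function can be defined before the packages are attached.

```r
# Hypothetical wrapper around the question's workflow (not part of lidR).
# Assumes lidR >= 3.0 and the future package are installed; calls are
# namespace-qualified so the function can be defined before attaching them.
process_dem <- function(lasdir, output, res = 1, filter = "",
                        workers = future::availableCores()) {
  # Chunk-based parallelism: replaces the old `cores` option / opt_cores().
  # plan() returns the previous plan, which is restored on exit.
  oplan <- future::plan(future::multisession, workers = workers)
  on.exit(future::plan(oplan), add = TRUE)

  # Select the *COL.laz / *COL.LAZ tiles (case-insensitive match)
  files <- list.files(lasdir, pattern = "COL\\.laz$",
                      full.names = TRUE, ignore.case = TRUE)
  ctg <- lidR::readLAScatalog(files)

  # Engine options that the custom lascatalog() wrapper used to set
  lidR::opt_chunk_buffer(ctg) <- 20
  lidR::opt_filter(ctg) <- filter  # grid_terrain() interpolates ground points
  lidR::opt_progress(ctg) <- TRUE

  # One DEM per input tile, e.g. <output>/DEM_001/<tile name>
  dem_dir <- file.path(output, paste0("DEM_", formatC(res, width = 3, flag = "0")))
  dir.create(dem_dir, recursive = TRUE, showWarnings = FALSE)
  lidR::opt_output_files(ctg) <- file.path(dem_dir, "{ORIGINALFILENAME}")

  lidR::grid_terrain(ctg, res = res, algorithm = lidR::knnidw(k = 5, p = 2))
}
```

A call such as process_dem("D:/LAS/", "D:/LAS/PRODUCTS/", res = 1, workers = 4L) would then process the tiles on four workers. The question's filter string "-keep_first -drop_z_below 2" can be passed through the filter argument, but it may remove the ground points that grid_terrain() relies on.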
This is also fully documented in the manual page named lidR-parallelism:
?lidR::`lidR-parallelism`
chunk-based parallelism
When processing a LAScatalog, the internal engine splits the dataset into chunks and each chunk is read and processed sequentially in a loop. But actually this loop can be parallelized with the future package. By default the chunks are processed sequentially, but they can be processed in parallel by registering an evaluation strategy. For example, the following code is evaluated sequentially:
ctg <- readLAScatalog("folder/")
out <- grid_metrics(ctg, mean(Z))
But this one is evaluated in parallel with two cores:
library(future)
plan(multisession, workers = 2L)
ctg <- readLAScatalog("folder/")
out <- grid_metrics(ctg, mean(Z))
With chunk-based parallelism any algorithm can be parallelized by processing several subsets of a dataset [...]
To take full advantage of this new syntax you need to learn how the future package works. See the future documentation.
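To get a feel for what future does with such a loop, the sketch below dispatches two hypothetical "chunks" to a multisession plan with future() and collects them with value(). It only illustrates the general pattern of evaluating loop iterations as futures; it is not lidR's engine code, and the chunks are plain integer vectors standing in for point-cloud subsets.

```r
library(future)

# Two worker R sessions, as in the lidR-parallelism example
plan(multisession, workers = 2L)

# Hypothetical "chunks": plain integer vectors standing in for point-cloud subsets
chunks <- list(1:5, 6:10)

# Each future() is sent to a worker; value() blocks until its result is ready
fs   <- lapply(chunks, function(ch) future(sum(ch)))
vals <- vapply(fs, value, numeric(1))
print(vals)  # [1] 15 40

plan(sequential)  # back to sequential evaluation
```

Swapping plan(multisession, ...) for plan(sequential) leaves the code unchanged and only changes where each iteration runs, which is the same separation of concerns the lidR engine relies on.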