Tags: r, arcgis, terra

Is there an alternative to terra patches in R for large SpatRasters?


Overall goal: I am trying to run Circuitscape using a habitat suitability map.

I've already filtered the data to suitability greater than 0.80, so I currently have a large SpatRaster (21373 rows and 42746 columns) with 0 and 1 cells. I need to assign patch IDs to any touching cells (8 directions) so that I can run my connectivity model!
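
For reference, here is a minimal sketch of the workflow on a toy raster (the suitability values are simulated and the 100 x 200 size is just for illustration):

    library(terra)
    
    # toy habitat-suitability raster (values simulated for illustration only)
    suit <- rast(nrows = 100, ncols = 200, xmin = 0, xmax = 200, ymin = 0, ymax = 100)
    values(suit) <- runif(ncell(suit))
    
    # keep cells with suitability > 0.80, then label touching cells (8 directions)
    habitat <- suit > 0.80
    patch_ids <- patches(habitat, directions = 8, zeroAsNA = TRUE)
    plot(patch_ids)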

My current problem: I'm using the terra function patches(), but it has now been running for 3 days without finishing, and I have no way (that I know of) to track its progress.

Right now I'm trying to be patient and let the patches function run in case it's just taking quite some time.

I also tried the clump() function from the raster package but it has the same time issue.

Is this something I just have to be patient for, or is there another program/package that can speed up this process? I have 5 rasters to find patches for.

I have access to ArcGIS Pro but prefer to stay in R if possible.


Solution

  • Here is an example of how you might estimate how long patches() would take on data of that size on given hardware. Note that when simulating the example rasters I assumed that 10% of the values are not NA. The run time of patches() appears to scale roughly quadratically (O(n^2)) with the number of cells; based on the model below, I predict that on my hardware running patches() on your raster would take between 7 and 8 days.

    One likely reasonable way forward would be to reduce the resolution of the raster. You can use a model like mdl developed here to estimate how long a smaller raster would take, and find an appropriate balance between resolution and run time (see the sketch after the main code below).

    Note that running the simulations to get this estimate took me about 16 minutes; if I had dropped the largest data point, it would have finished in closer to 1 minute. You can see the polynomial growth right there.

    target_rows <- 21373
    target_cols <- 42746
    
    row_frac <- target_rows/(target_cols+target_rows)
    
    library(tidyverse)
    library(terra)
    set.seed(42)
    
    # I went up to 2^13 columns - reduce to 12 if you aren't as patient and can accept a less accurate estimate :)
    (cols <- 2^(2:13))
    (rows <- (2^(2:13)/2))
    
    # simulate rasters of increasing size, populating ~10% of the cells with 1 (the rest stay NA)
    list_of_rasters <- map2_df(cols,rows,
                               \(co,ro){
                                 mult <- ro*co
                                 cat("\nMaking size\t", mult , "\trows:\t", ro,"\tcols:\t",co)
                                 rb_time <- system.time({
                                   
                                   r <- rast(nrows=ro, ncols=co, xmin=0)
                                   
                                   to_pop <- sample.int(n = mult,
                                                        size = mult / 10,
                                                        replace = FALSE)
                                   r[to_pop] <- 1L
                                   
                                 })
                                 memsize <-  capture.output(pryr::object_size(wrap(r)))
                                 cat("\nMemory use\t",memsize)
                                 cat("\nElapsed Seconds making the raster:\t",rb_time[["elapsed"]])
                                 tibble(
                                   raster=list(r),
                                   rows=ro,
                                   cols=co,
                                   memsize_on_disk=memsize,
                                   elapsed = rb_time[["elapsed"]])
                               })
    
    list_of_rasters
    
    # time patches() on each simulated raster
    patches_list <- map(list_of_rasters$raster,
                        \(r){
                          tm_seconds <- system.time({ pr <- patches(r)})
                          cat("\nElapsed Seconds making the patches:\t",tm_seconds[["elapsed"]])
                          tibble(pr=list(pr),tm_seconds=tm_seconds[["elapsed"]])
                        })
    patches_list <- list_rbind(patches_list)
    
    (all_info <- bind_cols(list_of_rasters,
                          patches_list))
    
    
    # fit a quadratic model of elapsed seconds against the number of cells
    (mdl <- lm(tm_seconds~poly(rows*cols,degree=2),data=all_info))
    summary(mdl)
    
    # predict the run time for the full-size raster
    want_df <- data.frame(
      rows=target_rows,
      cols=target_cols
    )
    want_df$tm_seconds <- predict(mdl,newdata=want_df)
    
    (all_info2 <- bind_rows(all_info,
                           want_df))
    
    # plot the timings (including the full-size prediction) against cell count
    rows_x_cols <- all_info2$rows*all_info2$cols
    plot(rows_x_cols,
         all_info2$tm_seconds)
    
    # overlay the fitted curve up to the target size
    xtc <- seq(from=0,to=target_cols,length.out=500)
    xtr <- seq(from=0,to=target_rows,length.out=500)
    xl <- xtc*xtr
    
    lines(x=xl,
          y=predict(mdl,
                    newdata=list(
            rows=xtr,
            cols=xtc
          )))
    
    seconds_to_days <- function(x){
      round(x/ (60*60*24),4)
    }
    
    # how many days ? 
    seconds_to_days(all_info2$tm_seconds)
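
    If you decide to try a coarser resolution, here is a rough sketch of how you could reuse mdl to see what it buys you; the aggregation factor of 4 is an arbitrary example, and habitat stands in for your thresholded 0/1 SpatRaster:

    # predicted run time at a coarser resolution (aggregation factor of 4 as an example)
    fact <- 4
    coarse_df <- data.frame(
      rows = ceiling(target_rows / fact),
      cols = ceiling(target_cols / fact)
    )
    coarse_df$tm_seconds <- predict(mdl, newdata = coarse_df)
    seconds_to_days(coarse_df$tm_seconds)
    
    # on the real data, something along these lines (untested sketch):
    # habitat_coarse <- aggregate(habitat, fact = fact, fun = "max", na.rm = TRUE)
    # patch_ids <- patches(habitat_coarse, directions = 8)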