Overall goal: I am trying to run Circuitscape using a habitat suitability map.
I've already filtered the data to a greater than 0.80 suitability so I currently have a large SpatRaster
(21373 rows and 42746 columns) with 0 and 1 cells. I need to assign patch IDs to any touching cells (8-directions) so that I can run my connectivity model!
My current problem, I'm using the terra function patches()
, but it has currently been running for 3 days and has not finished and I have no way (that I know of) to track the progress.
Right now I'm trying to be patient and let the patches function run in case its just taking quite some time.
I also tried the clump()
function from the raster
package but it has the same time issue.
Is this something I just have to be patient for or is there another program/package that can speed up this process? I have 5 rasters to find the patches for.
I have access to ArcGIS Pro but prefer to stay in R if possible.
Here is an example of how you might estimate how long it would take to patch data of that size on given hardware. Note that, I made some sort of assumption when simulating an example raster that 10% of the values are not NA; the patches function seems to be clearly O(n^2); I predict that on my hardware running patches would take between 7 and 8 days.
one likely reasonable way forward would seem to be to reduce the resolution of the raster. You can use a model like mdl developed here to estimate how long a smaller size would take. and find an appropriate balance.
Note that running the simulations to get the estimate itself took me 16minutes. if I had done one less larger datapoint, it would have completed closer to 1 minute. you can see the polynomial difference right there.
target_rows <- 21373
target_cols <- 42746
row_frac <- target_rows/(target_cols+target_rows)
library(tidyverse)
library(terra)
set.seed(42)
# i did up to 13 - reduce to 12 if you arent as patient and want a less accurate estimate :)
(cols <- 2^(2:13))
(rows <- (2^(2:13)/2))
list_of_rasters <- map2_df(cols,rows,
\(co,ro){
mult <- ro*co
cat("\nMaking size\t", mult , "\trows:\t", ro,"\tcols:\t",co)
rb_time <- system.time({
r <- rast(nrows=ro, ncols=co, xmin=0)
to_pop <- sample.int(n = mult,
size = mult / 10,
replace = FALSE)
r[to_pop] <- 1L
})
memsize <- capture.output(pryr::object_size(wrap(r)))
cat("\nMemory use\t",memsize)
cat("\nElapsed Seconds making the raster:\t",rb_time[["elapsed"]])
tibble(
raster=list(r),
rows=ro,
cols=co,
memsize_on_disk=memsize,
elapsed = rb_time[["elapsed"]])
})
list_of_rasters
patches_list <- map(list_of_rasters$raster,
\(r){
tm_seconds <- system.time({ pr <- patches(r)})
cat("\nElapsed Seconds making the patches:\t",tm_seconds[["elapsed"]])
tibble(pr=list(pr),tm_seconds=tm_seconds[["elapsed"]])
})
patches_list <- list_rbind(patches_list)
(all_info <- bind_cols(list_of_rasters,
patches_list))
(mdl <- lm(tm_seconds~poly(rows*cols,degree=2),data=all_info))
summary(mdl)
want_df <- data.frame(
rows=target_rows,
cols=target_cols
)
want_df$tm_seconds <- predict(mdl,newdata=want_df)
(all_info2 <- bind_rows(all_info,
want_df))
rows_x_cols <- all_info2$rows*all_info2$cols
plot(rows_x_cols,
all_info2$tm_seconds)
xtc <- seq(from=0,to=target_cols,length.out=500)
xtr <- seq(from=0,to=target_rows,length.out=500)
xl <- xtc*xtr
lines(x=xl,
y=predict(mdl,
newdata=list(
rows=xtr,
cols=xtc
)))
seconds_to_days <- function(x){
round(x/ (60*60*24),4)
}
# how many days ?
seconds_to_days(all_info2$tm_seconds)