rlinear-regressionrasterterra

Returning multiple outputs from a raster calculation using either raster or terra


I'm running linear fit models on a series of very large rasters. I often need to return two or more values from the model - for example, the coefficient and its associated p-value. In a non-raster case, I'd make a function that returned all the values I wanted as a list, like so:

#quick lm function to pull out coefficient and p.value
lm_fun <- function(x, y){
  lm <- lm(x ~ y)
  coef1 <- coef(summary(lm))[2,1]
  pval <- coef(summary(lm))[2,4]
  lm_results <- list(coef1 = coef1, pval = pval)
}

#sample data
dat <- c(runif(100, 2, 5), runif(100, 4, 10), runif(100, 6, 14))
timestep <- 1:length(dat)

lm_results <- lm_fun(dat, timestep)

However, when working with raster data a custom function is applied to the raster through either calc (with the raster package), or app (with the terra package), and returns a raster or SpatRaster layer. Is there anyway to modify the function to produce a raster stack, with different outputs (e.g. coefficient, p value) in different layers? Right now, I'm using the below approach, but it means I'm running the same calculation multiple times to pull different parts of the results, which is extremely computationally inefficient. I'm open to using either raster or terra.

# create three identical RasterLayer objects
r1 <- r2 <- r3 <- raster(nrow=100, ncol=100)
# Assign random cell values
values(r1) <- runif(ncell(r1), min=2, max=5)
values(r2) <- runif(ncell(r2), min=4, max=10) 
values(r3) <- runif(ncell(r3), min=6, max=14)
# combine three RasterLayer objects into a RasterStack
s <- stack(r1, r2, r3)
plot(s)

#calculate number of timesteps
nsteps <- dim(s)[3]
timestep <- 1:nsteps

#functions to calculate linear trend and associated p-value
trendfun <- function(x) { if (is.na(x[1])){ NA } else { lm(x ~ timestep)$coefficients[2]}}
pfun <- function(x) { if (is.na(x[1])){ NA } else { summary(lm(x ~ timestep))$coefficients[2,4]}}

# calculate trend and p-value using raster
library(raster)
s_trend <- calc(s, trendfun)
s_trend_pv <- calc(s, pfun)

#alternatively, calculate using terra package
library(terra)
s_rast <- rast(s)
s_trend <- app(s_rast, fun=trendfun)
s_pvalue <- app(s_rast, fun=pfun)

Solution

  • You can use terra::app. The function should return a vector or a matrix (with a row for each cell). Make sure that the when catching missing data, that the number of NAs returned must be the same as the number of numbers returned when there is data. Here is an illustration.

    set.seed(123)
    r <- terra::rast(nrow=10, ncol=10, nlyr=3, vals=runif(300))
    timestep <- 1:nlyr(r)
    
    f <- function(d) {
        if (any(is.na(d))) {
            c(NA, NA)
        } else {
            m <- lm(d ~ timestep)
            cf <- summary(m)$coefficients
            cf[2, c(1,4)] 
        }
    }
    
    terra::app(r, f)
    #class       : SpatRaster 
    #dimensions  : 10, 10, 2  (nrow, ncol, nlyr)
    #resolution  : 36, 18  (x, y)
    #extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
    #coord. ref. : lon/lat WGS 84 (CRS84) (OGC:CRS84) 
    #source(s)   : memory
    #names       :   Estimate,    Pr(>|t|) 
    #min values  : -0.4600143, 0.005280976 
    #max values  :  0.4173450, 0.990162160 
    

    Also see terra::regress

    terra::regress(r, 1:nlyr(r))
    #class       : SpatRaster 
    #dimensions  : 10, 10, 2  (nrow, ncol, nlyr)
    #resolution  : 36, 18  (x, y)
    #extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
    #coord. ref. : lon/lat WGS 84 (CRS84) (OGC:CRS84) 
    #source(s)   : memory
    #names       : (Intercept),          x 
    #min values  :  -0.3435626, -0.4600143 
    #max values  :   1.4499354,  0.4173450