I am trying to downscale climatic conditions using the methodology in this article using the R software. I am almost there, but I am missing a couple of steps
For this example I uploaded some data to the archive.org website to load the required packages and data used in this example use the following code:
library(raster)
library(rgdal)
download.file("https://archive.org/download/Downscaling/BatPatagonia.rds", "Bat.rds")
download.file("https://archive.org/download/Downscaling/TempMinPatNow.rds", "Tmin.rds")
BatPatagonia <- readRDS("Bat.rds")
TempMinPatNow <- readRDS("Tmin.rds")
BatPatagonia is a raster file with the Bathymetry and altitude of the area extracted and transformed from the GEBCO dataset, while the TempMinPatNow is the minimum temperature of the same area for January extracted from WorldClim. The plots of the datasets are seen below:
In order to downscale past data from the last glacial maximum I need to model how current climate would be like if the sea level was the same as it was in the past. In order to do that I use the GEBCO data and to figure out more or less were the coast was. According to the methodology in the article cited above this are the first three steps to follow:
Point 3 is what I have been struggling to do, I will show how I did the first 2 points, and show what I have been looking for trying to solve point 3
In order to do this I took the BatPatagonia raster, and replaced all values below 20 meters with NA values using the following code:
Elev20 <- BatPatagonia
values(Elev20) <- ifelse(values(Elev20) <= 20, NA, values(Elev20))
The resulting raster is shown in the following image
According to the manuscript in page 2591, the next step is to do a Multiple Linear Regression in a moving window using the Following formula for altitudes over 20 meters:
We already have the elevation data, but we also need the rasters for latitude and longitude, for that we use the following code, where we first create the Latitude and Longitude Rasters:
Latitud <- BatPatagonia
Longitud <- BatPatagonia
data_matrix <- raster::xyFromCell(BatPatagonia, 1:ncell(BatPatagonia))
values(Latitud) <- data_matrix[, 2]
values(Longitud) <- data_matrix[, 1]
We will multiply this by a raster mask of the areas that have elevations over 20 meters, so that we get only the values that we need:
Elev20Mask <- BatPatagonia
values(Elev20Mask) <- ifelse(values(Elev20Mask) <= 20, NA, 1)
Longitud <- Elev20Mask*Longitud
Latitud <- Elev20Mask*Latitud
Now I will build a stack with the response variables and the predictor variables:
Preds <- stack(Elev20, Longitud, Latitud, TempMinPatNow)
names(Preds) <- c("Elev", "Longitud", "Latitud", "Tmin")
The resulting stack is shown in the following figure:
As stated in the paper the moving window should be of 25 by 25 cells, resulting in a total of 625 cells, however they state that if the moving window has less than 170 cells with data, the regression should not be performed, and it should have a maximum of 624 cells in order to ensure that we are only modelling areas close to the coast.
The result of this Multiple Regression with the moving window should be a stack with the Local intercept, and the local estimation of each one of the Betas that are in the equation shown above. I found out how to make this using the following code using the getValuesFocal
function (This loop takes a while):
# First we establish the 25 by 25 window
w <- c(25, 25)
# Then we create the empty layers for the resulting stack
intercept <- Preds[[1]]
intercept[] <- NA
elevationEst <- intercept
latitudeEst <- intercept
longitudeEst <- intercept
Now we start the code:
for (rl in 1:nrow(Preds)) {
v <- getValuesFocal(Preds[[1:4]], row = rl, nrows = 1, ngb = w, array = FALSE)
int <- rep(NA, nrow(v[[1]]))
x1 <- rep(NA, nrow(v[[1]]))
x2 <- rep(NA, nrow(v[[1]]))
x3 <- rep(NA, nrow(v[[1]]))
x4 <- rep(NA, nrow(v[[1]]))
for (i in 1:nrow(v[[1]])) {
xy <- na.omit(data.frame(x1 = v[[1]][i, ], x2 = v[[2]][i, ], x3 = v[[3]][i,
], y = v[[4]][i, ]))
if (nrow(xy) > 170 & nrow(xy) <= 624) {
coefs <- coefficients(lm(as.numeric(xy$y) ~ as.numeric(xy$x1) +
as.numeric(xy$x2) + as.numeric(xy$x3)))
int[i] <- coefs[1]
x1[i] <- coefs[2]
x2[i] <- coefs[3]
x3[i] <- coefs[4]
} else {
int[i] <- NA
x1[i] <- NA
x2[i] <- NA
x3[i] <- NA
}
}
intercept[rl, ] <- int
elevationEst[rl, ] <- x1
longitudeEst[rl, ] <- x2
latitudeEst[rl, ] <- x3
message(paste(rl, "of", nrow(Preds), "ready"))
}
Coeffs <- stack(intercept, elevationEst, latitudeEst, longitudeEst, (intercept + Preds$Elev * elevationEst + Preds$Longitud * longitudeEst + Preds$Latitud *latitudeEst), Preds$Tmin)
names(Coeffs) <- c("intercept", "elevationEst", "longitudeEst", "latitudeEst", "fitted", "Observed")
The result of this loop is the coeffs
stack, shown below:
The goal now is to extrapolate the first 4 rasters of the Coeffs stack (intercept, elevationEst, longitudeEst and latitudeEst) to where the coast should be according to the last glacial maximum which was 120 meters shallower
MaxGlacier <- BatPatagonia
values(MaxGlacier) <- ifelse(values(MaxGlacier) < -120, NA,1)
The projected coastline is shown in the following map:
The way the authors projected the coefficients to the coast was by filling the gaps using by solving Poisson's equation using the poisson_grid_fill
of the NCL language from NCAR. But I would like to keep it simple and try to do all in the same language. I also found a similar function in Python.
I would be happy with any extrapolation process that works well, I am not limiting my search to that algorithm.
I found several R packages that fill gaps such as the Gapfill package and even found a review of methods to fill gaps, but most of them are for interpolating and mostly for NDVI layers that can be based on other layers where the gap is filled.
How can I move forward on this?
Thinking back several decades to my physics undergrad days, we used Laplace relaxation to solve these types of Poisson equation problems. I'm not sure, but I guess that may also be how poisson_grid_fill
works. The process is simple. Relaxation is an iterative process where we calculate each cell except those that form the boundary condition as the mean of the cells that are horizontally or vertically adjacent, then repeat until the result approaches a stable solution.
In your case, the cells for which you already have values provide your boundary condition, and we can iterate over the others. Something like this (demonstrated here for the intercept coefficient - you can do the others the same way):
gaps = which(is.na(intercept)[])
intercept.ext = intercept
w=matrix(c(0,0.25,0,0.25,0,0.25,0,0.25,0), nc=3, nr=3)
max.it = 1000
for (i in 1:max.it) intercept.ext[gaps] = focal(intercept.ext, w=w, na.rm=TRUE)[gaps]
intercept.ext = mask(intercept.ext, MaxGlacier)
Edit
Here's the same process embedded in a function, to demonstrate how you might use a while
loop that continues until a desired tolerance is reached (or maximum number of iterations is exceeded). Note that this function is to demonstrate the principle, and is not optimised for speed.
gap.fill = function(r, max.it = 1e4, tol = 1e-2, verbose=FALSE) {
gaps = which(is.na(r)[])
r.filled = r
w = matrix(c(0,0.25,0,0.25,0,0.25,0,0.25,0), nc=3, nr=3)
i = 0
while(i < max.it) {
i = i + 1
new.vals = focal(r.filled, w=w, na.rm=TRUE)[gaps]
max.residual = suppressWarnings(max(abs(r.filled[gaps] - new.vals), na.rm = TRUE))
if (verbose) print(paste('Iteration', i, ': residual = ', max.residual))
r.filled[gaps] = new.vals
if (is.finite(max.residual) & max.residual <= tol) break
}
return(r.filled)
}
intercept.ext = gap.fill(intercept)
intercept.ext = mask(intercept.ext, MaxGlacier)
plot(stack(intercept, intercept.ext))