I am a newbie at R and am having trouble figuring this issue out, so any help would be appreciated. I have daily flooding raster data that I am using to calculate the weighted mean of flooding within buffers drawn around certain locations. I cleaned the rasters in QGIS and am using R to calculate the weighted mean with the following code:
library(sp)
library(raster)
library(ggplot2)
library(sf)

# raster_file, buffer, and output_file_path are defined earlier in the script

# Load the raster file into R
flooding_raster_data <- raster(raster_file)
print(paste("Loaded raster file:", raster_file))

# Extract raster values that fall within the buffer polygons as a cell-weighted mean
raster_values_in_buffer_weight <- extract(flooding_raster_data, buffer,
                                          fun = mean, na.rm = TRUE, weights = TRUE)

# Attach the extracted floodshare values to the buffers
buffer_with_mean <- st_as_sf(buffer)
buffer_with_mean$floodshare <- raster_values_in_buffer_weight

# Write the sf object to a shapefile
st_write(buffer_with_mean, output_file_path, append = FALSE)

rm(flooding_raster_data, buffer_with_mean, raster_values_in_buffer_weight, raster_file)
My problem is that for rasters where there is no data (coded as NoData in QGIS), the buffers get a weighted mean of 0. I have checked this against the zonal mean in QGIS, and those same buffers have a NULL zonal mean there. I am struggling to figure out why this happens in R, since I would expect R to return the mean as NA or missing. If someone knows why, please let me know, along with how I can fix it. Again, I am a total beginner in R, so I may have missed something very obvious.
I tried setting na.rm = FALSE because I thought the issue might be with the way I was handling NA values in the R code, but that just made every weighted mean NA, so I think I am not understanding how R handles missing data.
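For context on what I expected, here is the base R behaviour that the 0 from extract seems to contradict: removing NAs from an all-NA vector leaves nothing to average, so both mean and weighted.mean return NaN:

# Base R: averaging an all-NA vector with na.rm = TRUE gives NaN, not 0
mean(c(NA_real_, NA_real_), na.rm = TRUE)                          # NaN
weighted.mean(c(NA_real_, NA_real_), c(0.5, 0.5), na.rm = TRUE)    # NaN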
EDIT
I went to check the documentation for raster's extract and took the sample code from there, changing the raster values to NA. It does give a weighted mean of 0 when extracting values with polygons if all raster values are NA. Why is it not giving NA for the weighted mean, when for everything else the answer comes out as NA or NaN? I have added the replication code below.
library(raster)
library(sp)

###############################
# extract values with lines
###############################
r <- raster(ncol = 36, nrow = 18, vals = NA)
cds1 <- rbind(c(-50, 0), c(0, 60), c(40, 5), c(15, -45), c(-10, -25))
cds2 <- rbind(c(80, 20), c(140, 60), c(160, 0), c(140, -55))
lines <- spLines(cds1, cds2)
extract(r, lines)

###############################
# extract values with polygons
###############################
cds1 <- rbind(c(-180, -20), c(-160, 5), c(-60, 0), c(-160, -60), c(-180, -20))
cds2 <- rbind(c(80, 0), c(100, 60), c(120, 0), c(120, -55), c(80, 0))
polys <- spPolygons(cds1, cds2)
v <- extract(r, polys)

# mean for each polygon: NaN, as expected for all-NA values
mean_polyg <- unlist(lapply(v, function(x) if (!is.null(x)) mean(x, na.rm = TRUE) else NA))
print(mean_polyg)

# weighted mean: returns 0 for each polygon instead of NA/NaN
v <- extract(r, polys, weights = TRUE, fun = mean, na.rm = TRUE)
print(v)
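For comparison, running the same extraction without weights gives NaN, which is what I expected the weighted version to return as well:

# Unweighted mean via fun: NaN for all-NA polygons, as expected
extract(r, polys, fun = mean, na.rm = TRUE)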
Since October 2023, when they were deprecated, there are a number of 'quirks' being produced by raster and sp. Unless you absolutely need to use them, they are best avoided. I can't say for sure whether this is a result of raster and sp being out of date, but you can achieve your desired output using the terra and sf packages:
library(sf)    # sf 1.0-16
library(terra) # terra 1.7.71
library(dplyr)

# Example sf to represent the extent of the example raster data and to use as a mask
sf_poly <- st_read(system.file("shape/nc.shp", package = "sf")) %>%
  slice(1) %>%
  st_transform(4326) %>%
  select(geometry)

# Get extent of sf_poly
st_bbox(sf_poly)
#      xmin      ymin      xmax      ymax
# -81.74091  36.23444 -81.23971  36.58973

# Generate an example raster with -Inf values using the st_bbox(sf_poly) values
r <- rast(ncols = 50, nrows = 50, nlyrs = 1,
          xmin = -81.74091, xmax = -81.23971,
          ymin = 36.23444, ymax = 36.58973,
          names = "rast_val",
          crs = "EPSG:4326") %>%
  init("cell") %>%
  crop(sf_poly, mask = TRUE) %>%
  ifel(is.na(.), -Inf, .)

# Create example polys sf (note id == 1 lies outside the extent of non-NA/-Inf raster values)
polys <- data.frame(id = rep(1:4, each = 2),
                    lon = c(-81.7, -81.68, -81.6, -81.55, -81.4, -81.4, -81.3, -81.3),
                    lat = c(36.25, 36.33, 36.25, 36.40, 36.3, 36.55, 36.25, 36.36)) %>%
  st_as_sf(coords = c("lon", "lat"), crs = 4326) %>%
  summarise(geometry = st_combine(geometry), .by = id) %>%
  st_cast("LINESTRING") %>%
  st_buffer(800)
Now convert those -Inf values to NA (skip if they are already NA in your actual data):
r <- ifel(r == -Inf, NA, r)
and finally, calculate weighted mean values for each polygon in polys:
v <- extract(r, polys, weights = TRUE, fun = mean, na.rm = TRUE)
v
#   ID rast_val
# 1  1      NaN
# 2  2 1689.657
# 3  3 1169.131
# 4  4 1627.674
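Note that the polygon with no valid cells (id == 1) correctly returns NaN rather than 0. For reference, here is a minimal sketch of how your original workflow might look ported to terra/sf, assuming raster_file, buffer, and output_file_path are defined as in your question and that buffer can be converted to an sf object:

library(sf)
library(terra)

# Load the raster with terra::rast() instead of raster::raster()
flooding_raster_data <- rast(raster_file)

# Cell-weighted mean of raster values within each buffer polygon;
# buffers containing no valid cells come back as NaN rather than 0
buffer_with_mean <- st_as_sf(buffer)
v <- extract(flooding_raster_data, buffer_with_mean,
             weights = TRUE, fun = mean, na.rm = TRUE)
buffer_with_mean$floodshare <- v[[2]]

# Write the result to a shapefile
st_write(buffer_with_mean, output_file_path, append = FALSE)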