I have a data frame with about 20,000 rows containing the latitude and longitude of points spread across the world. I need to determine the population density at those points, so I considered using the GHSL population data. However, I am unsure how to extract that information from the GHSL population data, or if it is even possible. (
As you haven't provided any example data, information about which GHSL year to use, or your preferred language, here is an R reprex using example data.
I have used the 30arcsec resolution GHSL (~100m cell size) as the 3arcsec option was too large for my computer to handle. If you need a finer resolution, you can download the GHSL tile by tile, but that would be laborious. Given you point data are at global scale, I suspect ~100m will suffice. This example assumes your coordinates are WGS84/EPSG:4326.
First, load required packages and create example point df:
library(rnaturalearth)
library(terra)
library(sf)
library(dplyr)
library(ggplot2)
# Load world polygons and project to WGS84 Pseudo Mercator/EPSG:3857
world <- ne_countries() %>%
filter(!admin == "Antarctica") %>%
st_transform(3857)
# Create example points df using world and convert coordinates to WGS84:EPSG4326
set.seed(1)
df <- st_sample(world, size = 20000, type = "random") %>%
st_as_sf() %>%
st_transform(4326) %>%
mutate(ID = 1:n(),
lon = st_coordinates(.)[,1],
lat = st_coordinates(.)[,2]) %>%
data.frame() %>%
select(-x)
head(df)
# ID lon lat
# 1 1 26.2272108 -1.853224
# 2 2 146.9548044 65.990806
# 3 3 -105.8491530 73.182944
# 4 4 -116.4395691 70.634154
# 5 5 97.1429112 34.367544
# 6 6 -0.8282728 46.360984
Note the addition of the "ID" column. You will need to add this to your actual df as the output from terra::extract()
creates an "ID" column, and adding the "ID" column provides a way to join your population density data back to your original data. Using dplyr::left_join()
is not strictly necessary, you could just use the pop_den data, but I have assumed you have other columns in your df, so using a join makes things easier to keep track of. The workflow below joins the data to sf_points, but you could also just join pop_den back to your df using the same method.
Next, load GHSL as a SpatRaster:
# Load WGS84/EPSG:4326 GHSL, 30 arcsec, 202 from working directory,
# previously unzipped and downloaded from
# https://human-settlement.emergency.copernicus.eu/download.php
ghsl <- rast("data/GHS_BUILT_S_E2020_GLOBE_R2023A_4326_30ss_V1_0.tif")
ghsl <- ghsl * 1
names(ghsl) <- "pop_density_2020"
ghsl
# class : SpatRaster
# dimensions : 21384, 43201, 1 (nrow, ncol, nlyr)
# resolution : 0.008333333, 0.008333333 (x, y)
# extent : -180.0012, 180.0071, -89.10042, 89.09958 (xmin, xmax, ymin, ymax)
# coord. ref. : lon/lat WGS 84 (EPSG:4326)
# source : spat_391c2813558f_14620.tif
# varname : GHS_BUILT_S_E2020_GLOBE_R2023A_4326_30ss_V1_0
# name : pop_density_2020
# min value : 0
# max value : 848108
Now create an sf object from your df, extract cell values from ghsl, and join the result to sf_points:
# Create sf from df
sf_points <- st_as_sf(df, coords = c("lon", "lat"), crs = 4326)
# Return population density per point
pop_den <- extract(ghsl, sf_points)
# Join population density to sf_points
sf_points <- left_join(sf_points, pop_den, by = "ID")
# Examine result
filter(sf_points, pop_density_2020 > 0)
# Simple feature collection with 2542 features and 2 fields
# Geometry type: POINT
# Dimension: XY
# Bounding box: xmin: -155.8613 ymin: -55.07594 xmax: 177.6207 ymax: 72.03816
# Geodetic CRS: WGS 84
# First 10 features:
# ID pop_density_2020 geometry
# 1 22 1365 POINT (-91.87298 35.95215)
# 2 45 21 POINT (21.92427 41.46684)
# 3 53 19 POINT (-80.00786 37.48202)
# 4 62 165 POINT (76.22779 19.11223)
# 5 65 88 POINT (101.8265 24.57473)
# 6 68 1712 POINT (-114.7794 42.88764)
# 7 72 366 POINT (-87.25635 33.31995)
# 8 73 76958 POINT (127.4405 36.84436)
# 9 81 720 POINT (-79.93223 -1.357552)
# 10 107 30 POINT (15.16695 -13.33818)