I need to calculate the geographic centroid of different polygons grouped by category. Let me give an example. Let's say we have this sf dataset
library("sf")
nc <- st_read(system.file("/shape/nc.shp", package="sf"))
Let's say we group the different polygons into categories.
df <- data.frame(id=rep(1:10,rep=10))
nc <- cbind(nc,df)
If we plot it, we can see that different polygons belong to different categories. Most importantly, these polygons are not always contiguous.
plot(nc["id"])
I would like to calculate the geographic centroid for each category. One of the options I consider was to dissolve the different polygons first and then compute the centroid. However, since polygons are non-contiguous, this does not work.
p <- nc %>% group_by(id) %>%
summarise(geometry = st_union(geometry)) %>%
ungroup()
Also, if I try the code below, it gives me the centroid of each polygon and not by group
p <- nc %>% group_by(id) %>%
summarise(geometry = st_centroid(geometry),.groups = "keep")
Any idea on how to obtain the centroid by group?
It looks like it works if you group_by, st_union in the summarise, and then find the centroid.
library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1; sf_use_s2() is TRUE
library(tidyverse)
nc <- st_read(system.file("/shape/nc.shp", package="sf"))
#> Reading layer `nc' from data source
#> `/home/x/R/x86_64-pc-linux-gnu-library/4.0/sf/shape/nc.shp'
#> using driver `ESRI Shapefile'
#> Simple feature collection with 100 features and 14 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> Geodetic CRS: NAD27
df <- data.frame(id=rep(1:10,rep=10))
nc <- cbind(nc,df)
# new sf data.frame, get centroids by group
nc_grouped <- nc %>%
group_by(id) %>%
summarise(st_union(geometry)) %>%
st_centroid()
#> Warning in st_centroid.sf(.): st_centroid assumes attributes are constant over
#> geometries of x
# plot everything
ggplot() +
geom_sf(data = nc, aes(fill = id, alpha = .2)) +
geom_sf(data = nc_grouped, aes(color = id), size = 3) +
scale_fill_viridis_c() +
scale_color_viridis_c()
Hard to see, but they're all there.
# plotting only id ==1, looks about right.
ggplot() +
geom_sf(data = nc[nc$id == 1,]) +
geom_sf(data = nc_grouped[nc_grouped$id == 1,])
Checking only the 1st group, looks about right.
nc %>% group_by(id) %>%
summarise(geometry = st_centroid(geometry),.groups = "keep")
#> Simple feature collection with 100 features and 1 field
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: -84.05986 ymin: 34.07671 xmax: -75.8095 ymax: 36.49111
#> Geodetic CRS: NAD27
#> # A tibble: 100 × 2
#> # Groups: id [10]
#> id geometry
#> <int> <POINT [°]>
#> 1 1 (-81.49823 36.4314)
#> 2 1 (-79.33478 36.39346)
#> 3 1 (-76.61642 36.14886)
#> 4 1 (-77.98691 35.96228)
#> 5 1 (-81.17403 35.91575)
#> 6 1 (-77.37784 35.58906)
#> 7 1 (-81.91783 35.39871)
#> 8 1 (-80.24928 35.31388)
#> 9 1 (-84.05986 35.13111)
#> 10 1 (-77.10388 35.13065)
#> # … with 90 more rows
Created on 2022-05-17 by the reprex package (v2.0.1)