rgisspatialr-sfrmaps

How to calculate geographic centroids by group?


I need to calculate the geographic centroid of different polygons grouped by category. Let me give an example. Let's say we have this sf dataset

library("sf")
nc <- st_read(system.file("/shape/nc.shp", package="sf"))

Let's say we group the different polygons into categories.

df <- data.frame(id=rep(1:10,rep=10))
nc <- cbind(nc,df)

If we plot it, we can see that different polygons belong to different categories. Most importantly, these polygons are not always contiguous.

plot(nc["id"])

I would like to calculate the geographic centroid for each category. One of the options I consider was to dissolve the different polygons first and then compute the centroid. However, since polygons are non-contiguous, this does not work.

p <- nc %>% group_by(id) %>%
  summarise(geometry = st_union(geometry)) %>%
  ungroup()

Also, if I try the code below, it gives me the centroid of each polygon and not by group

p <- nc %>% group_by(id) %>%
    summarise(geometry = st_centroid(geometry),.groups = "keep")

Any idea on how to obtain the centroid by group?


Solution

  • It looks like it works if you group_by, st_union in the summarise, and then find the centroid.

    library(sf)
    #> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1; sf_use_s2() is TRUE
    library(tidyverse)
    
    nc <- st_read(system.file("/shape/nc.shp", package="sf"))
    #> Reading layer `nc' from data source 
    #>   `/home/x/R/x86_64-pc-linux-gnu-library/4.0/sf/shape/nc.shp' 
    #>   using driver `ESRI Shapefile'
    #> Simple feature collection with 100 features and 14 fields
    #> Geometry type: MULTIPOLYGON
    #> Dimension:     XY
    #> Bounding box:  xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
    #> Geodetic CRS:  NAD27
    
    df <- data.frame(id=rep(1:10,rep=10))
    nc <- cbind(nc,df)
    
    
    # new sf data.frame, get centroids by group
    nc_grouped <- nc %>%
                    group_by(id) %>%
                    summarise(st_union(geometry)) %>%
                    st_centroid() 
    #> Warning in st_centroid.sf(.): st_centroid assumes attributes are constant over
    #> geometries of x
    
    # plot everything
    ggplot() + 
      geom_sf(data = nc, aes(fill = id, alpha = .2)) +
      geom_sf(data = nc_grouped, aes(color = id), size = 3) + 
      scale_fill_viridis_c() +
      scale_color_viridis_c()
    

    Hard to see, but they're all there.

    # plotting only id ==1, looks about right.
    ggplot() + 
      geom_sf(data = nc[nc$id == 1,]) + 
      geom_sf(data = nc_grouped[nc_grouped$id == 1,])
    

    Checking only the 1st group, looks about right.

    
    
    nc %>% group_by(id) %>%
      summarise(geometry = st_centroid(geometry),.groups = "keep")
    #> Simple feature collection with 100 features and 1 field
    #> Geometry type: POINT
    #> Dimension:     XY
    #> Bounding box:  xmin: -84.05986 ymin: 34.07671 xmax: -75.8095 ymax: 36.49111
    #> Geodetic CRS:  NAD27
    #> # A tibble: 100 × 2
    #> # Groups:   id [10]
    #>       id             geometry
    #>    <int>          <POINT [°]>
    #>  1     1  (-81.49823 36.4314)
    #>  2     1 (-79.33478 36.39346)
    #>  3     1 (-76.61642 36.14886)
    #>  4     1 (-77.98691 35.96228)
    #>  5     1 (-81.17403 35.91575)
    #>  6     1 (-77.37784 35.58906)
    #>  7     1 (-81.91783 35.39871)
    #>  8     1 (-80.24928 35.31388)
    #>  9     1 (-84.05986 35.13111)
    #> 10     1 (-77.10388 35.13065)
    #> # … with 90 more rows
    

    Created on 2022-05-17 by the reprex package (v2.0.1)