rgeo

Trouble using st_union (or st_combine?) to combine polygons with same ID, in sf


I have a shapefile that I'm working with in R, using sf, and it has multiple (tangential) polygons with the same ID. I want to COMBINE those shared-ID polygons into 1 polygon, so that each polygon within the shapefile has a unique ID.

I found somebody else posting about this who seemed to find that st_union did the trick, but I'm super confused as even in his own data example, it is not working. I have posted the code below (using that original posting's data) to reproduce the problem. Could somebody tell me how to alter this code to actually achieve 1 polygon per ID? I have tried st_union, st_combine, and within st_union I have tried by_feature=FALSE and by_feature=TRUE. And I did read the help for st_union/st_combine multiple times, but tbh I just can't really follow what's being said there.

#######################################################
# code adapted from: https://stackoverflow.com/questions/49354393/r-how-do-i-merge-polygon-features-in-a-shapefile-with-many-polygons-reproducib 
#######################################################
library(sf)
library(curl)
library(dplyr)

curl_download("http://biogeo.ucdavis.edu/data/gadm2.8/shp/ETH_adm_shp.zip",
              destfile=paste0("gadmETH.zip"),
              quiet=FALSE)
unzip(paste0("gadmETH.zip"), exdir="gadmETH", overwrite=FALSE) 

sptemp <- sf::st_read(dsn = "/Users/bevis.16/Library/CloudStorage/OneDrive-TheOhioStateUniversity/BoxLeah/Projects/GR Inequality/Data/Geospatial/gadmETH", layer = "ETH_adm2")

# 2 (tangential) polygons have ID_2 == 1. this is the problem. 
# i'd like to combine those into 1 polygon, so these 2 counts below are the same.
dim(sptemp)[1]
length(unique(sptemp$ID_2))

# code that is supposed to combine the polygons
copy <- sptemp
copy %>% 
  group_by(ID_2) %>%
  summarise(geometry = sf::st_union(geometry, by_feature=TRUE)) %>%
  ungroup()

# polygons are NOT combined
dim(copy)[1]   # i want this to be 79 features but it is still 80 
length(unique(copy$ID_2)) # despite the fact that there are only 79 unique IDs.

Solution

  • Your code works as intended. The only problem is what appears to be a typo: the result of group_by() |> summarise() is never assigned back to the copy object. Because copy is still identical to sptemp, dim(copy)[1] returns 80 rows. Assign the summarised result to an object and you should get your desired output. In the version below I took the liberty of renaming a few things:

    library(dplyr)
    library(sf)
    library(curl)
    
    curl_download(
      "http://biogeo.ucdavis.edu/data/gadm2.8/shp/ETH_adm_shp.zip",
      destfile = paste0("gadmETH.zip"),
      quiet = FALSE
    )
    
    unzip(
      paste0("gadmETH.zip"),
      exdir = "gadmETH",
      overwrite = FALSE
    ) 
    
    example_sf <- st_read("gadmETH/ETH_adm2.shp")
    
    copy_sf <- example_sf |>
      group_by(ID_2) |>
      summarise() |>
      ungroup()
    
    dim(copy_sf)[1]
    #> [1] 79
    length(unique(copy_sf$ID_2))
    #> [1] 79
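
To check the behaviour without downloading the GADM data, here is a minimal sketch on made-up toy data. The sq() helper, toy_sf, and the id column are hypothetical and not part of the question. It illustrates three things: the result of the pipeline has to be assigned, summarise() on a grouped sf object unions the geometries in each group by default (do_union = TRUE), and st_union(), st_union(by_feature = TRUE), and st_combine() behave differently on two squares that share an edge.

```r
library(sf)
library(dplyr)

# Hypothetical helper: a unit square with lower-left corner (x0, y0).
sq <- function(x0, y0) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + 1, y0), c(x0 + 1, y0 + 1), c(x0, y0 + 1), c(x0, y0)
  )))
}

# Toy data: two squares sharing an edge (id 1) and one separate square (id 2).
toy_sf <- st_sf(
  id = c(1, 1, 2),
  geometry = st_sfc(sq(0, 0), sq(1, 0), sq(5, 5))
)

# The pipeline returns a new object; toy_sf itself is never modified,
# so the result must be assigned to be kept.
merged_sf <- toy_sf |>
  group_by(id) |>
  summarise()  # on an sf object, geometries are unioned per group by default

nrow(toy_sf)     # 3: the original is unchanged
nrow(merged_sf)  # 2: one feature per id

# The same two touching squares, handled three ways:
pair <- st_geometry(toy_sf)[1:2]
length(st_union(pair))                     # 1: dissolved into a single geometry
length(st_union(pair, by_feature = TRUE))  # 2: each feature is unioned on its own
length(st_combine(pair))                   # 1: collected together, shared edge not dissolved

st_geometry_type(st_union(pair))    # POLYGON
st_geometry_type(st_combine(pair))  # MULTIPOLYGON
```

On the Ethiopia data the equivalent call is the group_by(ID_2) |> summarise() pipeline shown above. Passing do_union = FALSE to summarise() should switch the per-group operation from st_union() to st_combine(), if you want the pieces grouped without dissolving their shared boundaries.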