rgeospatialr-sf

How to fix invalid geometries in sf objects that st_make_valid() cannot repair?


I would like to find a way to fix broken geometries in objects returned by rnaturalearth.

This reprex demonstrates invalid geometries in Natural Earth datasets and explores potential solutions for issue #78: https://github.com/ropensci/rnaturalearth/issues/78

library(sf)
#> Linking to GEOS 3.12.1, GDAL 3.8.4, PROJ 9.4.0; sf_use_s2() is TRUE
library(rnaturalearth)
library(geos)
library(lwgeom)
#> Linking to liblwgeom 3.0.0beta1 r16016, GEOS 3.12.1, PROJ 9.4.0
#> 
#> Attaching package: 'lwgeom'
#> The following objects are masked from 'package:sf':
#> 
#>     st_minimum_bounding_circle, st_perimeter

# Load world data and identify invalid geometries
world_small <- ne_countries(scale = "small", returnclass = "sf")
world_small_non_valid <- world_small[!st_is_valid(world_small), ]

# Display countries with invalid geometries
world_small_non_valid$sovereignt
#> [1] "Sudan"  "Russia"

# Show validation reasons
st_is_valid(world_small_non_valid, reason = TRUE)
#> [1] "Edge 0 crosses edge 78"              
#> [2] "Loop 0 edge 1 crosses loop 12 edge 5"

# Extract specific problematic cases
sudan <- world_small_non_valid[1L, ]
russia <- world_small_non_valid[2L, ]

Test st_make_valid() effectiveness. Russia geometry can be fixed, Sudan cannot

st_is_valid(st_make_valid(russia))
#> [1] TRUE
st_is_valid(st_make_valid(sudan))
#> [1] FALSE

Investigate Sudan’s problematic crossing edges

sudan_coords <- st_coordinates(sudan)
edge_0 <- sudan_coords[1L:2L, c("X", "Y")]
edge_78 <- sudan_coords[79L:80L, c("X", "Y")]

plot(st_geometry(sudan), col = "lightblue", border = "gray", lwd = 1L)

# Highlight crossing edges
lines(edge_0[, "X"], edge_0[, "Y"], col = "red", lwd = 3L)
text(
  mean(edge_0[, "X"]),
  mean(edge_0[, "Y"]),
  "Edge 0",
  col = "red",
  pos = 3L
)

lines(edge_78[, "X"], edge_78[, "Y"], col = "orange", lwd = 3L)
text(
  mean(edge_78[, "X"]),
  mean(edge_78[, "Y"]),
  "Edge 78",
  col = "orange",
  pos = 3L
)

Image of Sudan geography with broken geometry


# Create zoomed plot of problematic area
all_edge_coords <- rbind(edge_0, edge_78)
x_range <- range(all_edge_coords[, "X"])
y_range <- range(all_edge_coords[, "Y"])

# Add buffer around the crossing area
buffer <- 0.1
x_buffer <- diff(x_range) * buffer
y_buffer <- diff(y_range) * buffer

plot(
  st_geometry(sudan),
  col = "lightblue",
  border = "gray",
  lwd = 1L,
  xlim = c(x_range[1] - x_buffer, x_range[2] + x_buffer),
  ylim = c(y_range[1] - y_buffer, y_range[2] + y_buffer),
  main = "Zoomed view of crossing edges"
)

# Highlight crossing edges in zoomed view
lines(edge_0[, "X"], edge_0[, "Y"], col = "red", lwd = 4L)
text(
  mean(edge_0[, "X"]),
  mean(edge_0[, "Y"]),
  "Edge 0",
  col = "red",
  pos = 3L,
  cex = 1.2
)

lines(edge_78[, "X"], edge_78[, "Y"], col = "orange", lwd = 4L)
text(
  mean(edge_78[, "X"]),
  mean(edge_78[, "Y"]),
  "Edge 78",
  col = "orange",
  pos = 1L,
  cex = 1.2
)

Image of Sudan geography with zoom on broken geometry


# Confirm Sudan remains invalid after st_make_valid()
st_is_valid(st_make_valid(sudan), reason = TRUE)
#> [1] "Edge 0 crosses edge 78"

# Test alternative: geos_make_valid() - also fails to fix Sudan
geos::geos_make_valid(sudan) |>
  st_as_sf() |>
  st_is_valid(reason = TRUE)
#> [1] "Edge 0 crosses edge 78"

# Test alternative: lwgeom_make_valid - also fails to fix Sudan
st_is_valid(lwgeom_make_valid(st_as_sfc(sudan)), reason = TRUE)
#> [1] "Edge 0 crosses edge 78"

If I turn s2 off, it says the geometry is valid.

sf_use_s2(FALSE)
#> Spherical geometry (s2) switched off
st_is_valid(sudan, reason = TRUE)
#> [1] "Valid Geometry"

sf_use_s2(TRUE)
#> Spherical geometry (s2) switched on
st_is_valid(sudan, reason = TRUE)
#> [1] "Edge 0 crosses edge 78"

How could this be addressed? If possible, I would like to introduce a new argument in the ne_download() function to let users choose whether they want to try fixing broken geometries.

Created on 2025-09-23 with reprex v2.1.1


Solution

  • Making a single polygon with self-intersection topologically valid can work, but is topology of the whole dataset still valid and / or suitable for you and your users, is a different question. And I'm afraid it just... "depends".

    That exact Natural Earth Sudan example was discussed in the s2 repo - https://github.com/r-spatial/s2/issues/99 . Basically transform sf shape to s2 with checks disabled, run it through s2::s2_union() and expect it will pass s2 validity tests.

    So we can try applying s2_union() on all shapes that failed with self-intersection issues:

    library(rnaturalearth)
    library(mapview)
    library(s2)
    library(sf)
    #> Linking to GEOS 3.13.1, GDAL 3.11.0, PROJ 9.6.0; sf_use_s2() is TRUE
    
    countries <- c("Central African Republic", "Sudan", "South Sudan")
    
    nbrs_sf <-  # original for comparison
    nbrs_sf2 <- 
      ne_countries(scale = "small", country = c(countries), returnclass = "sf") |> 
      subset(select = c(name, geometry))
    
    # (s2) validity check:
    transform(nbrs_sf, valid = st_is_valid(geometry, reason = TRUE))
    #> Simple feature collection with 3 features and 2 fields
    #> Geometry type: MULTIPOLYGON
    #> Dimension:     XY
    #> Bounding box:  xmin: 14.45941 ymin: 2.26764 xmax: 38.41009 ymax: 22
    #> Geodetic CRS:  WGS 84
    #>                     name                  valid                       geometry
    #> 15                 Sudan Edge 0 crosses edge 78 MULTIPOLYGON (((24.56737 8....
    #> 67  Central African Rep.         Valid Geometry MULTIPOLYGON (((27.37423 5....
    #> 177             S. Sudan         Valid Geometry MULTIPOLYGON (((30.83385 3....
    
    # find edge-crossing issues
    cross_edge <- 
      st_is_valid(nbrs_sf, reason = TRUE) |>
      grepl(x = _, pattern = "edge .* crosses edge", ignore.case = TRUE)
    cross_edge
    #> [1]  TRUE FALSE FALSE
    
    # apply s2_union() on problematic shapes
    st_geometry(nbrs_sf2)[cross_edge] <-
      st_geometry(nbrs_sf2)[cross_edge] |>  
      as_s2_geography(check = FALSE) |>
      s2_union() |>
      st_as_sfc() 
    
    

    Re-check s2 validity:

    transform(nbrs_sf2, valid = st_is_valid(geometry, reason = TRUE))
    #> Simple feature collection with 3 features and 2 fields
    #> Geometry type: MULTIPOLYGON
    #> Dimension:     XY
    #> Bounding box:  xmin: 14.45941 ymin: 2.26764 xmax: 38.41009 ymax: 22
    #> Geodetic CRS:  WGS 84
    #>                     name          valid                       geometry
    #> 15                 Sudan Valid Geometry MULTIPOLYGON (((24.19407 8....
    #> 67  Central African Rep. Valid Geometry MULTIPOLYGON (((27.37423 5....
    #> 177             S. Sudan Valid Geometry MULTIPOLYGON (((30.83385 3....
    
    # visual check
    mapview(nbrs_sf2)
    

    mapview screenshot, zoomed out

    Looks good? Well ... depends. It's probably good enough for many applications & visualizations.

    But as soon as we need to transform to projected CRS and happen to zoom in, we are faced with gap and overlap issues.

    Here's that western artefact, Edge 0, zoomed in, before (1st column) and after with vertices; native.crs to disable reprojection to EPSG:3857, geographic coordinates are plotted on a Cartesian plane as-is. 2nd row shows Sudan without neighbours, artefact was formed by 2 overlapping edges of different lengths, with s2 conversion it formed a self-intersection and with st_union() it become a triangle with area -- a 2nd Polygon of Sudan Multipolygon.

    leafsync::sync(
      mapview(list(nbrs_sf, st_cast(nbrs_sf, "POINT")), layer.name = "rne", legend = FALSE, native.crs = TRUE),
      mapview(list(nbrs_sf2, st_cast(nbrs_sf2, "POINT")), layer.name = "s2_union", legend = FALSE, native.crs = TRUE),
      mapview(list(nbrs_sf[1,], st_cast(nbrs_sf[1,], "POINT")), layer.name = "Sudan_rne", legend = FALSE, native.crs = TRUE),
      mapview(list(nbrs_sf2[1,], st_cast(nbrs_sf2[1,], "POINT")), layer.name = "Sudan_s2_union", legend = FALSE, native.crs = TRUE)
    )
    

    mapview, before vs after, zoomed in

    I believe this should rather be fixed upstream, currently there are no vertices at trijunctions (Sudan - S. Sudan - Central African Rep. and Sudan - S. Sudan - Ethiopia) that are properly shared by all touching polygons, this was probably introduced when Sudan shape was split. Without those anchor points at junctions, most transformations (incl. projection and conversion to s2) will mess up topology.


    Not quite sure why sf:::st_make_valid.sfc() is still (only) using s2::s2_rebuild() for s2 geometries, perhaps there were too many cases where it introduced more issues or it was just considered better to leave this kind of manipulation to users.