I would like to find a way to fix broken geometries in objects returned by
rnaturalearth.
This reprex demonstrates invalid geometries in Natural Earth datasets and explores potential solutions for issue #78: https://github.com/ropensci/rnaturalearth/issues/78
library(sf)
#> Linking to GEOS 3.12.1, GDAL 3.8.4, PROJ 9.4.0; sf_use_s2() is TRUE
library(rnaturalearth)
library(geos)
library(lwgeom)
#> Linking to liblwgeom 3.0.0beta1 r16016, GEOS 3.12.1, PROJ 9.4.0
#>
#> Attaching package: 'lwgeom'
#> The following objects are masked from 'package:sf':
#>
#> st_minimum_bounding_circle, st_perimeter
# Load world data and identify invalid geometries
world_small <- ne_countries(scale = "small", returnclass = "sf")
world_small_non_valid <- world_small[!st_is_valid(world_small), ]
# Display countries with invalid geometries
world_small_non_valid$sovereignt
#> [1] "Sudan" "Russia"
# Show validation reasons
st_is_valid(world_small_non_valid, reason = TRUE)
#> [1] "Edge 0 crosses edge 78"
#> [2] "Loop 0 edge 1 crosses loop 12 edge 5"
# Extract specific problematic cases
sudan <- world_small_non_valid[1L, ]
russia <- world_small_non_valid[2L, ]
Test st_make_valid() effectiveness. Russia geometry can be fixed, Sudan
cannot
st_is_valid(st_make_valid(russia))
#> [1] TRUE
st_is_valid(st_make_valid(sudan))
#> [1] FALSE
Investigate Sudan’s problematic crossing edges
sudan_coords <- st_coordinates(sudan)
edge_0 <- sudan_coords[1L:2L, c("X", "Y")]
edge_78 <- sudan_coords[79L:80L, c("X", "Y")]
plot(st_geometry(sudan), col = "lightblue", border = "gray", lwd = 1L)
# Highlight crossing edges
lines(edge_0[, "X"], edge_0[, "Y"], col = "red", lwd = 3L)
text(
mean(edge_0[, "X"]),
mean(edge_0[, "Y"]),
"Edge 0",
col = "red",
pos = 3L
)
lines(edge_78[, "X"], edge_78[, "Y"], col = "orange", lwd = 3L)
text(
mean(edge_78[, "X"]),
mean(edge_78[, "Y"]),
"Edge 78",
col = "orange",
pos = 3L
)

# Create zoomed plot of problematic area
all_edge_coords <- rbind(edge_0, edge_78)
x_range <- range(all_edge_coords[, "X"])
y_range <- range(all_edge_coords[, "Y"])
# Add buffer around the crossing area
buffer <- 0.1
x_buffer <- diff(x_range) * buffer
y_buffer <- diff(y_range) * buffer
plot(
st_geometry(sudan),
col = "lightblue",
border = "gray",
lwd = 1L,
xlim = c(x_range[1] - x_buffer, x_range[2] + x_buffer),
ylim = c(y_range[1] - y_buffer, y_range[2] + y_buffer),
main = "Zoomed view of crossing edges"
)
# Highlight crossing edges in zoomed view
lines(edge_0[, "X"], edge_0[, "Y"], col = "red", lwd = 4L)
text(
mean(edge_0[, "X"]),
mean(edge_0[, "Y"]),
"Edge 0",
col = "red",
pos = 3L,
cex = 1.2
)
lines(edge_78[, "X"], edge_78[, "Y"], col = "orange", lwd = 4L)
text(
mean(edge_78[, "X"]),
mean(edge_78[, "Y"]),
"Edge 78",
col = "orange",
pos = 1L,
cex = 1.2
)

# Confirm Sudan remains invalid after st_make_valid()
st_is_valid(st_make_valid(sudan), reason = TRUE)
#> [1] "Edge 0 crosses edge 78"
# Test alternative: geos_make_valid() - also fails to fix Sudan
geos::geos_make_valid(sudan) |>
st_as_sf() |>
st_is_valid(reason = TRUE)
#> [1] "Edge 0 crosses edge 78"
# Test alternative: lwgeom_make_valid - also fails to fix Sudan
st_is_valid(lwgeom_make_valid(st_as_sfc(sudan)), reason = TRUE)
#> [1] "Edge 0 crosses edge 78"
If I turn s2 off, it says the geometry is valid.
sf_use_s2(FALSE)
#> Spherical geometry (s2) switched off
st_is_valid(sudan, reason = TRUE)
#> [1] "Valid Geometry"
sf_use_s2(TRUE)
#> Spherical geometry (s2) switched on
st_is_valid(sudan, reason = TRUE)
#> [1] "Edge 0 crosses edge 78"
How could this be addressed? If possible, I would like to introduce a new
argument in the ne_download() function to let users choose whether they
want to try fixing broken geometries.
Created on 2025-09-23 with reprex v2.1.1
Making a single polygon with self-intersection topologically valid can work, but is topology of the whole dataset still valid and / or suitable for you and your users, is a different question. And I'm afraid it just... "depends".
That exact Natural Earth Sudan example was discussed in the s2 repo - https://github.com/r-spatial/s2/issues/99 . Basically transform sf shape to s2 with checks disabled, run it through s2::s2_union() and expect it will pass s2 validity tests.
So we can try applying s2_union() on all shapes that failed with self-intersection issues:
library(rnaturalearth)
library(mapview)
library(s2)
library(sf)
#> Linking to GEOS 3.13.1, GDAL 3.11.0, PROJ 9.6.0; sf_use_s2() is TRUE
countries <- c("Central African Republic", "Sudan", "South Sudan")
nbrs_sf <- # original for comparison
nbrs_sf2 <-
ne_countries(scale = "small", country = c(countries), returnclass = "sf") |>
subset(select = c(name, geometry))
# (s2) validity check:
transform(nbrs_sf, valid = st_is_valid(geometry, reason = TRUE))
#> Simple feature collection with 3 features and 2 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: 14.45941 ymin: 2.26764 xmax: 38.41009 ymax: 22
#> Geodetic CRS: WGS 84
#> name valid geometry
#> 15 Sudan Edge 0 crosses edge 78 MULTIPOLYGON (((24.56737 8....
#> 67 Central African Rep. Valid Geometry MULTIPOLYGON (((27.37423 5....
#> 177 S. Sudan Valid Geometry MULTIPOLYGON (((30.83385 3....
# find edge-crossing issues
cross_edge <-
st_is_valid(nbrs_sf, reason = TRUE) |>
grepl(x = _, pattern = "edge .* crosses edge", ignore.case = TRUE)
cross_edge
#> [1] TRUE FALSE FALSE
# apply s2_union() on problematic shapes
st_geometry(nbrs_sf2)[cross_edge] <-
st_geometry(nbrs_sf2)[cross_edge] |>
as_s2_geography(check = FALSE) |>
s2_union() |>
st_as_sfc()
Re-check s2 validity:
transform(nbrs_sf2, valid = st_is_valid(geometry, reason = TRUE))
#> Simple feature collection with 3 features and 2 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: 14.45941 ymin: 2.26764 xmax: 38.41009 ymax: 22
#> Geodetic CRS: WGS 84
#> name valid geometry
#> 15 Sudan Valid Geometry MULTIPOLYGON (((24.19407 8....
#> 67 Central African Rep. Valid Geometry MULTIPOLYGON (((27.37423 5....
#> 177 S. Sudan Valid Geometry MULTIPOLYGON (((30.83385 3....
# visual check
mapview(nbrs_sf2)
Looks good? Well ... depends. It's probably good enough for many applications & visualizations.
But as soon as we need to transform to projected CRS and happen to zoom in, we are faced with gap and overlap issues.
Here's that western artefact, Edge 0, zoomed in, before (1st column) and after with vertices; native.crs to disable reprojection to EPSG:3857, geographic coordinates are plotted on a Cartesian plane as-is. 2nd row shows Sudan without neighbours, artefact was formed by 2 overlapping edges of different lengths, with s2 conversion it formed a self-intersection and with st_union() it become a triangle with area -- a 2nd Polygon of Sudan Multipolygon.
leafsync::sync(
mapview(list(nbrs_sf, st_cast(nbrs_sf, "POINT")), layer.name = "rne", legend = FALSE, native.crs = TRUE),
mapview(list(nbrs_sf2, st_cast(nbrs_sf2, "POINT")), layer.name = "s2_union", legend = FALSE, native.crs = TRUE),
mapview(list(nbrs_sf[1,], st_cast(nbrs_sf[1,], "POINT")), layer.name = "Sudan_rne", legend = FALSE, native.crs = TRUE),
mapview(list(nbrs_sf2[1,], st_cast(nbrs_sf2[1,], "POINT")), layer.name = "Sudan_s2_union", legend = FALSE, native.crs = TRUE)
)
I believe this should rather be fixed upstream, currently there are no vertices at trijunctions (Sudan - S. Sudan - Central African Rep. and Sudan - S. Sudan - Ethiopia) that are properly shared by all touching polygons, this was probably introduced when Sudan shape was split. Without those anchor points at junctions, most transformations (incl. projection and conversion to s2) will mess up topology.
Not quite sure why sf:::st_make_valid.sfc() is still (only) using s2::s2_rebuild() for s2 geometries, perhaps there were too many cases where it introduced more issues or it was just considered better to leave this kind of manipulation to users.