I have two Shape files with different names for the different regions of the world. I need the names of the polygons in countries
which correspond to the names of the polygons in countries_GIFT
. I have done this with other Shape files and it has been working without problems, but for some reason I am getting mismatches between both data frames.
For example an entry on Russia in the first shapefile is being identified as Estonia in the other shape file. When checking the x y coordinates, I can see that the point is indeed in Russia, but for some reason it is being extracted at something else.
I already checked the projection of both shapefiles and whether the regions overlap each other, so I am not sure what the problem is.
Here is my code
library(sf)
library(tidyverse)
library(GIFT)
setwd(dirname(rstudioapi::getActiveDocumentContext()$path))
sf_use_s2(FALSE)
countries <- st_read("../../../../locations/Shapefiles/Locations.shp")
countries <- st_transform(countries, crs = 4326)
GIFT_shapes <- GIFT_shapes() # retrieves all shapefiles by default
countries_GIFT <- GIFT_shapes()
countries_GIFT_crs <- st_transform(countries_GIFT, crs = 4326)
gift_stand_countries <- st_join(countries_GIFT_crs, countries, join = st_intersects)
# Check if regions overlap
ggplot() +
geom_sf(data = countries_GIFT_crs, fill = "lightblue", color = "black") +
geom_sf(data = countries, fill = NA, color = "red")
And here is a plot of both shape files:
And the geometry of both df
head(countries$geometry)
Geometry set for 6 features
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -178.8733 ymin: -14.54889 xmax: 74.89291 ymax: 71.33334
Geodetic CRS: WGS 84
First 5 geometries:
MULTIPOLYGON (((71.00423 38.47541, 71.02091 38....
MULTIPOLYGON (((20.97472 59.96736, 20.98694 59....
MULTIPOLYGON (((-178.8139 51.8375, -178.7424 51...
MULTIPOLYGON (((19.59271 41.63155, 19.59014 41....
MULTIPOLYGON (((-1.630694 35.21958, -1.630972 3...
head(countries_GIFT_crs$geometry)
Geometry set for 6 features
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -180 ymin: -90 xmax: 180 ymax: 83.6
Geodetic CRS: WGS 84
First 5 geometries:
MULTIPOLYGON (((-81.81 24.54, -81.77 24.54, -81...
MULTIPOLYGON (((145.99 43.37, 146.01 43.36, 146...
MULTIPOLYGON (((140.87 51.42, 140.88 51.43, 140...
MULTIPOLYGON (((166.4939 -78.21118, 166.6582 -7...
MULTIPOLYGON (((145.99 43.37, 146.01 43.36, 146...
Many thanks in advance
It's difficult to point a finger at something concrete without having access to both sources, but you are most likely dealing with datasets with different resolutions and precisions, latter is also visible from your object's geometries, 71.00423 38.47541
vs -81.81 24.54
. Even with same/similar resolution and precision it would be highly unlikely to get a perfect match for shapes from different data sources, so intersection with neighbours is expected (+1 @VinceGreg's comment).
Just to illustrate, we can take GIFT_shapes()
for Estonia and check which mid-res shapes from GISCO are intersecting, along with a higher-res country outline for reference:
library(sf)
#> Linking to GEOS 3.12.1, GDAL 3.8.4, PROJ 9.3.1; sf_use_s2() is TRUE
library(GIFT)
library(ggplot2)
# 2016; resolution: 1:20million
countries <- giscoR::gisco_countries[, "CNTR_ID"]
# Estonia
GIFT_est <- GIFT_shapes(entity_ID = 11456)
#> You are asking for the latest stable version of GIFT which is 3.2.
#> ================================================================================
intersecting_countries <-
st_join(
countries,
GIFT_est,
left = FALSE
) |>
# crop with extended bbox for plot
st_crop(st_bbox(GIFT_est) + c(-.2, -.2, .2, .2))
#> Warning: attribute variables are assumed to be spatially constant throughout
#> all geometries
intersecting_countries[, 1:2]
#> Simple feature collection with 3 features and 2 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: 21.56417 ymin: 57.31555 xmax: 28.40378 ymax: 60.05231
#> Geodetic CRS: WGS 84
#> CNTR_ID entity_ID geometry
#> 82 EE 11456 MULTIPOLYGON (((27.83301 59...
#> 138 LV 11456 MULTIPOLYGON (((27.75134 57...
#> 185 RU 11456 MULTIPOLYGON (((28.40378 59...
ggplot() +
geom_sf(data = intersecting_countries, aes(fill = CNTR_ID)) +
geom_sf(data = GIFT_est, fill = "white", alpha = .6 ) +
# 1:1million outline for reference
geom_sf(data = giscoR::gisco_get_countries(resolution = "01", country = "Estonia"), color = "navy", fill = NA) +
scale_fill_viridis_d() +
coord_sf(expand = FALSE)
Or you can just check https://gift.uni-goettingen.de/map .
You might get desired result by first generating st_point_on_surface()
points from polygons of one dataset (centroid is not guaranteed to end up in the polygon) and joining those to polygons from another:
GIFT_est |>
st_point_on_surface() |>
st_join(giscoR::gisco_countries)
#> Warning: st_point_on_surface assumes attributes are constant over geometries
#> Warning in st_point_on_surface.sfc(st_geometry(x)): st_point_on_surface may not
#> give correct results for longitude/latitude data
#> Simple feature collection with 1 feature and 17 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 25.50592 ymin: 58.62939 xmax: 25.50592 ymax: 58.62939
#> Geodetic CRS: WGS 84
#> entity_ID geo_entity point_x point_y area x_min x_max y_min
#> 1 11456 Estonia 25.54518 58.67211 45483.44 21.76361 28.21478 57.51511
#> y_max entity_class entity_type polygon_source CNTR_ID CNTR_NAME
#> 1 59.82264 Mainland Political Unit GADM EE Eesti
#> ISO3_CODE NAME_ENGL FID geometry
#> 1 EST Estonia EE POINT (25.50592 58.62939)
With projected CRS I would expect it to work quite well, with geographic coordinates that warning above should probably not be taken too lightly.