rgeospatialr-sfclosest-points

How to combine spatial join with grouping data?


I would like to join two spatial dataframes based on the smallest distance between locations. However, I would like this selection to consider Year as well, i.e. each location in dat should be joined with the closest point in the respective year in dat2.

Is there a way to add something like by = "Year" to the spatial join?

library(sf)

dat <- data.frame(Year = c(1980:1989),
                  Lat = c(50.20920, 50.20920, 48.91840, 49.28491, 52.28401, 50.20920, 50.20920, 48.91840, 49.28491, 52.28401),
                  Lon = c(9.49204, 8.75930, 9.84920, 7.85730, 7.49284, 9.49204, 8.75930, 9.84920, 7.85730, 7.49284))

dat2 <- data.frame(Var = c(rep("A", 3), rep("B", 5), rep("C", 6)),
                   Year = c(1980:1982, 1981:1985, 1984:1989),
                   Lat = c(49.829402, 48.02849, 50.94892, 41.38593, 50.28402, 48.93840, 50.73850, 52.48492, 41.38593, 50.28402, 48.93840, 50.20920, 50.20920, 48.02849),
                   Lon = c(9.84920, 7.85730, 7.49284, 6.29482, 9.87294, 8.67204, 8.94837, 7.49284, 6.29482, 9.87294, 7.92840, 8.02948, 9.89039, 8.63029))

dat.sf <- st_as_sf(dat, coords = c("Lon", "Lat")) |>
  st_set_crs(4326) 

dat2.sf <- st_as_sf(dat2, coords = c("Lon", "Lat")) |>
  st_set_crs(4326) 

st_join(dat.sf, dat2.sf, join = st_nearest_feature)

Simple feature collection with 10 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 7.49284 ymin: 48.9184 xmax: 9.8492 ymax: 52.28401
Geodetic CRS:  WGS 84
   Year.x   Var Year.y                 geometry
1    1980     B   1982  POINT (9.49204 50.2092)
2    1981     C   1987   POINT (8.7593 50.2092)
3    1982     B   1983   POINT (9.8492 48.9184)
4    1983     C   1986  POINT (7.8573 49.28491)
5    1984     B   1985 POINT (7.49284 52.28401)
6    1985     B   1982  POINT (9.49204 50.2092)
7    1986     C   1987   POINT (8.7593 50.2092)
8    1987     B   1983   POINT (9.8492 48.9184)
9    1988     C   1986  POINT (7.8573 49.28491)
10   1989     B   1985 POINT (7.49284 52.28401)

Solution

  • You could perform the st_join on the subset of the datasets for each year, and then bind the resulting data together;

    do.call("rbind", 
            lapply(unique(dat.sf$Year), \(y) {
              st_join(subset(dat.sf, Year == y),
                      subset(dat2.sf, Year == y),
                      join = st_nearest_feature)
            }))
    
    #> Simple feature collection with 10 features and 3 fields
    #> Geometry type: POINT
    #> Dimension:     XY
    #> Bounding box:  xmin: 7.49284 ymin: 48.9184 xmax: 9.8492 ymax: 52.28401
    #> Geodetic CRS:  WGS 84
    #>    Year.x Group Year.y                 geometry
    #> 1    1980     A   1980  POINT (9.49204 50.2092)
    #> 2    1981     A   1981   POINT (8.7593 50.2092)
    #> 3    1982     B   1982   POINT (9.8492 48.9184)
    #> 4    1983     B   1983  POINT (7.8573 49.28491)
    #> 5    1984     B   1984 POINT (7.49284 52.28401)
    #> 6    1985     C   1985  POINT (9.49204 50.2092)
    #> 7    1986     C   1986   POINT (8.7593 50.2092)
    #> 8    1987     C   1987   POINT (9.8492 48.9184)
    #> 9    1988     C   1988  POINT (7.8573 49.28491)
    #> 10   1989     C   1989 POINT (7.49284 52.28401)