Tags: r, latitude-longitude, spatial, geographic-distance

Matching data frames based on shortest geographic distance


I have two data frames, both of which contain latitude and longitude coordinates. The first data frame holds observations of events, where the location and time were recorded. The second data frame holds geographic features, where the location and some information about each feature are recorded.

my_df_1 <- structure(list(START_LAT = c(-33.15, -35.6, -34.08333, -34.13333, 
-34.31667, -47.38333, -47.53333, -34.08333, -47.38333, -47.15
), START_LONG = c(163, 165.18333, 162.88333, 162.58333, 162.76667, 
148.98333, 148.66667, 162.9, 148.98333, 148.71667)), row.names = c(1175L, 
528L, 1328L, 870L, 672L, 707L, 506L, 981L, 756L, 210L), class = "data.frame", .Names = c("START_LAT", 
"START_LONG"))

my_df_2 <- structure(list(latitude = c(-42.7984, -34.195, -49.81, -35.417, 
-28.1487, -44.657, -42.7898, -36.245, -39.1335, -31.8482), longitude = c(179.9874, 
179.526, -176.68, 178.765, -168.0314, 174.695, -179.9873, 177.7873, 
-170.0583, 173.2424), depth_top = c(935L, 2204L, 869L, 1973L, 
4750L, 555L, 894L, 1500L, 4299L, 1303L)), row.names = c(580L, 
1306L, 926L, 1102L, 60L, 1481L, 574L, 454L, 1168L, 144L), class = "data.frame", .Names = c("latitude", 
"longitude", "depth_top"))

For every observation in my_df_1, I need to find out which feature in my_df_2 is geographically closest. Ideally, I'd get new columns appended to my_df_1 in which every row holds the closest feature from my_df_2.

I worked through the question "How to assign several names to lat-lon observations", but was unable to figure out how to adapt it to my data.

The real data frames have thousands of rows, which is why I can't do this by hand.


Solution

  • A solution using st_distance from the sf package. my_df_final is the final output.

    # Load packages
    library(tidyverse)
    library(sp)
    library(sf)
    
    # Create IDs for my_df_1 and my_df_2 based on their row names
    # This step is not required; it just makes each point easier to tell apart
    my_df_1 <- my_df_1 %>% mutate(ID1 = row.names(.))
    my_df_2 <- my_df_2 %>% mutate(ID2 = row.names(.))
    
    # Create spatial point data frame
    my_df_1_sp <- my_df_1
    coordinates(my_df_1_sp) <- ~START_LONG + START_LAT
    
    my_df_2_sp <- my_df_2
    coordinates(my_df_2_sp) <- ~longitude + latitude
    
    # Convert to simple feature
    my_df_1_sf <- st_as_sf(my_df_1_sp)
    my_df_2_sf <- st_as_sf(my_df_2_sp)
    
    # Set the coordinate reference system by EPSG code (4326 = WGS 84 lat/long)
    st_crs(my_df_1_sf) <- 4326
    st_crs(my_df_2_sf) <- 4326
    
    # Calculate the distance matrix (rows: my_df_1 points, columns: my_df_2 points)
    m_dist <- st_distance(my_df_1_sf, my_df_2_sf)
    
    # For each row, find the column index of the smallest distance
    near_index <- apply(m_dist, 1, which.min)
    
    # Use near_index to select the matching rows in my_df_2,
    # then combine them with my_df_1
    my_df_final <- cbind(my_df_1, my_df_2[near_index, ])
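
As a cross-check on the approach above, here is a minimal base R sketch of the same nearest-feature match without any spatial packages. It is not the sf method: it swaps in the haversine formula on a spherical Earth with an assumed radius of 6371 km, so distances can differ slightly from what st_distance reports. The function names haversine_km and nearest_feature, and the small obs and feat data frames, are hypothetical and made up for illustration.

```r
# Great-circle (haversine) distance in km from one point to a vector of points.
# Assumptions: coordinates in decimal degrees, spherical Earth, radius 6371 km.
haversine_km <- function(lat1, lon1, lat2, lon2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: for each observation, return the index of the
# closest feature (the position in feat_lat / feat_lon).
nearest_feature <- function(obs_lat, obs_lon, feat_lat, feat_lon) {
  vapply(seq_along(obs_lat), function(i) {
    which.min(haversine_km(obs_lat[i], obs_lon[i], feat_lat, feat_lon))
  }, integer(1))
}

# Small made-up example in the same shape as my_df_1 and my_df_2
obs <- data.frame(START_LAT = c(-33.15, -47.38),
                  START_LONG = c(163, 148.98))
feat <- data.frame(latitude = c(-42.80, -31.85, -44.66),
                   longitude = c(179.99, 173.24, 174.70),
                   depth_top = c(935L, 1303L, 555L))

idx <- nearest_feature(obs$START_LAT, obs$START_LONG,
                       feat$latitude, feat$longitude)

# Append the closest feature (and its distance) to each observation
matched <- feat[idx, ]
rownames(matched) <- NULL
result <- cbind(obs, matched)
result$dist_km <- haversine_km(obs$START_LAT, obs$START_LONG,
                               matched$latitude, matched$longitude)
```

Because the formula only uses the sine of half the longitude difference, points on either side of the 180° meridian (as in my_df_2) are handled without special treatment. The loop computes one vector of distances per observation, which should be fine for a few thousand rows on each side.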