I have two dataframes, both of which contain latitude and longitude coordinates. The first dataframe is observations of events, where the location and time was recorded. The second dataframe is geographic features, where the location and info about the feature is recorded.
my_df_1 <- structure(list(START_LAT = c(-33.15, -35.6, -34.08333, -34.13333,
-34.31667, -47.38333, -47.53333, -34.08333, -47.38333, -47.15
), START_LONG = c(163, 165.18333, 162.88333, 162.58333, 162.76667,
148.98333, 148.66667, 162.9, 148.98333, 148.71667)), row.names = c(1175L,
528L, 1328L, 870L, 672L, 707L, 506L, 981L, 756L, 210L), class = "data.frame", .Names = c("START_LAT",
"START_LONG"))
my_df_2 <- structure(list(latitude = c(-42.7984, -34.195, -49.81, -35.417,
-28.1487, -44.657, -42.7898, -36.245, -39.1335, -31.8482), longitude = c(179.9874,
179.526, -176.68, 178.765, -168.0314, 174.695, -179.9873, 177.7873,
-170.0583, 173.2424), depth_top = c(935L, 2204L, 869L, 1973L,
4750L, 555L, 894L, 1500L, 4299L, 1303L)), row.names = c(580L,
1306L, 926L, 1102L, 60L, 1481L, 574L, 454L, 1168L, 144L), class = "data.frame", .Names = c("latitude",
"longitude", "depth_top"))
What I need to do, is for every observation in df1, I need to find out which feature in df2 is geographically closest. Ideally, I'd get a new column appended to df1 which every row is the closest feature from df2.
I worked through this question How to assign several names to lat-lon observations, but was unable to figure out how to match it to my data
The real dataframes have 1000s of rows, which is why I cant do this by hand
A solution using st_distance
from the sf
package. my_df_final
is the final output.
# Load packages
library(tidyverse)
library(sp)
library(sf)
# Create ID for my_df_1 and my_df_2 based on row id
# This step is not required, just help me to better distinguish each point
my_df_1 <- my_df_1 %>% mutate(ID1 = row.names(.))
my_df_2 <- my_df_2 %>% mutate(ID2 = row.names(.))
# Create spatial point data frame
my_df_1_sp <- my_df_1
coordinates(my_df_1_sp) <- ~START_LONG + START_LAT
my_df_2_sp <- my_df_2
coordinates(my_df_2_sp) <- ~longitude + latitude
# Convert to simple feature
my_df_1_sf <- st_as_sf(my_df_1_sp)
my_df_2_sf <- st_as_sf(my_df_2_sp)
# Set projection based on the epsg code
st_crs(my_df_1_sf) <- 4326
st_crs(my_df_2_sf) <- 4326
# Calculate the distance
m_dist <- st_distance(my_df_1_sf, my_df_2_sf)
# Filter for the nearest
near_index <- apply(m_dist, 1, order)[1, ]
# Based on the index in near_index to select the rows in my_df_2
# Combine with my_df_1
my_df_final <- cbind(my_df_1, my_df_2[near_index, ])