I have two datasets that I am trying to merge together based on the latitude. The latitudes do not exactly match so it has to be closest latitude by say 30km. When I try full join, the output data frame is sequential (i.e., it spits out df then df2 rather than merging/joining them. Is there a way I can merge these two data frames into one based on how close they are in latitude?
Var1 <- c(1, 4, 6, 2, 8, 9, 2, 4, 5, 4)
Var2 <- c(1.9, 2.4, 3.7, 7.3, 2.4, 4.8, 9, 2.3, 9, 1.7)
Lat <- c(57.33, 58.21, 58.93, 59.23, 59.87, 60.29, 60.99, 61.5, 61.8, 62.5)
Lon <- c(131, 131, 131, 130.9, 130.99, 130.5, 131, 131, 131.2, 131)
df <- as.data.frame(cbind(Var1, Var2, Lat, Lon))
Var1 <- c(4, 3, 6, 8, 1, 9, 3, 6, 5, 8)
Var2 <- c(6.2, 3, 5.6, 5.1, 9.0, 2.4, 4.9, 8.2, 3.1, 6.2)
Lat <- c(57.15, 58.3, 59.0, 59.4, 60.0, 60.1, 61.0, 61.6, 62, 62.7)
Lon <- c(131, 131, 131, 130.9, 130.99, 130.5, 131, 131, 131.2, 131)
df2 <- as.data.frame(cbind(Var1, Var2, Lat, Lon))
Vic = df%>%
full_join(df2, by = c("Lat" = "Lat"))
I also tried geo_join but I get an error
merged.dfs <- geo_join(df, df2, by = NULL, method = "haversine", mode = "left", max_dist = 1)
Joining by: c("Var1", "Var2", "Lat", "Lon")
Error in FUN(X[[i]], ...) :
Trying to join on Var1, Var2, Lat, Lon; geo_join needs exactly two columns (latitude and longitude)
The fuzzyjoin
package and the geo_join
function is definitely the way to go.
You were close with your try, you just need to set up some parameters as the error mentioned.
library(fuzzyjoin)
Var1 <- c(1, 4, 6, 2, 8, 9, 2, 4, 5, 4)
Var2 <- c(1.9, 2.4, 3.7, 7.3, 2.4, 4.8, 9, 2.3, 9, 1.7)
Lat <- c(57.33, 58.21, 58.93, 59.23, 59.87, 60.29, 60.99, 61.5, 61.8, 62.5)
Lon <- c(131, 131, 131, 130.9, 130.99, 130.5, 131, 131, 131.2, 131)
df <- as.data.frame(cbind(Var1, Var2, Lat, Lon))
Var1 <- c(4, 3, 6, 8, 1, 9, 3, 6, 5, 8)
Var2 <- c(6.2, 3, 5.6, 5.1, 9.0, 2.4, 4.9, 8.2, 3.1, 6.2)
Lat <- c(57.15, 58.3, 59.0, 59.4, 60.0, 60.1, 61.0, 61.6, 62, 62.7)
Lon <- c(131, 131, 131, 130.9, 130.99, 130.5, 131, 131, 131.2, 131)
df2 <- as.data.frame(cbind(Var1, Var2, Lat, Lon))
Vic <- df%>%
geo_join(df2, by = c("Lat" = "Lat", "Lon" = "Lon"), method = "haversine", mode = "left", max_dist = 30, unit = "km" )
Look into this documentation if needed