I want to merge city names onto approximate coordinates.
I have two datasets: cities and events. Most of the events occur just outside the lat-longs of the city.
I want to merge in the city from cities if its lat and lon each differ by at most 1 degree from those listed in events.
The nearest function in data.table seems to be too crude. What would you do? Use maptools?
Example:
library(data.table)
cities <- data.table(city = c("A", "B", "C"),
                     lat = c(23.4, 43.5, 21.3),
                     lon = c(100, 98.4, -78.2))
events <- data.table(event = c("X1", "Y1", "B1"),
                     lat = c(24.4, 42.5, 23.3),
                     lon = c(101, 100.4, -78.2))
result <- data.table(event = c("X1", "Y1", "B1"),
lat = c(23.4, 43.5, 21.3),
lon = c(100, 98.4, -78.2),
city = c("A", NA, NA))
> result
event lat lon city
1: X1 23.4 100.0 A
2: Y1 43.5 98.4 <NA>
3: B1 21.3 -78.2 <NA>
This non-equi update join does the trick, but it only works because you set a hard 1-degree limit. The problem is that the ground distance spanned by one degree varies around the globe.
events[ cities[, `:=`(lat_min = lat - 1, lat_max = lat+1,
lon_min = lon - 1, lon_max = lon + 1) ],
city := i.city,
on = .(lat >= lat_min, lat <= lat_max, lon >= lon_min, lon <= lon_max ) ][]
# event lat lon city
# 1: X1 24.4 101.0 A
# 2: Y1 42.5 100.4 <NA>
# 3: B1 23.3 -78.2 <NA>
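To put rough numbers on how much a degree varies, here is a small sketch of the east-west length of one degree of longitude at a few latitudes. It assumes a perfectly spherical Earth with a mean radius of 6371 km, and the variable names are made up for this illustration.

```r
# Assumption: spherical Earth with mean radius 6371 km
earth_radius_km <- 6371

# North-south length of one degree of latitude (constant on a sphere)
km_per_degree_lat <- pi / 180 * earth_radius_km

lat <- c(0, 23.4, 43.5, 60)

# East-west length of one degree of longitude shrinks with cos(latitude)
km_per_degree_lon <- km_per_degree_lat * cos(lat * pi / 180)

data.frame(lat, km_per_degree_lon = round(km_per_degree_lon, 1))
```

Under that assumption a degree of longitude is roughly 111 km at the equator and about half that at 60 degrees latitude, so a fixed 1-degree box covers a very different area depending on where the city is.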
If you want to set a maximum distance between events and cities, you'll need a spatial solution like this:
#maximum distance between event and city (in metres)
max_dist = 180000
library( sf )
#create simple (point) features of events and cities
cities.sf <- st_as_sf( cities, coords = c("lon", "lat"), crs = 4326 )
events.sf <- st_as_sf( events, coords = c("lon", "lat"), crs = 4326 )
#spatial join
st_join( events.sf, cities.sf, join = st_is_within_distance, dist = max_dist )
# Simple feature collection with 3 features and 2 fields
# geometry type: POINT
# dimension: XY
# bbox: xmin: -78.2 ymin: 23.3 xmax: 101 ymax: 42.5
# CRS: EPSG:4326
# event city geometry
# 1 X1 A POINT (101 24.4)
# 2 Y1 <NA> POINT (100.4 42.5)
# 3 B1 <NA> POINT (-78.2 23.3)
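If pulling in sf is not an option, a similar distance cut-off can be sketched in data.table alone by computing great-circle distances by hand. This is only a rough sketch under stated assumptions: haversine_m, pairs, nearest and matched are names invented here, the Earth is treated as a sphere of radius 6371 km (so distances may differ slightly from what sf reports), and the cross join grows with nrow(events) * nrow(cities), so it only suits small tables. Unlike st_join, it keeps just the single nearest city per event.

```r
library(data.table)

cities <- data.table(city = c("A", "B", "C"),
                     lat = c(23.4, 43.5, 21.3),
                     lon = c(100, 98.4, -78.2))
events <- data.table(event = c("X1", "Y1", "B1"),
                     lat = c(24.4, 42.5, 23.3),
                     lon = c(101, 100.4, -78.2))

# Hypothetical helper: haversine great-circle distance in metres.
# Assumption: spherical Earth with mean radius 6371 km.
haversine_m <- function(lat1, lon1, lat2, lon2, r = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# maximum distance between event and city (in metres)
max_dist <- 180000

# Pair every event row with every city row and compute the distance
pairs <- CJ(ei = seq_len(nrow(events)), ci = seq_len(nrow(cities)))
pairs[, dist_m := haversine_m(events$lat[ei], events$lon[ei],
                              cities$lat[ci], cities$lon[ci])]

# Per event, keep the closest city among those within max_dist
nearest <- pairs[dist_m <= max_dist, .SD[which.min(dist_m)], by = ei]

# Work on a copy so the original events table is left untouched
matched <- copy(events)
matched[, city := NA_character_]
matched[nearest$ei, city := cities$city[nearest$ci]]
matched
```

With the example data only X1 ends up with a city (A); Y1 and B1 have no city within 180 km and stay NA.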