r, google-maps, openstreetmap, geosphere

Inaccurate output when trying to calculate distance between two points in R using OSM data and the geosphere distm() function


I am trying to calculate the shortest distance between a point of interest and the nearest tram station. The coordinates are latitude and longitude in degrees, stored as POINT geometries. To do this, I create a placeholder vector (distances), add it as a column to the dataframe containing the spatial point data, and then fill that column with a for-loop that calculates the distance between my previously defined point of interest and each coordinate in the dataframe, using distm() with fun = distGeo. Finally, I use min() on the distances column to find the shortest distance.

The code runs and does what I intend, but its results do not match Google Maps: when I use the right-click "Measure distance" function there, the distance is up to 30% shorter than the one my code outputs. All the coordinates are in the right order, and when I type them into Google Maps the correct points show up, but the calculated distance is still wrong.

For example, the POI is 49.49443499565662, 8.459728292754358 and the nearest tram station is at 49.49353, 8.461322. My code calculates a distance of 202.39m and Google Maps calculates a distance of 152.66m.

Does anyone know why this might be happening?

Keep in mind that I am not very experienced with R or OpenStreetMap.

Here's my code:

library(osmdata)
library(sf)
library(geosphere)
library(dplyr) #provides select(), used below

#define city
city = "Mannheim"

#define coordinates of interest point
p68 = c(49.49443499565662, 8.459728292754358)

#query for tram routes in Mannheim
mannheim_tram_routes <- getbb(city) %>% 
  opq() %>% 
  add_osm_feature(key = 'route', value = 'tram')

#running the query and returning the result as sf objects
mtr <- osmdata_sf(mannheim_tram_routes)

#creating vector
distances=c(1:length(mtr$osm_points$osm_id))
#creating a dataframe from the sf points
mtrdf <- data.frame(mtr$osm_points)
#adding the distance vector to the dataframe
mtrdf$distances = distances 
#selecting only the relevant columns for the dataframe
mtrdf <- mtrdf %>% 
  select(osm_id,name,distances,geometry)
#deleting all rows with no name (only the stations are left from the tram route data)
mtrdf2 <- na.omit(mtrdf)

#filling the distance column with a for-loop using distm()
for(i in 1:length(mtrdf2$distances)){
  mtrdf2$distances[i] =distm(p68, c(mtrdf2$geometry[[i]][2],mtrdf2$geometry[[i]][1]),fun =distGeo)
}
#checking for the smallest distance 
min(mtrdf2$distances)

Solution

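  • The order of the coordinates matters: geosphere:: functions expect each point as (longitude, latitude). In your code both points are latitude-first: p68 is defined as c(lat, lon), and the loop swaps each sf geometry, which is stored as (x = longitude, y = latitude), into c(lat, lon) as well. I'm inferring that your tram station is at 49.49353 north latitude, 8.461322 east longitude.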

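    If I (incorrectly) calculate it with latitude first, I reproduce your result of about 202 m, because the values are then read as a point near 8.46°N, 49.49°E, a different place on the globe: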

    geosphere::distGeo(c(49.49443499565662, 8.459728292754358), c(49.49353, 8.461322))
    # [1] 202.4823
    

    But if I put longitude first instead, the result is close to Google's 152.66 m:

    geosphere::distGeo(c(8.459728292754358, 49.49443499565662), c(8.461322, 49.49353))
    # [1] 153.1708
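
Applied to the loop in the question, the fix could look roughly like the sketch below. It assumes the station geometries are sf POINTs in WGS84 (EPSG:4326). The names `poi` and `stations` are chosen for this example only, and the second station is an invented coordinate added so that there is more than one candidate. `st_coordinates()` returns a matrix with columns X (longitude) and Y (latitude), which is already the order geosphere expects, so neither the swap nor the loop is needed.

```r
library(sf)
library(geosphere)

# Point of interest as c(longitude, latitude), the order geosphere expects
poi <- c(8.459728292754358, 49.49443499565662)

# Stand-in for mtrdf2$geometry: two POINTs in WGS84.
# The first is the station from the question; the second is made up
# purely for illustration.
stations <- st_sfc(
  st_point(c(8.461322, 49.49353)),
  st_point(c(8.470000, 49.48000)),
  crs = 4326
)

# Matrix with columns X (longitude) and Y (latitude)
coords <- st_coordinates(stations)

# 1 x n matrix of distances in metres from poi to every station
d <- distm(poi, coords, fun = distGeo)

# Shortest distance to any station
nearest <- min(d)
nearest
```

With the real data, `coords <- st_coordinates(mtrdf2$geometry)` should slot in the same way, provided the geometry column is still an sf geometry column after the filtering steps.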