ropenstreetmapjsonlitenominatim

Troubleshooting API coordinate request loop


I want to find the coordinates for a list of addresses.

I am using a data set that can be found here: "https://www.data.gv.at/katalog/dataset/kaufpreissammlung-liegenschaften-wien"

I've inputed this using the read_csv function as "data". I'm using the tidyverse and jsonlite libraries. The only relevant columns are "Straße" which is the street name and "ON" which is the street number. The city for all of these is Vienna, Austria.

I'm using OpenStreetMap and have formatted my address data like the format requires:

data$formatted_address <- paste(ifelse(is.na(data$ON), "", data$ON), "+", tolower(data$Straße), ",+vienna", sep = "")

This formats the adresses in this column as 1+milanweg,+vienna and 12+granergasse,+vienna. When I manually input this into the API format, it all works out and I get the coordinates: https://nominatim.openstreetmap.org/search?q=1+milanweg,+vienna&format=json&polygon=1&addressdetails=1

Since I now want to do this for my entire row, I am using jsonlite to create requests in R.

data$coordinates <- data.frame(lat = NA, lon = NA)
for (i in 1:nrow(data)) {
  result <- try(readLines(paste0("https://nominatim.openstreetmap.org/search?q=", 
                                 URLencode(data$formatted_address[i]), "&format=json&polygon=1&addressdetails=1")), 
                silent = TRUE)
  if (!inherits(result, "try-error")) {
    if (length(result) > 0) {
      result <- fromJSON(result)
      if (length(result) > 0 && is.list(result[[1]])) {
        data$coordinates[i, ] <- c(result[[1]]$lat, result[[1]]$lon)
      }
    }
  }
}

This should theoretically create the exact same API request, however, the lat and lon columns are always empty.

How can I fix this script to create a list of coordinates for each address in the data set?


Solution

  • Data setup

    library(tidyverse) 
    library(httr2) 
    
    df <- df %>% 
      mutate(
        formatted_address = str_c(
          if_else(is.na(on), "", on), "+", str_to_lower(strasse), "+vienna"
        ) %>% str_remove_all(" ")
      )
    
    # A tibble: 57,912 × 7
       kg_code katastralgemeinde      ez   plz strasse                   on      formatted_address                
         <dbl> <chr>               <dbl> <dbl> <chr>                     <chr>   <chr>                            
     1    1617 Strebersdorf         1417  1210 Mühlweg                   13      13+mühlweg+vienna                
     2    1607 Groß Jedlersdorf II   193  1210 Bahnsteggasse             4       4+bahnsteggasse+vienna           
     3    1209 Ober St.Veit         3570  1130 Jennerplatz               34/20   34/20+jennerplatz+vienna         
     4    1207 Lainz                 405  1130 Sebastian-Brunner-Gasse   6       6+sebastian-brunner-gasse+vienna 
     5    1101 Favoriten            3831  1100 Laxenburger Straße        2C -2 D 2C-2D+laxenburgerstraße+vienna   
     6    1101 Favoriten            3827  1100 Laxenburger Straße        2 C     2C+laxenburgerstraße+vienna      
     7    1101 Favoriten            3836  1100 hinter Laxenburger Straße 2 C     2C+hinterlaxenburgerstraße+vienna
     8    1201 Auhof                 932  1130 Keplingergasse            10      10+keplingergasse+vienna         
     9    1213 Speising              135  1130 Speisinger Straße         29      29+speisingerstraße+vienna       
    10    1107 Simmering            2357  1100 BATTIGGASSE               44      44+battiggasse+vienna            
    # … with 57,902 more rows
    # ℹ Use `print(n = ...)` to see more rows
    

    API call and getting coordinates. I gathered the display name matched by the API, and the lat & lon data.

    get_coords <- function(address) {
      cat("Getting coordinates", address, "\n")
      str_c(
        "https://nominatim.openstreetmap.org/search?q=",
        address,
        "&format=json&polygon=1&addressdetails=1"
      ) %>%
        request() %>%
        req_perform() %>%
        resp_body_json(simplifyVector = TRUE) %>%
        as_tibble() %>%
        select(api_name = display_name,
               lat, lon) %>% 
        slice(1)
    }
    
    df %>% 
      slice_sample(n = 10) %>% 
      mutate(coordinates = map(
        formatted_address, possibly(get_coords, tibble(
          api_name = NA_character_, 
          lat = NA_character_, 
          lon = NA_character_
        ))
      )) %>% 
      unnest(coordinates) 
    
    # A tibble: 10 × 10
       kg_code katastralgemeinde      ez   plz strasse               on    formatted_…¹ api_n…² lat   lon  
         <dbl> <chr>               <dbl> <dbl> <chr>                 <chr> <chr>        <chr>   <chr> <chr>
     1    1651 Aspern               3374  1220 ERLENWEG              8     8+erlenweg+… 8, Erl… 48.2… 16.4…
     2    1613 Leopoldau            6617  1210 Oswald-Redlich-Straße 31    31+oswald-r… 31, Os… 48.2… 16.4…
     3    1006 Landstraße           2425  1030 HAGENMÜLLERGASSE      45018 45018+hagen… Hagenm… 48.1… 16.4…
     4    1101 Favoriten             541  1100 HERNDLGASSE           7     7+herndlgas… 7, Her… 48.1… 16.3…
     5    1607 Groß Jedlersdorf II   221  1210 Prager Straße         70    70+pragerst… Prager… 48.2… 16.3…
     6    1006 Landstraße           1184  1030 PAULUSGASSE           2     2+paulusgas… 2, Pau… 48.1… 16.3…
     7    1654 Eßling               2712  1220 KAUDERSSTRASSE        61    61+kauderss… 61, Ka… 48.2… 16.5…
     8    1401 Dornbach             2476  1170 Alszeile              NA    +alszeile+v… Alszei… 48.2… 16.2…
     9    1654 Eßling                745  1220 Kirschenallee         19    19+kirschen… 19, Ki… 48.2… 16.5…
    10    1204 Hadersdorf           3139  1140 MITTLERE STRASSE      NA    +mittlerest… Mittle… 48.2… 16.1…
    # … with abbreviated variable names ¹​formatted_address, ²​api_name