I am fed up with Google's geocoding, and decided to try an alternative. The Data Science Toolkit (http://www.datasciencetoolkit.org) allows you to Geocode unlimited number of addresses. R has an excellent package that serves as a wrapper for its functions (CRAN:RDSTK). The package has a function called street2coordinates()
that interfaces with the Data Science Toolkit's geocoding utility.
However, the RDSTK function street2coordinates()
does not work if you try to geocode something simple like City, Country. In the following example I will try to use the function to get the latitude and longitude for the city of Phoenix:
> require("RDSTK")
> street2coordinates("Phoenix+Arizona+United+States")
[1] full.address
<0 rows> (or 0-length row.names)
The utility from the data science toolkit works perfectly. This is the URL request that gives the answer: http://www.datasciencetoolkit.org/maps/api/geocode/json?sensor=false&address=Phoenix+Arizona+United+States
I am interested in geocoding multiple addresses (which complete addresses and city names). I know that the Data Science Toolkit URL will work well.
How do I interface with the URL and get multiple latitudes and longitudes into a data frame with the addresses?
Here is an sample dataset:
dff <- data.frame(address=c(
"Birmingham, Alabama, United States",
"Mobile, Alabama, United States",
"Phoenix, Arizona, United States",
"Tucson, Arizona, United States",
"Little Rock, Arkansas, United States",
"Berkeley, California, United States",
"Duarte, California, United States",
"Encinitas, California, United States",
"La Jolla, California, United States",
"Los Angeles, California, United States",
"Orange, California, United States",
"Redwood City, California, United States",
"Sacramento, California, United States",
"San Francisco, California, United States",
"Stanford, California, United States",
"Hartford, Connecticut, United States",
"New Haven, Connecticut, United States"
))
Like this:
library(httr)
library(rjson)
data <- paste0("[",paste(paste0("\"",dff$address,"\""),collapse=","),"]")
url <- "http://www.datasciencetoolkit.org/street2coordinates"
response <- POST(url,body=data)
json <- fromJSON(content(response,type="text"))
geocode <- do.call(rbind,sapply(json,
function(x) c(long=x$longitude,lat=x$latitude)))
geocode
# long lat
# San Francisco, California, United States -117.88536 35.18713
# Mobile, Alabama, United States -88.10318 30.70114
# La Jolla, California, United States -117.87645 33.85751
# Duarte, California, United States -118.29866 33.78659
# Little Rock, Arkansas, United States -91.20736 33.60892
# Tucson, Arizona, United States -110.97087 32.21798
# Redwood City, California, United States -117.88536 35.18713
# New Haven, Connecticut, United States -72.92751 41.36571
# Berkeley, California, United States -122.29673 37.86058
# Hartford, Connecticut, United States -72.76356 41.78516
# Sacramento, California, United States -121.55541 38.38046
# Encinitas, California, United States -116.84605 33.01693
# Birmingham, Alabama, United States -86.80190 33.45641
# Stanford, California, United States -122.16750 37.42509
# Orange, California, United States -117.85311 33.78780
# Los Angeles, California, United States -117.88536 35.18713
This takes advantage of the POST interface to the street2coordinates API (documented here), which returns all the results in 1 request, rather than using multiple GET requests.
The absence of Phoenix seems to be a bug in the street2coordinates API. If you go the API demo page and try "Phoenix, Arizona, United States", you get a null response. However, as your example shows, using their "Google-style Geocoder" does give a result for Phoenix. So here's a solution using repeated GET requests. Note that this runs much slower.
geo.dsk <- function(addr){ # single address geocode with data sciences toolkit
require(httr)
require(rjson)
url <- "http://www.datasciencetoolkit.org/maps/api/geocode/json"
response <- GET(url,query=list(sensor="FALSE",address=addr))
json <- fromJSON(content(response,type="text"))
loc <- json['results'][[1]][[1]]$geometry$location
return(c(address=addr,long=loc$lng, lat= loc$lat))
}
result <- do.call(rbind,lapply(as.character(dff$address),geo.dsk))
result <- data.frame(result)
result
# address long lat
# 1 Birmingham, Alabama, United States -86.801904 33.456412
# 2 Mobile, Alabama, United States -88.103184 30.701142
# 3 Phoenix, Arizona, United States -112.0733333 33.4483333
# 4 Tucson, Arizona, United States -110.970869 32.217975
# 5 Little Rock, Arkansas, United States -91.207356 33.608922
# 6 Berkeley, California, United States -122.29673 37.860576
# 7 Duarte, California, United States -118.298662 33.786594
# 8 Encinitas, California, United States -116.846046 33.016928
# 9 La Jolla, California, United States -117.876447 33.857515
# 10 Los Angeles, California, United States -117.885359 35.187133
# 11 Orange, California, United States -117.853112 33.787795
# 12 Redwood City, California, United States -117.885359 35.187133
# 13 Sacramento, California, United States -121.555406 38.380456
# 14 San Francisco, California, United States -117.885359 35.187133
# 15 Stanford, California, United States -122.1675 37.42509
# 16 Hartford, Connecticut, United States -72.763564 41.78516
# 17 New Haven, Connecticut, United States -72.927507 41.365709