Tags: r, data.table, geolocation, google-location-services, geonames

Use API function to populate data table


I'm using the geonames package to request country names, which I can do manually, but I don't understand how to make the API call for each row in my table.


loc2$country = GNcountryCode(loc2$lon, loc2$lat)$countryCode

My intention is to create a new column "country" populated with the corresponding code for each row, but this call appears to simply concatenate all the latitudes and longitudes into a single request.

I apologise for the basic nature of the question. I have no experience with R at all. I can't figure out how to make the function call work.

Here's row 1:

loc2[1,]
  latitudeE7 longitudeE7 accuracy                                                                           activity source  deviceTag
1  375800672  1268884670       22 ON_BICYCLE, ON_FOOT, IN_VEHICLE, UNKNOWN, 34, 30, 21, 13, 2014-01-24T10:12:51.748Z   WIFI 1521681206
                 timestamp velocity altitude verticalAccuracy platformType serverTimestamp deviceTimestamp batteryCharging formFactor heading
1 2014-01-24T10:12:50.011Z       NA       NA               NA         <NA>            <NA>            <NA>              NA       <NA>      NA
  deviceDesignation      lat      lon        day
1              <NA> 37.58007 126.8885 2014-01-24
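
(For what it's worth, the lat and lon columns are just the E7 integer fields scaled down by 10^7, i.e. something like:)

# derive decimal degrees from Google's E7 integer columns
loc2$lat <- loc2$latitudeE7 / 1e7
loc2$lon <- loc2$longitudeE7 / 1e7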

Background:

Officialdom requires me to identify which countries I've visited, and for how long, over the last 10 years. I travel a lot, often by different modes there/back/on to another country, even on foot or by bicycle, so I don't have comprehensive formal documentation (like tickets) containing this information.

I've never used R, but after a bit of reading I thought it would be simplest to analyse my Google Location History (although I often keep airplane mode enabled to prolong battery life, so even that's not comprehensive, but it's a start...)

I have a data table with the downloaded JSON data, and have reduced the number of rows by a factor of about 500 by keeping only one row per unique day. The geonames site allows 1000 calls per hour.
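
For reference, the per-day reduction was roughly along these lines (a sketch using data.table, per the tags; loc stands for the full location-history table):

library(data.table)

# derive a calendar day from each timestamp, then keep one row per unique day
loc[, day := as.Date(timestamp)]
loc2 <- unique(loc, by = "day")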

Yes, I know, some (sensible) people will ask, if even I have no idea where I've been, why would I need to compile this data? I could confabulate a plausible fiction but this has become an obsession in itself. I haven't done any computer work at all for over 10 years so I'm struggling a bit.


Solution

  • The GeoNames countryCode API does not support batch requests, so each call can include only a single coordinate pair. You can handle this with mapply(): define a function that takes two arguments (lat, lon) and extracts countryCode from the response, pass it as the first argument to mapply(), and pass your lat and lon vectors as the second and third arguments. mapply() will then cycle through each lat-lon pair, call the function, and return a vector of results:

    library(geonames)
    
    # example locations:
    loc2
    #>          lon       lat
    #> 1  -84.41688  77.88553
    #> 2  -46.03540 -14.01990
    #> 3  146.95480  59.73224
    #> 4 -116.43957  47.22695
    #> 5   60.64802  26.29448
    
    loc2$country_gn <- 
      withr::with_options(
        list(geonamesUsername = "your_geonames_username"),  # replace with your GeoNames account name
        mapply(\(lat, lon) GNcountryCode(lat, lon)$countryCode, loc2$lat, loc2$lon)
      )
    loc2
    #>          lon       lat country_gn
    #> 1  -84.41688  77.88553         CA
    #> 2  -46.03540 -14.01990         BR
    #> 3  146.95480  59.73224         RU
    #> 4 -116.43957  47.22695         US
    #> 5   60.64802  26.29448         IR
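
    Since the free GeoNames quota mentioned in the question is 1000 calls per hour, you may also want to throttle the requests when running this over the full table; one possible sketch (the pause length is an assumption, adjust it to your quota):

    # wrap the API call so each request is followed by a short pause,
    # keeping a long run under the hourly quota (roughly 900 calls/hour here)
    get_country <- function(lat, lon) {
      res <- GNcountryCode(lat, lon)$countryCode
      Sys.sleep(4)
      res
    }
    
    loc2$country_gn <- withr::with_options(
      list(geonamesUsername = "your_geonames_username"),
      mapply(get_country, loc2$lat, loc2$lon)
    )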
    

    Though you could also handle this without any external API: fetch a dataset of country polygons (e.g. through giscoR for the Eurostat GISCO datasets, or rnaturalearth for https://www.naturalearthdata.com/ data) and use the spatial join provided by the sf package to find a match for each of your point locations:

    library(sf)
    library(giscoR)
    
    # CNTR_RG_20M_2016_4326 dataset
    world <- gisco_countries
    
    # for high(er) resolution dataset from 2024:
    # world <- gisco_get_countries(year = "2024", resolution = "01")
    
    # convert loc2 to a spatial data frame;
    # spatial join with world[, "CNTR_ID"] to match each loc2 location to a country polygon;
    # extract CNTR_ID column;
    loc2$country_gisco <- 
      st_join(
        st_as_sf(loc2, coords = c("lon", "lat"), crs = "WGS84"),
        world[, "CNTR_ID"]
      )$CNTR_ID
    
    loc2
    #>          lon       lat country_gn country_gisco
    #> 1  -84.41688  77.88553         CA            CA
    #> 2  -46.03540 -14.01990         BR            BR
    #> 3  146.95480  59.73224         RU            RU
    #> 4 -116.43957  47.22695         US            US
    #> 5   60.64802  26.29448         IR            IR
    

    Note that different geospatial datasets may label some areas differently, something to consider when you have to deal with locations like Crimea or Northern Cyprus. This applies to reverse-geocoding APIs as well.
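
    For comparison, the same join pattern works with Natural Earth polygons through rnaturalearth; a minimal sketch (assuming the iso_a2 attribute is the two-letter code you want, which Natural Earth leaves blank or -99 for a few disputed areas):

    library(sf)
    library(rnaturalearth)
    
    # Natural Earth country polygons as an sf data frame
    world_ne <- ne_countries(scale = "medium", returnclass = "sf")
    
    # convert the points to sf, spatially join, keep only the ISO 3166-1 alpha-2 code
    loc2$country_ne <-
      st_join(
        st_as_sf(loc2, coords = c("lon", "lat"), crs = "WGS84"),
        world_ne[, "iso_a2"]
      )$iso_a2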


    Example locations:

    set.seed(1)
    loc2 <- 
      sf::st_sample(giscoR::gisco_countries, 5) |> 
      sf::st_coordinates() |> 
      `colnames<-`(c("lon", "lat")) |> 
      as.data.frame()
      
    # the same locations as a dput() snapshot, so the values match the output shown above exactly:
    loc2 <- structure(list(lon = c(-84.4168839239306, -46.0353998519945, 
    146.954795316042, -116.439570855129, 60.6480190645839), lat = c(77.8855269367845, 
    -14.0199015626225, 59.7322442944768, 47.226945417709, 26.2944803596838
    )), class = "data.frame", row.names = c(NA, -5L))
    

    Created on 2024-10-09 with reprex v2.1.1
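
    As a follow-up to the original goal (countries visited and duration): once the country column is filled in on the full unique-days table, the duration is just a row count per country. A quick sketch, assuming one row per day as in the question's reduced table:

    # number of distinct days spent in each country, most-visited first
    sort(table(loc2$country_gn), decreasing = TRUE)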