Tags: r, list, function, purrr, ropensci

Linking functions with purrr and referencing nested variables


I'm scraping data from a large online database (GBIF), which requires three steps: (1) match a GBIF "key" identifier to a species name, (2) send a query to the database, getting a download key ("res") in return, and (3) download, import, and filter the data associated with that species. I've written a function for each of these (not including the actual code here, since it's unfortunately very long and requires login credentials):

get_gbif_key <- function(species) {}
get_gbif_res <- function(gbifkey) {} 
get_gbif_dat <- function(gbifres) {}

I have a list of several hundred species to which I want to apply these three functions in order. I know they work individually, but I can't figure out how to feed them into each other (probably using purrr?) and reference the correct inputs from the nested outputs of the previous function.

So, for example:

> testlist <- c('Gadus morhua','Caretta caretta')
> testkey <- map(testlist, get_gbif_key)
> testkey
[[1]]
[1] 8084280

[[2]]
[1] 8894817

Here's where I'm stuck. I want to feed the keys in this list structure into the next function, but I don't know how to properly reference them using map or other functions. I can do it by manually creating a new list for the next function:

> testlist2 <- c('8084280','8894817')
> testres <- map(testlist2, get_gbif_res)
> testres
[[1]]
<<gbif download>>
  Username: XXXX
  E-mail: XXXX@gmail.com
  Download key: 0001342-180412121330197

[[2]]
<<gbif download>>
  Username: XXXX
  E-mail: XXXX@gmail.com
  Download key: 0001343-180412121330197

EDIT: the structure of this output may be posing a problem here. When I run listviewer::jsonedit(testres), it just looks like a normal nested list with entries 0 and 1 holding the two download keys. However, when I run str(testres), I get the following:

> str(testres)
List of 2
 $ :Class 'occ_download'  atomic [1:1] 0001342-180412121330197
  .. ..- attr(*, "user")= chr "XXXX"
  .. ..- attr(*, "email")= chr "XXXX@gmail.com"
 $ :Class 'occ_download'  atomic [1:1] 0001343-180412121330197
  .. ..- attr(*, "user")= chr "XXXX"
  .. ..- attr(*, "email")= chr "XXXX@gmail.com"

And, again, for the third one:

> testlist3 <- c('0001342-180412121330197','0001343-180412121330197')
> testdat <- map(testlist3, get_gbif_dat)

Which successfully loads a list object with the desired data into R (it has two unnamed elements, 0 and 1, each of which is a list of 28 requested variables for each species). Any advice for scripting this get_gbif_key %>% get_gbif_res %>% get_gbif_dat workflow in a way that unpacks the preceding list structures correctly?


Solution

  • Based on the output you've shown so far, nested calls to map should work, with each call feeding the list returned by the previous one into the next function:

          library(purrr)

          yourData <- map(
            unlist(                           # strip class/attributes to match your manual character input
              map(
                map(testlist, get_gbif_key),  # returns the GBIF keys
                get_gbif_res                  # returns the download results
              )
            ),
            get_gbif_dat                      # returns the data items
          )
    

    The last object whose structure you showed (testres) is just a list of atomic character vectors with some extra attributes, and your functions seem to handle that without difficulty, so mapping should succeed.
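
The same chain can also be written as a pipeline, which may be easier to read than nested calls. The sketch below is only an illustration under stated assumptions: the three function bodies are hypothetical stand-ins (the real ones need GBIF credentials), the "download-for-" keys are made up, and as.character is used instead of unlist to drop the class and attributes from each result. set_names is an optional extra so the output is labelled by species.

```r
library(purrr)

# Hypothetical stand-ins for the real functions, which require GBIF login.
get_gbif_key <- function(species) {
  keys <- c("Gadus morhua" = 8084280, "Caretta caretta" = 8894817)
  keys[[species]]
}

get_gbif_res <- function(gbifkey) {
  # Mimics the structure shown by str(testres): a character vector
  # carrying a class and two attributes. The key text itself is invented.
  structure(
    paste0("download-for-", gbifkey),
    class = "occ_download",
    user  = "XXXX",
    email = "XXXX@gmail.com"
  )
}

get_gbif_dat <- function(gbifres) {
  # The real function downloads and filters data; this one just echoes the key.
  data.frame(download_key = as.character(gbifres), stringsAsFactors = FALSE)
}

testlist <- c("Gadus morhua", "Caretta caretta")

testdat <- testlist %>%
  set_names() %>%           # name each element by its species
  map(get_gbif_key) %>%     # list of GBIF keys
  map(get_gbif_res) %>%     # list of 'occ_download'-like objects
  map(as.character) %>%     # drop class and attributes, keep the key string
  map(get_gbif_dat)         # list of data frames, one per species
```

Because map keeps names, a single species can then be pulled out with testdat[["Gadus morhua"]] rather than by position.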