I'm scraping data from a large online database (GBIF), which requires three steps: (1) match a GBIF "key" identifier to a species name, (2) send a query to the database, getting a download key ("res") in return, and (3) download, import, and filter the data associated with that species. I've written a function for each of these (not including the actual code here, since it's unfortunately very long and requires login credentials):
get_gbif_key <- function(species) {}
get_gbif_res <- function(gbifkey) {}
get_gbif_dat <- function(gbifres) {}
I have a list of several hundred species to which I want to apply these three functions in order. I know they work individually, but I can't figure out how to feed them into each other (probably using purrr
?) and reference the correct inputs from the nested outputs of the previous function.
So, for example:
> testlist <- c('Gadus morhua','Caretta caretta')
> testkey <- map(testlist, get_gbif_key)
> testkey
[[1]]
[1] 8084280
[[2]]
[1] 8894817
Here's where I'm stuck. I want to feed the keys in this list structure into the next function, but I don't know how to properly reference them using map
or other functions. I can do it by manually creating a new list for the next function:
> testlist2 <- c('8084280','8894817')
> testres <- map(testlist2, get_gbif_res)
> testres
[[1]]
<<gbif download>>
Username: XXXX
E-mail: XXXX@gmail.com
Download key: 0001342-180412121330197
[[2]]
<<gbif download>>
Username: XXXX
E-mail: XXXX@gmail.com
Download key: 0001343-180412121330197
EDIT: the structure of this output may be posing a problem here. When I run listviewer::jsonedit(testres)
, it just looks like a normal nested list with entries 0 and 1 holding the two download keys. However, when I run str(testres)
, I get the following:
> str(testres)
List of 2
$ :Class 'occ_download' atomic [1:1] 0001342-180412121330197
.. ..- attr(*, "user")= chr "XXXX"
.. ..- attr(*, "email")= chr "XXXX@gmail.com"
$ :Class 'occ_download' atomic [1:1] 0001343-180412121330197
.. ..- attr(*, "user")= chr "XXXX"
.. ..- attr(*, "email")= chr "XXXX@gmail.com"
And, again, for the third one:
> testlist3 <- c('0001342-180412121330197','0001343-180412121330197')
> testdat <- map(testlist3, get_gbif_dat)
Which successfully loads a list object with the desired data into R (it has two unnamed elements, 0 and 1, each of which is a list of 28 requested variables for each species). Any advice for scripting this get_gbif_key %>% get_gbif_res %>% get_gbif_dat
workflow in a way that unpacks the preceding list structures correctly?
Here's what you should try based on the evidence provided so far. Basically, the results suggest you should be able to succeed with nested map
-ping:
yourData <- map( unlist( # to make same class as your single func version
map(
map(testlist,
get_gbif_key), # returns gbifkeys
get_gbif_res)), # returns gbif_res's
get_gbif_dat) # returns data items
The last item that you showed the structure for is just a list of atomic character vectors with some extra attributes and your functions seems to handle that without difficulty, so mapping should succeed.