so this should be a relatively easy question on pulling items in a list into a dataframe, but I'm stuck on something.
I have the following list (I'm showing just part of the list for you, it's far longer than this):
str(raw_jobs_list)
List of 2
$ :List of 4
..$ id : chr "3594134"
..$ score : int 1
..$ fields:List of 16
.. ..$ date :List of 3
.. .. ..$ changed: chr "2020-04-18T00:35:00+00:00"
.. .. ..$ created: chr "2020-04-07T11:15:37+00:00"
.. .. ..$ closing: chr "2020-04-17T00:00:00+00:00"
.. ..$ country :List of 1
.. .. ..$ :List of 6
.. .. .. ..$ href : chr "https://api.reliefweb.int/v1/countries/149"
.. .. .. ..$ name : chr "Mali"
.. .. .. ..$ location :List of 2
.. .. .. .. ..$ lon: num -1.25
.. .. .. .. ..$ lat: num 17.4
.. .. .. ..$ id : int 149
.. .. .. ..$ shortname: chr "Mali"
.. .. .. ..$ iso3 : chr "mli"
.. ..$ title : chr "REGIONAL MANAGER West Africa"
I tried pulling them out using:
jobs_data_df <- list.stack(list.select(raw_jobs_list,
fields$title,
fields$country$name,
fields$date$created))
Where raw_jobs_list is the list, but I get these NAs and am not sure how to get past it.
glimpse(jobs_data_df)
Rows: 2
Columns: 3
$ V1 <chr> "REGIONAL MANAGER West Africa", "Support Relief Group Public Health Advisor (Multiple Positions)"
$ V2 <lgl> NA, NA
$ V3 <chr> "2020-04-07T11:15:37+00:00", "2020-05-04T15:20:37+00:00"
It's possible there's something obvious I'm overlooking as I haven't worked much with lists before. Any ideas?
Thanks so much! C
PS. If you're interested, I'm working with this API and this is how I got there so far.
jobs <- GET(url = "https://api.reliefweb.int/v1/jobs?appname=apidoc&preset=analysis&profile=full&limit=2")
raw_jobs_list <- content(jobs)$data
The portion displayed above is a subset of the whole data; here is a portion of the first element of the list:
dput(lapply(raw_jobs_list, function(x) c(x[c("id","score")], list(fields=x[[3]][intersect(names(x[[3]]),c("date","country","title"))]))))
list(list(id = "3594134", score = 1L, fields = list(date = list(
changed = "2020-04-18T00:35:00+00:00", created = "2020-04-07T11:15:37+00:00",
closing = "2020-04-17T00:00:00+00:00"), country = list(list(
href = "https://api.reliefweb.int/v1/countries/149", name = "Mali",
location = list(lon = -1.25, lat = 17.35), id = 149L, shortname = "Mali",
iso3 = "mli")), title = "REGIONAL MANAGER West Africa")),
list(id = "3594129", score = 1L, fields = list(date = list(
changed = "2020-05-19T00:04:01+00:00", created = "2020-05-04T15:20:37+00:00",
closing = "2020-05-18T00:00:00+00:00"), title = "Support Relief Group Public Health Advisor (Multiple Positions)")))
If you look at just one element at a time, I think that as.data.frame
does a pretty decent job. While I'll demonstrate using the abbreviated data (that I edited into your question), and the first element looks like:
raw_jobs_sublist <- lapply(raw_jobs_list, function(x) c(x[c("id","score")], list(fields=x[[3]][intersect(names(x[[3]]),c("date","country","title"))])))
as.data.frame(raw_jobs_sublist[[1]])
# id score fields.date.changed fields.date.created fields.date.closing fields.country.href fields.country.name fields.country.location.lon fields.country.location.lat fields.country.id fields.country.shortname fields.country.iso3 fields.title
# 1 3594134 1 2020-04-18T00:35:00+00:00 2020-04-07T11:15:37+00:00 2020-04-17T00:00:00+00:00 https://api.reliefweb.int/v1/countries/149 Mali -1.25 17.35 149 Mali mli REGIONAL MANAGER West Africa
Shown differently (just for variety here), it's
str(as.data.frame(raw_jobs_sublist[[1]]))
# 'data.frame': 1 obs. of 13 variables:
# $ id : chr "3594134"
# $ score : int 1
# $ fields.date.changed : chr "2020-04-18T00:35:00+00:00"
# $ fields.date.created : chr "2020-04-07T11:15:37+00:00"
# $ fields.date.closing : chr "2020-04-17T00:00:00+00:00"
# $ fields.country.href : chr "https://api.reliefweb.int/v1/countries/149"
# $ fields.country.name : chr "Mali"
# $ fields.country.location.lon: num -1.25
# $ fields.country.location.lat: num 17.4
# $ fields.country.id : int 149
# $ fields.country.shortname : chr "Mali"
# $ fields.country.iso3 : chr "mli"
# $ fields.title : chr "REGIONAL MANAGER West Africa"
In order to do this on all elements, we need to account for a few things:
Here's a first stab:
dplyr::bind_rows(lapply(raw_jobs_sublist, as.data.frame))
# id score fields.date.changed fields.date.created fields.date.closing fields.country.href fields.country.name fields.country.location.lon fields.country.location.lat fields.country.id fields.country.shortname fields.country.iso3 fields.title
# 1 3594134 1 2020-04-18T00:35:00+00:00 2020-04-07T11:15:37+00:00 2020-04-17T00:00:00+00:00 https://api.reliefweb.int/v1/countries/149 Mali -1.25 17.35 149 Mali mli REGIONAL MANAGER West Africa
# 2 3594129 1 2020-05-19T00:04:01+00:00 2020-05-04T15:20:37+00:00 2020-05-18T00:00:00+00:00 <NA> <NA> NA NA NA <NA> <NA> Support Relief Group Public Health Advisor (Multiple Positions)
This also works with data.table::rbindlist
. It does not work as well with do.call(rbind.data.frame, ...)
, since that is less tolerant of missing names. (This it can be done without too much trouble, there are occasionally other advantages to using these two options.)
Note: if you do this on the original data, R's default mechanism of displaying a data.frame
will cramp your console with all of the text, which might be annoying. If you are already using dplyr
or data.table
in any of your work, both of those formats provide string-limiting, so that it is more tolerable on the console. For example, showing the whole thing:
tibble::tibble(dplyr::bind_rows(lapply(raw_jobs_list, as.data.frame)))
# # A tibble: 2 x 42
# id score fields.date.cha~ fields.date.cre~ fields.date.clo~ fields.country.~ fields.country.~ fields.country.~ fields.country.~ fields.country.~ fields.country.~ fields.country.~ fields.career_c~ fields.career_c~ fields.name fields.source.h~ fields.source.n~ fields.source.id fields.source.t~ fields.source.t~ fields.source.s~ fields.source.h~ fields.title fields.body
# <chr> <int> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <int> <chr> <chr> <chr> <int> <chr> <chr> <chr> <int> <chr> <int> <chr> <chr> <chr> <chr>
# 1 3594~ 1 2020-04-18T00:3~ 2020-04-07T11:1~ 2020-04-17T00:0~ https://api.rel~ Mali -1.25 17.4 149 Mali mli Donor Relations~ 20966 Bamako https://api.rel~ ICCO COOPERATION 45059 Non-governmenta~ 274 ICCO COOPERATION https://www.icc~ REGIONAL MA~ "**VACANCY~
# 2 3594~ 1 2020-05-19T00:0~ 2020-05-04T15:2~ 2020-05-18T00:0~ <NA> <NA> NA NA NA <NA> <NA> Program/Project~ 6867 <NA> https://api.rel~ US Agency for I~ 1751 Government 271 USAID http://www.usai~ Support Rel~ "### **SOL~
# # ... with 18 more variables: fields.type.name <chr>, fields.type.id <int>, fields.experience.name <chr>, fields.experience.id <int>, fields.url <chr>, fields.url_alias <chr>, fields.how_to_apply <chr>, fields.id <int>, fields.status <chr>, fields.body.html <chr>, fields.how_to_apply.html <chr>, href <chr>, fields.source.longname <chr>, fields.source.spanish_name <chr>,
# # fields.theme.name <chr>, fields.theme.id <int>, fields.theme.name.1 <chr>, fields.theme.id.1 <int>
data.table::rbindlist(lapply(raw_jobs_list, as.data.frame), fill = TRUE)
# id score fields.date.changed fields.date.created fields.date.closing fields.country.href fields.country.name fields.country.location.lon fields.country.location.lat fields.country.id fields.country.shortname fields.country.iso3 fields.career_categories.name fields.career_categories.id fields.name
# <char> <int> <char> <char> <char> <char> <char> <num> <num> <int> <char> <char> <char> <int> <char>
# 1: 3594134 1 2020-04-18T00:35:00+00:00 2020-04-07T11:15:37+00:00 2020-04-17T00:00:00+00:00 https://api.reliefweb.int/v1/countri... Mali -1.25 17.35 149 Mali mli Donor Relations/Grants Management 20966 Bamako
# 2: 3594129 1 2020-05-19T00:04:01+00:00 2020-05-04T15:20:37+00:00 2020-05-18T00:00:00+00:00 <NA> <NA> NA NA NA <NA> <NA> Program/Project Management 6867 <NA>
# 27 variables not shown: [fields.source.href <char>, fields.source.name <char>, fields.source.id <int>, fields.source.type.name <char>, fields.source.type.id <int>, fields.source.shortname <char>, fields.source.homepage <char>, fields.title <char>, fields.body <char>, fields.type.name <char>, ...]
For data.table
, I already have some options set that facilitate this. Notably, I'm currently using:
options(
datatable.prettyprint.char = 36,
datatable.print.topn = 10,
datatable.print.class = TRUE,
datatable.print.trunc.cols = TRUE
)
At this point, you have a data.frame
that should contain all of the data (and NA
for elements with missing fields). From here, if you don't like the nested-names convention (e.g., fields.date.changed
), then they can be easily renamed using patterns or conventional methods.