tl;dr: What is different about an esummary list produced by rentrez, and why do such lists stop working with other rentrez functions after they are merged using append()?
I am accessing PubMed using rentrez. I am able to search for publications and download esummaries without problem. However, there must be something special about an esummary list that I do not understand, because things fall apart when I use append() to try to merge lists. I have not been able to figure out what that difference is by reading the documentation. Here is the code that allows me to search PubMed and download records without problem:
library(rentrez)     # entrez_search(), entrez_summary(), extract_from_esummary()
library(data.table)  # rbindlist()
library(magrittr)    # %>%
# set search term and retmax
term_set <- '"Transcription, Genetic"[Mesh] AND "Regulatory Sequences, Nucleic Acid"[Mesh] AND 2017:2018[PDAT]'
retmax_set <- 500
# search pubmed using web history
search.l <- entrez_search(db = "pubmed", term = term_set, use_history = TRUE)
# get summaries of search hits using web history, one batch per iteration
summary.l <- list()
for (seq_start in seq(0, search.l$count, retmax_set)) {
  summary.l[[length(summary.l) + 1]] <- entrez_summary(
    db = "pubmed",
    web_history = search.l$web_history,
    retmax = retmax_set,
    retstart = seq_start
  )
}
However, using summary.l <- list() and then summary.l[[length(summary.l)+1]] <- entrez_summary(... results in a list of lists of esummaries (3 sub-lists, in this search). This forces multiple for loops in subsequent steps of the data extraction (below) and is an unwieldy data structure.
# extract desired information from esummary, convert to dataframe
faut.laut.l <- list()
for (i in seq_along(summary.l)) {
  faut.laut <- summary.l[[i]] %>%
    extract_from_esummary(
      c("uid", "sortfirstauthor", "lastauthor"),
      simplify = FALSE
    )
  faut.laut.l <- c(faut.laut.l, faut.laut)
}
faut.laut.df <- rbindlist(faut.laut.l)
Using append() in the code below gives a single list of all 1334 esummaries, avoiding the sub-lists.
# get summaries of search hits using web history
for (seq_start in seq(0, search.l$count, retmax_set)) {
  batch <- entrez_summary(
    db = "pubmed",
    web_history = search.l$web_history,
    retmax = retmax_set,
    retstart = seq_start
  )
  if (seq_start == 0) {
    # first batch starts the list
    summary.append.l <- batch
  } else {
    # later batches are appended; the else avoids adding the first batch twice
    summary.append.l <- append(summary.append.l, batch)
  }
}
However, in the subsequent step extract_from_esummary() throws an error, even though the documentation states that the argument esummaries should be a list of esummary objects.
# extract desired information from esummary, convert to dataframe
faut.laut.append.l <- extract_from_esummary(
  esummaries = summary.append.l,
  elements = c("uid", "sortfirstauthor", "lastauthor"),
  simplify = FALSE
)
Error in UseMethod("extract_from_esummary", esummaries) :
no applicable method for 'extract_from_esummary' applied to an object of class "list"
faut.laut.append.df <- rbindlist(faut.laut.append.l)
Error in rbindlist(faut.laut.append.l) :
object 'faut.laut.append.l' not found
A search that yields fewer than 500 records can be done in a single call of entrez_summary() and does not require the concatenation of lists. As a result, the code below works.
# set search term and retmax
term_set_small <- 'kadonaga[AUTH]'
retmax_set <- 500
# search pubmed using web history
search_small <- entrez_search(db = "pubmed", term = term_set_small, use_history = TRUE)
# get summaries from search with <500 hits
summary_small <- entrez_summary(
  db = "pubmed",
  web_history = search_small$web_history,
  retmax = retmax_set
)
# extract desired information from esummary, convert to dataframe
faut.laut_small <- extract_from_esummary(
  esummaries = summary_small,
  elements = c("uid", "sortfirstauthor", "lastauthor"),
  simplify = FALSE
)
faut.laut_small.df <- rbindlist(faut.laut_small)
Why does append() break the esummaries, and can this be avoided? Thanks.
The documentation for extract_from_esummary is a little confusing on this. What it really needs is either an esummary object or an esummary_list. Because the esummary object itself inherits from list, I don't think we can easily have extract_from_esummary work on any list that is thrown at it. I'll fix the docs and maybe think about a better design for the objects.
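The mechanism can be reproduced without touching NCBI at all. The following is a minimal sketch in base R, assuming (as the error message in the question suggests) that extract_from_esummary() is an S3 generic that dispatches on the class of its first argument. The mock batches, the class vector c("esummary_list", "list"), and the toy generic count_records() are hypothetical stand-ins, not part of rentrez. It shows that append(), which calls c() internally, keeps the names but drops the class attribute, so dispatch no longer finds the method.

```r
# Mock stand-ins for two batches returned by entrez_summary().
# The class vector used here is an assumption for illustration only.
batch_a <- structure(
  list("111" = list(uid = "111"), "222" = list(uid = "222")),
  class = c("esummary_list", "list")
)
batch_b <- structure(
  list("333" = list(uid = "333")),
  class = c("esummary_list", "list")
)

# Hypothetical S3 generic with a method defined only for esummary_list.
count_records <- function(x) UseMethod("count_records")
count_records.esummary_list <- function(x) length(x)

# append() concatenates with c(), which discards the class attribute.
merged <- append(batch_a, batch_b)
merged_class <- class(merged)

# Dispatch fails on the plain list; capture the message instead of stopping.
dispatch_error <- tryCatch(
  count_records(merged),
  error = function(e) conditionMessage(e)
)

# Restoring the class makes the method reachable again.
class(merged) <- c("esummary_list", "list")
n_records <- count_records(merged)
```

Here merged_class holds the class of the merged object before it is restored, and dispatch_error holds the same kind of "no applicable method" message that the question reports.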
There are a few ways to fix this particular problem. One is to simply re-class the merged list of esummaries:
class(summary.append.l) <- c("list", "esummary_list")
extract_from_esummary(summary.append.l, "sortfirstauthor")
That should do the trick. Another option is to extract the relevant data before you do any appending. This is something similar to your example, with more lapply and less for:
all_the_summs <- lapply(seq(0, 50, 5), function(s) {
  entrez_summary(db = "pubmed",
                 web_history = search.l$web_history,
                 retmax = 5, retstart = s)
})
desired_fields <- lapply(all_the_summs, extract_from_esummary,
                         c("uid", "sortfirstauthor", "lastauthor"), simplify = FALSE)
res <- do.call(cbind.data.frame, desired_fields)
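If the goal is one row per publication, the per-batch results can also be flattened and stacked with base R alone. This is only a sketch: it assumes each extracted record is a named list of length-one character fields, and the mock batches below (made-up uids and author names) stand in for the elements of desired_fields.

```r
# Made-up batches shaped like the per-batch output assumed above:
# each batch is a list of records, each record a named list of fields.
batch1 <- list(
  "111" = list(uid = "111", sortfirstauthor = "Smith J", lastauthor = "Lee K"),
  "222" = list(uid = "222", sortfirstauthor = "Chen L", lastauthor = "Diaz M")
)
batch2 <- list(
  "333" = list(uid = "333", sortfirstauthor = "Okafor N", lastauthor = "Park S")
)
batches <- list(batch1, batch2)

# Remove one level of nesting: a list of batches becomes a list of records.
records <- unlist(batches, recursive = FALSE)

# Turn each record into a one-row data frame, then stack the rows.
rows <- lapply(records, function(r) as.data.frame(r, stringsAsFactors = FALSE))
authors_df <- do.call(rbind, unname(rows))
```

Flattening after extraction works on plain lists, so it does not depend on the class of the batches surviving the merge.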
Hope that provides a way forward.