I'm searching PubMed using the rentrez package in R and would like to get the results sorted by relevance. Currently they are sorted by publication date.
library(rentrez)
query = 'regression to the mean[TITL]'
search_result = entrez_search(db="pubmed", term=query, retmax=30)
paper_data = entrez_summary(db="pubmed", id=search_result$ids)
dates = extract_from_esummary(paper_data, c("pubdate"))
As I understand it, the "relevance" information is tied to a particular search rather than to the record summaries or complete records you might download later, and nothing in the data returned by entrez_search gives a score or similar measure of how relevant each result is.
On the other hand, I think the sort="relevance" argument is doing something. If you send the same search twice, the IDs come back in the same order:
default_search = entrez_search(db="pubmed", term=query, retmax=30)
default_search_again = entrez_search(db="pubmed", term=query, retmax=30)
all(default_search$ids == default_search_again$ids)
[1] TRUE
Whereas setting sort="relevance" changes the order:
rel_search = entrez_search(db="pubmed", term=query, retmax=30, sort="relevance")
default_search$ids == rel_search$ids
[1] TRUE TRUE TRUE TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE TRUE TRUE TRUE
[25] FALSE FALSE TRUE TRUE FALSE FALSE
Later calls to the summary, fetch and link functions should maintain this order, so holding on to the order of the returned IDs might be the easiest (only?) way to keep track of the relevance information.
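If you want the ranking to survive later reordering or merging, one option is to store each ID's position in the search result next to the ID itself, then attach other fields by matching on ID rather than relying on position. Here is a minimal sketch of that idea. The helpers rank_ids and attach_field are made up for illustration (they are not part of rentrez), and the IDs in the demonstration are placeholders, not real PMIDs.

```r
# Hypothetical helper (not part of rentrez): record each ID's position in the
# search result, so position 1 is the most relevant hit.
rank_ids <- function(ids) {
  data.frame(pmid = as.character(ids),
             relevance_rank = seq_along(ids),
             stringsAsFactors = FALSE)
}

# Hypothetical helper: attach a per-record field by matching on ID,
# so the result does not depend on the order the records came back in.
attach_field <- function(ranked, record_ids, values, name) {
  ranked[[name]] <- values[match(ranked$pmid, as.character(record_ids))]
  ranked
}

# Offline demonstration with placeholder IDs and dates (not real PubMed data).
ranked <- rank_ids(c("111", "222", "333"))
ranked <- attach_field(ranked,
                       record_ids = c("333", "111", "222"),
                       values = c("2003", "2001", "2002"),
                       name = "pubdate")
print(ranked)

# With rentrez (needs network access), the same helpers could be used like
# this; it assumes each summary record exposes its ID as `uid`:
# library(rentrez)
# rel_search <- entrez_search(db="pubmed", term=query, retmax=30, sort="relevance")
# ranked <- rank_ids(rel_search$ids)
# summaries <- entrez_summary(db="pubmed", id=rel_search$ids)
# uids <- vapply(summaries, function(s) s$uid, character(1))
# dates <- extract_from_esummary(summaries, "pubdate")
# ranked <- attach_field(ranked, uids, unname(dates), "pubdate")
```

Matching on ID is a bit more work than trusting the order, but it means the rank column stays correct even if a later call drops or reorders a record.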