
Sort PubMed searches from rentrez by relevance


I'm searching PubMed using the rentrez package in R and would like to get the results sorted by relevance. Currently they are sorted by publication date.

library(rentrez)

query = 'regression to the mean[TITL]'
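# Without a sort argument the IDs come back in the default sort order.
# (Named search_result so the entrez_search() function is not shadowed.)
search_result = entrez_search(db="pubmed", term=query, retmax=30)
paper_data = entrez_summary(db="pubmed", id=search_result$ids)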
dates = extract_from_esummary(paper_data, c("pubdate"))

Solution

  • As I understand it, the "relevance" information is associated with a given search (not with the record summaries or complete records you might download later), and the data returned by entrez_search contains no score or similar value saying how relevant each result is.

    On the other hand, I think the sort="relevance" argument is doing something. If you send the same search twice without it, the IDs come back in the same order:

    default_search = entrez_search(db="pubmed", term=query, retmax=30)
    default_search_again = entrez_search(db="pubmed", term=query, retmax=30)
    all(default_search$ids == default_search_again$ids)
    

    [1] TRUE
    

    By contrast, adding sort="relevance" changes the order:

    rel_search = entrez_search(db="pubmed", term=query, retmax=30, sort="relevance")
    default_search$ids == rel_search$ids
    

     [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE
    [13] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
    [25] FALSE FALSE  TRUE  TRUE FALSE FALSE
    

    Later calls to the summary, fetch and link functions (entrez_summary, entrez_fetch, entrez_link) should preserve this order, so an ID's position in the relevance-sorted search results may be the easiest (perhaps the only) way to keep track of relevance.
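
A minimal sketch of carrying that order forward, assuming relevance is available only as each ID's position in rel_search$ids. add_relevance_rank is a hypothetical helper, not part of rentrez, and it assumes the vector returned by extract_from_esummary() is named by PubMed ID. Matching on IDs, rather than trusting positions, keeps the rank correct even if a later call returns records in a different order.

```r
# Hypothetical helper (not part of rentrez): attach a relevance rank to
# values extracted from the summaries, keyed by PubMed ID.
#   ids:    character vector of IDs in relevance order (e.g. rel_search$ids)
#   values: vector named by PubMed ID (assumed to be what
#           extract_from_esummary() returns for a single element)
add_relevance_rank = function(ids, values) {
  if (is.null(names(values))) stop("values must be named by PubMed ID")
  uid = names(values)
  out = data.frame(
    uid = uid,
    value = unname(values),
    # Position in the relevance-sorted search is used as the rank
    rank = match(uid, ids),
    stringsAsFactors = FALSE
  )
  # Most relevant record first; IDs missing from `ids` (NA rank) go last
  out[order(out$rank), , drop = FALSE]
}

# Usage against NCBI (needs rentrez and a network connection)
if (interactive()) {
  library(rentrez)
  query = 'regression to the mean[TITL]'
  rel_search = entrez_search(db="pubmed", term=query, retmax=30, sort="relevance")
  paper_data = entrez_summary(db="pubmed", id=rel_search$ids)
  dates = extract_from_esummary(paper_data, "pubdate")
  head(add_relevance_rank(rel_search$ids, dates))
}
```

Keeping the rank as an explicit column means the relevance order survives later merging or re-sorting (for example by date) of the data frame.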
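
One caveat about the elementwise comparison of default_search$ids and rel_search$ids: with retmax=30, a relevance-sorted search returns the 30 most relevant records, which need not be the same 30 records the default search returns, so a FALSE can mean a different record as well as a different position. Below is a minimal sketch for telling the two apart; compare_orders is a hypothetical helper (not a rentrez function) built only on base R's setequal, identical and match.

```r
# Hypothetical helper: compare two ID vectors returned for the same query
compare_orders = function(a, b) {
  list(
    same_records = setequal(a, b),   # same IDs, ignoring order?
    same_order = identical(a, b),    # same IDs in the same order?
    position_in_a = match(b, a)      # where each ID of b sits in a (NA = absent)
  )
}

# Usage against NCBI (needs rentrez and a network connection)
if (interactive()) {
  library(rentrez)
  query = 'regression to the mean[TITL]'
  default_search = entrez_search(db="pubmed", term=query, retmax=30)
  rel_search = entrez_search(db="pubmed", term=query, retmax=30, sort="relevance")
  str(compare_orders(default_search$ids, rel_search$ids))
}
```

If same_records is TRUE, both searches returned the same records and only the ordering differs; any NA in position_in_a marks a record that only the relevance search returned.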