
Searching Entrez based on paper title


I am trying to search NCBI's Entrez based on a paper's title. Here are my GET request's URL and parameters:

import requests

url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
# SEE: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8021862/
params = {
    "tool": "foo",
    "email": "example@example.com",
    "api_key": None,
    "retmode": "json",
    "db": "pubmed",
    "retmax": "1",
    "term": '"Interpreting Genetic Variation in Human and Cancer Genomes"[Title]',
}
response = requests.get(url, params=params, timeout=15.0)
response.raise_for_status()
result = response.json()["esearchresult"]

However, I am getting no results: result["count"] is 0. How can I search Entrez based on a paper's title?

When answering, feel free to use requests directly, or common Entrez wrappers like biopython's Bio.Entrez or easy-entrez. I am using Python 3.11.


Solution

  • Okay, turns out nothing is ever easy.

    TL;DR: use a proximity search with a distance of 0.

    "term": '"Interpreting Genetic Variation in Human and Cancer Genomes"[Title:~0]'
    

    Starting at the PubMed User Guide, the "Searching for a phrase" section talks about PubMed's phrase index. Let's start by seeing what the phrase index contains for this search.

    Going to PubMed Advanced Search Builder, and hitting the "Show Index" button:

    screenshot of PubMed Advanced Search Builder's Show Index

    We observe that the search query was not in the phrase index. Now back in the "Searching for a phrase" section, we see:

    If you use quotes and the phrase is not found in the phrase index, the quotes are ignored and the terms are processed using automatic term mapping.
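One way to look at this fallback from the API side is to check what ESearch reports alongside the count. The following is a minimal sketch, assuming the JSON `esearchresult` carries a `querytranslation` string and a `warninglist` object with a `quotedphrasesnotfound` list. Those key names are assumptions to verify against a real response, and the sample dict is hand-written for illustration, not captured output. `describe_search` is a hypothetical helper name.

```python
def describe_search(esearchresult: dict) -> dict:
    """Summarize how ESearch interpreted a query.

    .get() is used throughout because the diagnostic keys are
    assumptions and may be absent from a given response.
    """
    warnings = esearchresult.get("warninglist", {})
    return {
        "count": int(esearchresult.get("count", 0)),
        "translation": esearchresult.get("querytranslation", ""),
        "quoted_phrases_not_found": list(
            warnings.get("quotedphrasesnotfound", [])
        ),
    }


# Hand-written sample shaped like an esearchresult (hypothetical values).
sample = {
    "count": "0",
    "querytranslation": "",
    "warninglist": {"quotedphrasesnotfound": ['"some long title"[Title]']},
}
summary = describe_search(sample)
print(summary["count"])
print(summary["quoted_phrases_not_found"])
```

Passing the `result` dict from the question's request to this helper would show whether the quoted phrase was reported as not found, if those keys are present.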

    Okay, so it seems automatic term mapping (ATM) is failing us as well. Let's keep reading in the "Quoted phrase not found" section:

    To search for a phrase that is not found in the phrase index, use a proximity search with a distance of 0 (...); this will search for the quoted terms appearing next to each other, in any order.
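The proximity syntax is `"terms"[Field:~N]`, so building the term from an arbitrary title can be wrapped in a small helper. This is a sketch under a few assumptions: `proximity_term` is a hypothetical name, the allowed fields are the three that, as far as I know, the User Guide lists for proximity search, and replacing embedded double quotes with spaces is my own choice to keep the quoted phrase intact.

```python
# Fields assumed to support proximity search (per the PubMed User Guide).
ALLOWED_FIELDS = {"Title", "Title/Abstract", "Affiliation"}


def proximity_term(phrase: str, field: str = "Title", distance: int = 0) -> str:
    """Build a PubMed proximity search term such as '"a b c"[Title:~0]'."""
    if field not in ALLOWED_FIELDS:
        raise ValueError(f"proximity search not supported for field {field!r}")
    if distance < 0:
        raise ValueError("distance must be non-negative")
    # Embedded double quotes would end the quoted phrase early, so replace
    # them with spaces (an assumption), then collapse runs of whitespace.
    cleaned = " ".join(phrase.replace('"', " ").split())
    return f'"{cleaned}"[{field}:~{distance}]'


term = proximity_term("Interpreting Genetic Variation in Human and Cancer Genomes")
print(term)
```

The resulting string can be passed as the "term" parameter in the request.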

    Now, trying that proximity search with 0 distance:

    import requests
    
    url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
    # SEE: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8021862/
    params = {
        "tool": "foo",
        "email": "example@example.com",
        "api_key": None,
        "retmode": "json",
        "db": "pubmed",
        "retmax": "1",
        "term": '"Interpreting Genetic Variation in Human and Cancer Genomes"[Title:~0]',
    }
    response = requests.get(url, params=params, timeout=15.0)
    response.raise_for_status()
    result = response.json()["esearchresult"]
    print(result["count"])  # Prints: 1
    print(result["idlist"][0])  # Prints: 33834021
    

    It works! Case closed.

    Notes:

    1. Searching for a whole title in exact order (via double quotes) doesn't work, because PubMed doesn't index full titles into the phrase index.
    2. Using a zero-distance proximity search has a downside: it doesn't enforce exact term ordering. However, it's a viable workaround for point 1.
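If exact ordering matters, one option is to compare the titles of the returned records against the wanted title after the search. This is a minimal sketch, assuming you have already collected a PMID-to-title mapping for the hits (for example via ESummary, which is not shown here). The helper names are hypothetical, and the candidate titles below are hand-written samples, including a made-up reordered title.

```python
import re


def normalize_title(title: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", title.lower()).split())


def exact_title_matches(wanted: str, candidates: dict[str, str]) -> list[str]:
    """Return PMIDs whose normalized title equals the normalized wanted title."""
    target = normalize_title(wanted)
    return [
        pmid
        for pmid, title in candidates.items()
        if normalize_title(title) == target
    ]


# Hand-written sample data; the second entry is a hypothetical record whose
# title has the same words in a different order.
candidates = {
    "33834021": "Interpreting Genetic Variation in Human and Cancer Genomes.",
    "00000000": "Human and Cancer Genomes in Interpreting Genetic Variation",
}
matches = exact_title_matches(
    "Interpreting Genetic Variation in Human and Cancer Genomes", candidates
)
print(matches)  # Prints: ['33834021']
```

Normalizing before comparing means differences in case or a trailing period don't cause a false mismatch, while a different word order still does.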