I am trying to search NCBI's Entrez based on a title. Here are my GET request's URL and parameters:
import requests
url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
# SEE: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8021862/
params = {
    "tool": "foo",
    "email": "example@example.com",
    "api_key": None,
    "retmode": "json",
    "db": "pubmed",
    "retmax": "1",
    "term": '"Interpreting Genetic Variation in Human and Cancer Genomes"[Title]',
}
response = requests.get(url, params=params, timeout=15.0)
response.raise_for_status()
result = response.json()["esearchresult"]
However, I am getting no results: result["count"] is 0. How can I search Entrez based on a paper's title?
When answering, feel free to use requests directly, or common Entrez wrappers like biopython's Bio.Entrez or easy-entrez. I am using Python 3.11.
Okay, turns out nothing is ever easy.
TL;DR: use a proximity search with a distance of 0.
"term": '"Interpreting Genetic Variation in Human and Cancer Genomes"[Title:~0]'
Starting at the PubMed User Guide, the "Searching for a phrase" section talks about PubMed's phrase index. Let's start by seeing what the phrase index contains for this search.
Going to the PubMed Advanced Search Builder and hitting the "Show Index" button, we observe that the search query is not in the phrase index. Now back in the "Searching for a phrase" section, we see:
If you use quotes and the phrase is not found in the phrase index, the quotes are ignored and the terms are processed using automatic term mapping.
Okay, so it seems automatic term mapping (ATM) is failing us as well. Let's keep reading in the "Quoted phrase not found" section:
To search for a phrase that is not found in the phrase index, use a proximity search with a distance of 0 (...); this will search for the quoted terms appearing next to each other, in any order.
Now, trying that proximity search with 0 distance:
import requests
url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
# SEE: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8021862/
params = {
    "tool": "foo",
    "email": "example@example.com",
    "api_key": None,
    "retmode": "json",
    "db": "pubmed",
    "retmax": "1",
    "term": '"Interpreting Genetic Variation in Human and Cancer Genomes"[Title:~0]',
}
response = requests.get(url, params=params, timeout=15.0)
response.raise_for_status()
result = response.json()["esearchresult"]
print(result["count"]) # Prints: 1
print(result["idlist"][0]) # Prints: 33834021
It works! Case closed.
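If you need this for more than one title, the query can be wrapped in a small helper. The following is only a sketch: the names build_title_term and find_pmids_by_title are hypothetical, and it uses the standard library's urllib instead of requests so it has no third-party dependency (the requests call above does the same thing). It also assumes that double quotes inside a title should be replaced with spaces so they do not end the quoted phrase early; I have not checked how PubMed treats other punctuation.

```python
import json
import urllib.parse
import urllib.request

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_title_term(title: str, distance: int = 0) -> str:
    """Return a PubMed proximity query for ``title`` in the Title field."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    # Assumption: embedded double quotes would end the quoted phrase early,
    # so they are replaced with spaces; runs of whitespace are then collapsed.
    cleaned = " ".join(title.replace('"', " ").split())
    return f'"{cleaned}"[Title:~{distance}]'


def find_pmids_by_title(
    title: str, email: str, tool: str = "foo", retmax: int = 1
) -> list[str]:
    """Query ESearch with a distance-0 proximity search and return PubMed IDs."""
    params = {
        "tool": tool,
        "email": email,
        "retmode": "json",
        "db": "pubmed",
        "retmax": str(retmax),
        "term": build_title_term(title),
    }
    url = f"{ESEARCH_URL}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url, timeout=15.0) as response:
        payload = json.load(response)
    return payload["esearchresult"]["idlist"]
```

Calling find_pmids_by_title("Interpreting Genetic Variation in Human and Cancer Genomes", email="example@example.com") sends the same term as the request above. Keeping the term construction in its own function makes it easy to check the query string without hitting the network.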
Notes: