
PubTator API doesn't return session number as expected


I'm currently trying to validate that PubTator's API for Named Entity Recognition (NER) works and returns output in the expected format. I downloaded the example files included with the sample Python code at https://www.ncbi.nlm.nih.gov/research/pubtator/api.html. Using one of these example input files, I tried the following curl command, as indicated in their documentation:

curl -X POST --data-binary @input/ex2.PubTator https://www.ncbi.nlm.nih.gov/research/pubtator-api/annotations/annotate/submit/Gene

All I get back is the HTML for an HTTP 500 error page. input/ex.PubTator is simply a single abstract, and it doesn't contain any special characters that I can discern:

20085714|t|Autosomal-dominant striatal degeneration is caused by a mutation in the phosphodiesterase 8B gene.
20085714|a|Autosomal-dominant striatal degeneration (ADSD) is an autosomal-dominant movement disorder affecting the striatal part of the basal ganglia. ADSD is characterized by bradykinesia, dysarthria, and muscle rigidity. These symptoms resemble idiopathic Parkinson disease, but tremor is not present. Using genetic linkage analysis, we have mapped the causative genetic defect to a 3.25 megabase candidate region on chromosome 5q13.3-q14.1. A maximum LOD score of 4.1 (Theta = 0) was obtained at marker D5S1962. Here we show that ADSD is caused by a complex frameshift mutation (c.94G>C+c.95delT) in the phosphodiesterase 8B (PDE8B) gene, which results in a loss of enzymatic phosphodiesterase activity. We found that PDE8B is highly expressed in the brain, especially in the putamen, which is affected by ADSD. PDE8B degrades cyclic AMP, a second messenger implied in dopamine signaling. Dopamine is one of the main neurotransmitters involved in movement control and is deficient in Parkinson disease. We believe that the functional analysis of PDE8B will help to further elucidate the pathomechanism of ADSD as well as contribute to a better understanding of movement disorders.

What am I doing wrong?


Solution

  • The URL I was using for my named entity recognition API request appears to have been deprecated now that PubTator3 is in beta.

    If submitting a short string of text, use ...

    curl -X POST https://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/RESTful/request.cgi -H "Content-Type: application/x-www-form-urlencoded" -d "text=Possible role of valvular serotonin 5-HT receptors in the cardiopathy associated with fenfluramine.&bioconcept=Gene"
    

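    If submitting the entire contents of a plain text file, which is probably the more relevant use case, one needs to work around the operating system's limit on the length of a single command-line argument, which passing a whole file via -d "text=..." can exceed. The best I could manage was ...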

    printf "text=%s&bioconcept=Gene" "$(cat ___data/plain_txts/yang2020.txt)" | curl -X POST https://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/RESTful/request.cgi -H "Content-Type: application/x-www-form-urlencoded" --data-binary @-
    

    Regrettably, the output annotation can still be truncated if the input file has too much content.
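A variant that may be worth trying, offered only as a sketch: curl's `--data-urlencode name@filename` form reads the file itself and percent-encodes its contents. Splicing raw text into `text=%s&bioconcept=Gene` means characters such as `&` (starts a new form field) and `+` (decoded as a space) can be misread; the abstract quoted in the question contains `c.94G>C+c.95delT`, for example. The `annotate_file` function and its `Gene` default are hypothetical helpers, not part of any PubTator tooling, and I haven't confirmed that this avoids the truncation described above; it only addresses encoding.

```shell
#!/usr/bin/env bash
# Sketch only: annotate_file is a hypothetical helper, not a PubTator tool.
# It POSTs a whole plain-text file to the request.cgi endpoint quoted above,
# letting curl read and URL-encode the file (curl's name@filename syntax).
PUBTATOR_URL="https://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/RESTful/request.cgi"

annotate_file() {
  local input_file="$1"
  local bioconcept="${2:-Gene}"   # assumed default, matching the examples above

  if [[ ! -r "$input_file" ]]; then
    echo "annotate_file: cannot read '$input_file'" >&2
    return 1
  fi

  # POST is implied by --data-urlencode, which sends
  # application/x-www-form-urlencoded like -d but percent-encodes
  # '&', '+', '%', newlines, etc. in the file contents.
  curl -sS "$PUBTATOR_URL" \
    --data-urlencode "text@${input_file}" \
    --data-urlencode "bioconcept=${bioconcept}"
}
```

For example, after sourcing the snippet: `annotate_file ___data/plain_txts/yang2020.txt Gene > yang2020_annotated.txt`. Because curl reads the file directly, its contents never pass through a shell argument, which also sidesteps the argument-length limit.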