python api article scopus httpx

How to use the Elsevier Article Retrieval API to get the full text of a paper


I want to use the Elsevier Article Retrieval API (https://dev.elsevier.com/documentation/FullTextRetrievalAPI.wadl) to get the full text of a paper.

I use httpx to fetch the paper, but the response contains only part of the information. My code is below:

import httpx


def scopus_paper_date(paper_doi, apikey):
    headers = {
        "X-ELS-APIKey": apikey,
        "Accept": "text/xml",
    }

    timeout = httpx.Timeout(10.0, connect=60.0)
    client = httpx.Client(timeout=timeout, headers=headers)
    query = "&view=FULL"  # defined but never appended to the URL
    url = "https://api.elsevier.com/content/article/doi/" + paper_doi
    r = client.get(url)
    print(r)
    return r.text

y = scopus_paper_date('10.1016/j.solmat.2021.111326', myapikey)
y

The result is below:

<full-text-retrieval-response xmlns="http://www.elsevier.com/xml/svapi/article/dtd" xmlns:bk="http://www.elsevier.com/xml/bk/dtd" xmlns:cals="http://www.elsevier.com/xml/common/cals/dtd" xmlns:ce="http://www.elsevier.com/xml/common/dtd" xmlns:ja="http://www.elsevier.com/xml/ja/dtd" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:sa="http://www.elsevier.com/xml/common/struct-aff/dtd" xmlns:sb="http://www.elsevier.com/xml/common/struct-bib/dtd" xmlns:tb="http://www.elsevier.com/xml/common/table/dtd" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xocs="http://www.elsevier.com/xml/xocs/dtd" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:prism="http://prismstandard.org/namespaces/basic/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><coredata><prism:url>https://api.elsevier.com/content/article/pii/S0927024821003688</prism:url>....

How can I get the full data of the paper? Many thanks!


Solution

  • That depends on the paper you want to download.

    I modified the function you posted slightly. It now requests the response as JSON instead of XML (this is just my personal preference; you can use whichever format you prefer) and returns the response object rather than its text.

    import httpx

    def scopus_paper_date(paper_doi, apikey):
        headers = {
            "X-ELS-APIKey": apikey,
            "Accept": "application/json",
        }
        timeout = httpx.Timeout(10.0, connect=60.0)
        client = httpx.Client(timeout=timeout, headers=headers)
        query = "&view=FULL"  # defined but never appended to the URL
        url = "https://api.elsevier.com/content/article/doi/" + paper_doi
        r = client.get(url)
        print(r)  # shows the HTTP status of the response
        return r
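
The function above builds a `view` query string but does not send it. As a minimal sketch, assuming the `view` parameter described in the API documentation is what you want to pass, the request could be assembled as below. The helper names `build_article_request` and `fetch_article` are hypothetical, and whether the full text actually comes back still depends on the paper and on what your API key is entitled to.

```python
from urllib.parse import quote

BASE_URL = "https://api.elsevier.com/content/article/doi/"


def build_article_request(doi, api_key, view="FULL"):
    """Return (url, headers, params) for an article retrieval request.

    Hypothetical helper: it only assembles the pieces and sends nothing.
    """
    url = BASE_URL + quote(doi, safe="/")  # keep the "/" inside the DOI
    headers = {"X-ELS-APIKey": api_key, "Accept": "application/json"}
    params = {"view": view}  # sent as ?view=FULL instead of being dropped
    return url, headers, params


def fetch_article(doi, api_key, view="FULL"):
    """Send the request with httpx and return the parsed JSON body."""
    import httpx  # imported here so the helper above works without httpx

    url, headers, params = build_article_request(doi, api_key, view)
    timeout = httpx.Timeout(10.0, connect=60.0)
    with httpx.Client(timeout=timeout, headers=headers) as client:
        response = client.get(url, params=params)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.json()
```

Calling `fetch_article('10.1016/j.solmat.2021.111326', my_api_key)` would then return the parsed dictionary directly, assuming the request succeeds.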
    

    Now you can retrieve the document you want, and then you will have to parse it:

    import json

    # Get document
    y = scopus_paper_date('10.1016/j.solmat.2021.111326', my_api_key)
    # Parse document
    d = json.loads(y.text)
    # Print document
    print(d['full-text-retrieval-response']['coredata']['dc:description'])
    

    The result will be the dc:description of the document, i.e. the abstract:

    The production of molecular hydrogen by photoelectrochemical dissociation (PEC) of water is a promising technique, which allows ... The width of the forbidden bands and the position of the valence and conduction bands of the different materials were determined by Mott - Schottky type measurements.

    For this document, that is all you can get; no further content is available. However, if you request a different document, for example:

    import json

    # Get document
    y = scopus_paper_date('10.1016/j.nicl.2021.102600', my_api_key)
    # Parse document
    d = json.loads(y.text)
    

    You can then print the originalText key of the full-text-retrieval-response:

    # Print document
    print(d['full-text-retrieval-response']['originalText'])
    

    You will notice that this is a very long string containing a lot of text, probably more than you want; for example, it contains all the references as well.
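
If the references are not needed, one rough option is to cut the string at the last occurrence of a marker word. This is only a heuristic sketch: the function `strip_references` and the assumption that the reference list starts at the last "References" in the text are mine, not something the API guarantees, so check the result against the actual string you get back.

```python
def strip_references(text, marker="References"):
    """Drop everything from the last occurrence of `marker` onwards.

    Heuristic only: assumes the reference list begins at the last
    occurrence of the marker word, which may not hold for every paper.
    """
    cut = text.rfind(marker)
    if cut == -1:
        return text  # marker not found, leave the text untouched
    return text[:cut].rstrip()


sample = "Intro text. Results. References [1] A. Author"
body = strip_references(sample)
```

For the real response this would be called as `strip_references(d['full-text-retrieval-response']['originalText'])`.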

    As I said in the beginning, the information you can get depends on the individual paper. However, everything the API returns for a given paper is contained in the y variable defined in the code.
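
Since some papers return originalText and others only the abstract, it can help to check which one is present before indexing into the dictionary. The sketch below assumes the JSON layout shown in the examples above; the function name `extract_text` and the returned labels are hypothetical.

```python
def extract_text(payload):
    """Return (kind, text) from a parsed full-text-retrieval-response.

    Prefers originalText and falls back to the abstract (dc:description).
    Assumes the key layout used in the examples above.
    """
    response = payload.get("full-text-retrieval-response", {})

    original = response.get("originalText")
    if isinstance(original, str) and original.strip():
        return "full text", original

    abstract = response.get("coredata", {}).get("dc:description")
    if abstract:
        return "abstract", abstract

    return "none", ""


# Small stand-in dictionaries instead of real API responses
with_full = {"full-text-retrieval-response": {"originalText": "Body of the paper"}}
abstract_only = {
    "full-text-retrieval-response": {"coredata": {"dc:description": "An abstract"}}
}
print(extract_text(with_full))
print(extract_text(abstract_only))
```

With a real response you would call `extract_text(d)`, where `d` is the dictionary parsed from `y.text`.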