
Parsing an XML object in Python 3.9


I'm trying to fetch some data from the NCBI API, using requests to make the connection.

What I'm stuck on is how to convert the XML object that requests returns into something I can parse.

Here's my code for the function so far:

def getNCBIid(speciesName):
    import requests
    
    base_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"
    
    url = base_url + "esearch.fcgi?db=assembly&term=(%s[All Fields])&usehistory=y&api_key=f1e800ad255b055a691c7cf57a576fe4da08" % speciesName
    
    #xml object
    api_request = requests.get(url)

Solution

  • You can use a parser such as BeautifulSoup to convert and parse the XML.

    What you are calling your XML object is still the requests Response object; you need to extract the content from it first. Note that BeautifulSoup's 'xml' parser requires the lxml package to be installed.

    import requests
    from bs4 import BeautifulSoup

    def getNCBIid(speciesName):
        base_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

        url = base_url + "esearch.fcgi?db=assembly&term=(%s[All Fields])&usehistory=y&api_key=f1e800ad255b055a691c7cf57a576fe4da08" % speciesName

        # this is the requests Response object, not the XML itself
        api_request = requests.get(url)

        # grab the raw response body (bytes)
        xml_content = api_request.content

        # parse with Beautiful Soup (the 'xml' parser requires lxml)
        soup = BeautifulSoup(xml_content, 'xml')

        # from here you would access the desired elements on soup
        # docs: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
        return soup
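
As a rough illustration of the "access desired elements" step, here is a minimal sketch that pulls the IDs out of an esearch-style reply. It swaps in the standard library's xml.etree.ElementTree instead of BeautifulSoup so that it runs without extra packages. The SAMPLE_XML document and the helper names extract_ids and extract_count are made up for this example; the sample only imitates the general shape of an esearch response (an eSearchResult root with Count and IdList/Id elements) and its IDs are not real results.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like an esearch reply; the IDs are invented.
SAMPLE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<eSearchResult>
  <Count>2</Count>
  <RetMax>2</RetMax>
  <RetStart>0</RetStart>
  <IdList>
    <Id>111111</Id>
    <Id>222222</Id>
  </IdList>
</eSearchResult>"""


def extract_ids(xml_bytes):
    """Return the text of every <Id> element found under <IdList>."""
    root = ET.fromstring(xml_bytes)  # root is the <eSearchResult> element
    return [id_el.text for id_el in root.findall("./IdList/Id")]


def extract_count(xml_bytes):
    """Return the <Count> value as an int, or 0 if the element is missing."""
    root = ET.fromstring(xml_bytes)
    count_el = root.find("Count")
    return int(count_el.text) if count_el is not None else 0


ids = extract_ids(SAMPLE_XML)
count = extract_count(SAMPLE_XML)
print(ids, count)
```

In the function above, the same helpers could be given api_request.content in place of SAMPLE_XML, assuming the live reply follows this layout.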