Tags: python, bioinformatics, biopython

BioPython: KEGG REST keeps reporting HTTP Error 403: Forbidden


I'm trying to use Biopython's REST module from Bio.KEGG to query the KEGG database for the names and formulas of some compounds, using each compound's identification number (CID), e.g. C00001 is water, C00123 is leucine, etc.:

from Bio.KEGG import REST
from Bio.KEGG import Compound


def cpd_decoder(cid):
    """Get the compound name and formula from KEGG for an ID such as "C00123"."""
    if cid.startswith("C"):
        kegg_entry = REST.kegg_get("cpd:" + cid)  # one HTTP request per call
        for record in Compound.parse(kegg_entry):
            # record.name is a list of synonyms; take the first one
            return record.name[0], record.formula
    return None


cid = "C00123"  # example CID; this one's for leucine
result = cpd_decoder(cid)  # call once, so only one request is sent per CID
if result is not None:
    compound, formula = result

However, even though Biopython simply calls KEGG's own REST API, I almost always get the following error:

    if cpd_decoder(cid) !=None:
  File "/media/tessa/Storage/Alien_Earths/Network_expansion/network expansion test 2.py", line 27, in cpd_decoder
    kegg_entry=REST.kegg_get(cid)
  File "/home/tessa/.local/lib/python3.10/site-packages/Bio/KEGG/REST.py", line 208, in kegg_get
    resp = _q("get", dbentries)
  File "/home/tessa/.local/lib/python3.10/site-packages/Bio/KEGG/REST.py", line 44, in _q
    resp = urlopen(URL % (args))
  File "/usr/lib/python3.10/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.10/urllib/request.py", line 525, in open
    response = meth(req, response)
  File "/usr/lib/python3.10/urllib/request.py", line 634, in http_response
    response = self.parent.error(
  File "/usr/lib/python3.10/urllib/request.py", line 563, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.10/urllib/request.py", line 496, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.10/urllib/request.py", line 643, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden 

I'm wondering whether, because I'm querying a large list of CIDs, KEGG now thinks I'm a bot and is blocking me. Is there a way to work around or resolve this?


Solution

  • I found a fix, and it also sped up my code a lot. I haven't used Biopython myself; I used the requests package in Python instead, but you can probably do the same in Biopython.

    Rather than sending a separate request for every KO number (or, in your case, every compound ID), you can put all of them in a single request. So instead of requesting:

    https://rest.kegg.jp/link/reaction/ko:K00012

    https://rest.kegg.jp/link/reaction/ko:K12450

    etc.

    you can do this:

    https://rest.kegg.jp/link/reaction/ko:K00012+K12450+<insert all your queries separated by a "+">

    This also runs much faster, because you only wait for KEGG to respond once instead of once per ID. Then you just need to parse the result (Biopython can probably do that already).

    Here's what my code for this looks like:

    import requests
    
    # Replace with your own query
    KO_numbers = ["K00012", "K12450", "K21379"]
    
    # Start of the URL; replace with the operation/database you need
    url = "https://rest.kegg.jp/link/reaction/ko:"
    
    # Append all KO numbers separated by "+" (join avoids a trailing "+")
    url += "+".join(KO_numbers)
    
    # Do the actual request (a single round trip to KEGG); raise an error if something is wrong
    response = requests.get(url, timeout=30)
    if response.status_code != 200:
        raise ConnectionError(f"KEGG API request failed with HTTP {response.status_code}")
    
    # Here I just print the response; from here you need to parse it for your own use
    print(response.text)
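Since the question uses Biopython: Bio.KEGG.REST.kegg_get also accepts a list of identifiers and joins them with "+" itself, but it rejects lists longer than 10 entries (KEGG's limit for the "get" operation). Bio.KEGG.Compound.parse then yields one record per entry in the combined response. Below is a minimal sketch of batching the compound lookup this way, assuming Biopython is installed; `chunked` and `decode_compounds` are hypothetical helper names, not part of Biopython.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def decode_compounds(cids, batch_size=10):
    """Map KEGG compound IDs (e.g. "C00123") to (first name, formula).

    Hypothetical helper: sends one KEGG request per batch of IDs
    instead of one request per compound.
    """
    # Imported here so chunked() can be reused without Biopython installed.
    from Bio.KEGG import Compound, REST

    results = {}
    for batch in chunked(list(cids), batch_size):
        # Given a list, kegg_get joins the IDs with "+";
        # it raises ValueError for more than 10 entries.
        handle = REST.kegg_get(["cpd:" + cid for cid in batch])
        # One response can hold several flat-file entries; parse() yields each one.
        for record in Compound.parse(handle):
            name = record.name[0] if record.name else ""
            results[record.entry] = (name, record.formula)
    return results
```

Usage would look like `compounds = decode_compounds(["C00001", "C00123"])` followed by `compound, formula = compounds["C00123"]`. A list of 1,000 CIDs then needs 100 requests instead of 1,000.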