
How to retrieve all ChEBI IDs for a given KEGG compound?


Say I want to map a KEGG ID to a ChEBI ID using bioservices. I can do:

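from bioservices import KEGG

kegg_con = KEGG()
# Fetch the entry for the compound and parse it into a dict
kegg_entry = kegg_con.parse(kegg_con.get('C00033'))
# DBLINKS['ChEBI'] is a space-separated string of ChEBI IDs
print(kegg_entry['DBLINKS']['ChEBI'].split())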

This will return

[u'15366', u'30089']

meaning that there are two ChEBI IDs associated with the KEGG compound (KEGG entry C00033).

An alternative - if one has to do a lot of mappings - is to use the built-in converter like this:

map_kegg_chebi = kegg_con.conv("chebi", "compound")
print(map_kegg_chebi['cpd:C00033'])

This will print

u'chebi:15366'

So for the same compound, only one ID is returned, although there are two associated with it. Is there a way to retrieve both of them?


Solution

  • In short, I do not have the answer, but here is some information that may help you.

    The C00033 entry lists two related entities in the ChEBI database: 15366 and 30089. On the ChEBI website, these two entries correspond to acetic acid (15366) and acetate (30089):

    acetate is the ion resulting from loss of H+ from the acetic acid.

    Why KEGG decided to provide both entries, I do not know.

    Using kegg_con.conv, we can see that C00033 maps to only one result in ChEBI, which seems sensible to me (though confusing, I agree), since 30089 is just the ion form of acetic acid (15366).

    For completeness, note that using the ChEBI service (from bioservices), we can map the two ChEBI entries back to KEGG:

    from bioservices import ChEBI
    chebi = ChEBI()
    chebi.conv("CHEBI:30089", "KEGG COMPOUND accession")
    chebi.conv("CHEBI:15366", "KEGG COMPOUND accession")
    

    Both calls return C00033. However, in this example, I would say that you do not lose much information by ignoring the ion form of acetic acid.

    It would be interesting to check systematically whether other ambiguous mappings also fall into this category (ion forms).
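As a starting point for such a check, here is a minimal sketch that collects every ChEBI ID per KEGG compound from the parsed entry's DBLINKS field (the first approach in the question) and then flags the compounds that have more than one. The helper names chebi_ids, map_kegg_to_chebi and ambiguous are hypothetical, not part of bioservices. The sketch assumes that parse() returns a dict shaped like the one in the question and that get() returns text on success; both assumptions may need adjusting for your bioservices version.

```python
def chebi_ids(parsed_entry):
    """Return all ChEBI IDs listed under DBLINKS in a parsed KEGG entry."""
    # Assumption: parsed_entry is a dict like the output of KEGG.parse() in
    # the question, with DBLINKS['ChEBI'] a space-separated string of IDs.
    if not isinstance(parsed_entry, dict):
        return []
    dblinks = parsed_entry.get("DBLINKS") or {}
    if not isinstance(dblinks, dict):
        return []
    return dblinks.get("ChEBI", "").split()


def map_kegg_to_chebi(kegg_ids, kegg_con):
    """Map each KEGG compound ID to the list of all its ChEBI IDs."""
    mapping = {}
    for kegg_id in kegg_ids:
        raw = kegg_con.get(kegg_id)  # one request per compound, so this is slow
        # Assumption: a successful lookup returns text; anything else
        # (for example an error code) is treated as "no entry".
        entry = kegg_con.parse(raw) if isinstance(raw, str) else {}
        mapping[kegg_id] = chebi_ids(entry)
    return mapping


def ambiguous(mapping):
    """Keep only the compounds that map to more than one ChEBI ID."""
    return {k: v for k, v in mapping.items() if len(v) > 1}


# Usage (needs bioservices and network access):
# from bioservices import KEGG
# mapping = map_kegg_to_chebi(["C00033"], KEGG())
# print(ambiguous(mapping))
```

Because this issues one request per compound, it is much slower than conv for large ID lists. It may be practical to use conv for the bulk mapping and reserve this per-entry lookup for the compounds you want to double-check.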