python cbmpy sbml

How to add annotation to a gene in SBML?


I have a genome-scale stoichiometric metabolic model iMM904.xml and when I open it in a text editor I can see that certain genes have annotation added to them, e.g.

<fbc:geneProduct fbc:id="G_YLR189C" fbc:label="YLR189C" metaid="G_YLR189C">
<annotation>
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
    <rdf:Description rdf:about="#G_YLR189C">
      <bqbiol:isEncodedBy>
        <rdf:Bag>
          <rdf:li rdf:resource="http://identifiers.org/ncbigene/850886" />
          <rdf:li rdf:resource="http://identifiers.org/sgd/S000004179" />
        </rdf:Bag>
      </bqbiol:isEncodedBy>
    </rdf:Description>
  </rdf:RDF>
</annotation>
</fbc:geneProduct>

How can I access and alter this annotation? When I try

import cbmpy as cbm

cmod = cbm.CBRead.readSBML3FBC('iMM904.xml')

gene = cmod.getGene('G_YLR189C')

print(gene.getAnnotations())

I only see an empty dictionary.

In addition, how can I add annotations such as "last modified by", as well as actual notes, to it?


Solution

  • In CBMPy, you have three options for adding annotation to an SBML file:

    1) MIRIAM annotation,

    2) arbitrary key value pairs and

    3) human-readable notes

    Together, these should cover all the points in your question. I demonstrate them on the gene entry, but the same commands can be used to annotate species (metabolites) and reactions.

    1. MIRIAM annotation

    To access the existing MIRIAM annotation - the one you show in your question - you can use:

    import cbmpy as cbm
    
    mod = cbm.CBRead.readSBML3FBC('iMM904.xml')
    
    # access gene directly by its locus tag which avoids dealing with the "G_" in the ID
    gene = mod.getGeneByLabel('YLR189C')
    
    gene.getMIRIAMannotations()
    

    This will give:

    {'encodes': (),
     'hasPart': (),
     'hasProperty': (),
     'hasTaxon': (),
     'hasVersion': (),
     'is': (),
     'isDerivedFrom': (),
     'isDescribedBy': (),
     'isEncodedBy': ('http://identifiers.org/ncbigene/850886',
      'http://identifiers.org/sgd/S000004179'),
     'isHomologTo': (),
     'isPartOf': (),
     'isPropertyOf': (),
     'isVersionOf': (),
     'occursIn': ()}
    

    As you can see, it contains the entries you saw in the SBML file.
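
As a cross-check that does not depend on CBMPy, the same resources can be read straight from the XML with Python's standard library. This is only a minimal sketch: the function name `miriam_resources` is made up for this example, and the snippet is the gene entry from the question with an `xmlns:fbc` declaration added (assumed to be the FBC version 2 namespace) so that it parses on its own.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BQBIOL = "http://biomodels.net/biology-qualifiers/"

# Gene entry from the question; xmlns:fbc is added here (assumption) so the
# fragment is well-formed without the rest of the model file.
SNIPPET = """<fbc:geneProduct xmlns:fbc="http://www.sbml.org/sbml/level3/version1/fbc/version2"
    fbc:id="G_YLR189C" fbc:label="YLR189C" metaid="G_YLR189C">
<annotation>
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
    <rdf:Description rdf:about="#G_YLR189C">
      <bqbiol:isEncodedBy>
        <rdf:Bag>
          <rdf:li rdf:resource="http://identifiers.org/ncbigene/850886" />
          <rdf:li rdf:resource="http://identifiers.org/sgd/S000004179" />
        </rdf:Bag>
      </bqbiol:isEncodedBy>
    </rdf:Description>
  </rdf:RDF>
</annotation>
</fbc:geneProduct>"""


def miriam_resources(xml_text, metaid):
    """Return {qualifier: (uri, ...)} for the element with the given metaid."""
    root = ET.fromstring(xml_text)
    found = {}
    for desc in root.iter("{%s}Description" % RDF):
        # rdf:about points at the annotated element as "#<metaid>"
        if desc.get("{%s}about" % RDF) != "#" + metaid:
            continue
        for qualifier in desc:
            # only look at biology qualifiers such as bqbiol:isEncodedBy
            if not qualifier.tag.startswith("{%s}" % BQBIOL):
                continue
            name = qualifier.tag.split("}", 1)[1]
            uris = tuple(
                li.get("{%s}resource" % RDF)
                for li in qualifier.iter("{%s}li" % RDF)
            )
            found[name] = found.get(name, ()) + uris
    return found


print(miriam_resources(SNIPPET, "G_YLR189C"))
```

For a whole model file, passing the file's bytes (read in binary mode) instead of `SNIPPET` should work the same way, since the function searches the entire tree for the matching `rdf:about`.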

    If you now want to add MIRIAM annotation, you can use two approaches:

    A) let CBMPy create the URL for you:

    gene.addMIRIAMannotation('is', 'UniProt Knowledgebase', 'Q06321')
    

    B) enter the URL yourself:

    # made up protein!
    gene.addMIRIAMuri('is', 'http://identifiers.org/uniprot/P12345')
    

    If you now check gene.getMIRIAMannotations(), you will see (I cut off a few empty entries):

    'is': ('http://identifiers.org/uniprot/Q06321',
      'http://identifiers.org/uniprot/P12345'),
     'isDerivedFrom': (),
     'isDescribedBy': (),
     'isEncodedBy': ('http://identifiers.org/ncbigene/850886',
      'http://identifiers.org/sgd/S000004179'),
    

    So, both of your entries have been added (again: the P12345 entry is just for demonstration, don't use it in your actual model!).
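
Because most qualifiers are empty, it can be convenient to filter them out before printing. The helper below is a small sketch that is not part of CBMPy (the name `non_empty_qualifiers` is made up); it only assumes that `getMIRIAMannotations()` returns a dictionary of tuples, as in the output shown above.

```python
def non_empty_qualifiers(miriam):
    """Keep only the qualifiers that have at least one URI attached."""
    return {qualifier: uris for qualifier, uris in miriam.items() if uris}


# Stand-in for gene.getMIRIAMannotations(), copied from the output above
example = {
    "is": ("http://identifiers.org/uniprot/Q06321",
           "http://identifiers.org/uniprot/P12345"),
    "isDerivedFrom": (),
    "isDescribedBy": (),
    "isEncodedBy": ("http://identifiers.org/ncbigene/850886",
                    "http://identifiers.org/sgd/S000004179"),
}

print(sorted(non_empty_qualifiers(example)))  # ['is', 'isEncodedBy']
```

With a loaded model, the call would be `non_empty_qualifiers(gene.getMIRIAMannotations())`.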

    If you do not know the exact name of the database, CBMPy will also help you there. For example, if you try:

    gene.addMIRIAMannotation('is', 'uniprot', 'Q06321')
    

    it will print

    "uniprot" is not a valid entity were you looking for one of these:
    
        UNII
        UniGene
        UniParc
        UniPathway Compound
        UniPathway Reaction
        UniProt Isoform
        UniProt Knowledgebase
        UniSTS
        Unimod
        Unipathway
        Unit Ontology
        Unite
    INFO: Invalid entity: "uniprot" MIRIAM entity NOT set
    

    The list includes 'UniProt Knowledgebase', the name we used above.

    2. Adding arbitrary key-value pairs

    Not everything can be annotated using the MIRIAM annotation scheme, but you can easily create your own key-value pairs. Using your example:

    gene.setAnnotation('last_modified_by', 'Vinz')
    

    The keys and values are fully arbitrary:

    gene.setAnnotation('arbitrary key', 'arbitrary value')
    

    If you now call

    gene.getAnnotations()
    

    you receive

    {'arbitrary key': 'arbitrary value', 'last_modified_by': 'Vinz'}
    

    If you want to access a certain key, you can use

    gene.getAnnotation('last_modified_by')
    

    which yields

    'Vinz'
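
For the "last modified by" use case, the two calls can be wrapped in a small helper that also records a timestamp. This is only a sketch: `stamp_modification` and the key `last_modified_on` are made up for this example (any key would do, since keys are arbitrary), and the only CBMPy method it relies on is `setAnnotation` as used above.

```python
from datetime import datetime, timezone


def stamp_modification(obj, user, when=None):
    """Record who changed an annotated object, and when, as key-value pairs.

    obj can be anything with a setAnnotation(key, value) method,
    e.g. the gene object used above.
    """
    when = when or datetime.now(timezone.utc)
    obj.setAnnotation('last_modified_by', user)
    # 'last_modified_on' is a hypothetical key; the value is stored as a string
    obj.setAnnotation('last_modified_on', when.isoformat())
```

Calling `stamp_modification(gene, 'Vinz')` would then set both keys in one go, and `gene.getAnnotations()` should show them next to any other pairs.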
    

    3. Adding notes

    If you want to write actual comments, neither of the first two options is appropriate, but you can use:

    gene.setNotes('This is my favorite gene')
    

    You can access them using

    gene.getNotes()
    

    If you now export the model using (make sure to use FBCV2!):

    cbm.CBWrite.writeSBML3FBCV2(mod, 'iMM904_edited.xml')
    

    and open the model in your text editor, you will see that all the annotations have been added:

    <fbc:geneProduct metaid="meta_G_YLR189C" fbc:id="G_YLR189C" fbc:label="YLR189C">
      <notes>
        <html:body>This is my favorite gene</html:body>
      </notes>
      <annotation>
        <listOfKeyValueData xmlns="http://pysces.sourceforge.net/KeyValueData">
          <data id="arbitrary key" value="arbitrary value"/>
          <data id="last_modified_by" value="Vinz"/>
        </listOfKeyValueData>
        <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#" xmlns:vCard4="http://www.w3.org/2006/vcard/ns#" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/" xmlns:bqmodel="http://biomodels.net/model-qualifiers/">
          <rdf:Description rdf:about="#meta_G_YLR189C">
            <bqbiol:is>
              <rdf:Bag>
                <rdf:li rdf:resource="http://identifiers.org/uniprot/Q06321"/>
                <rdf:li rdf:resource="http://identifiers.org/uniprot/P12345"/>
              </rdf:Bag>
            </bqbiol:is>
            <bqbiol:isEncodedBy>
              <rdf:Bag>
                <rdf:li rdf:resource="http://identifiers.org/ncbigene/850886"/>
                <rdf:li rdf:resource="http://identifiers.org/sgd/S000004179"/>
              </rdf:Bag>
            </bqbiol:isEncodedBy>
          </rdf:Description>
        </rdf:RDF>
      </annotation>
    </fbc:geneProduct>
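
Instead of inspecting the exported file by eye, the key-value pairs can also be read back with the standard library. The following is a minimal sketch under a few assumptions: the function name `key_value_data` is made up, the sample is a shortened copy of the exported entry above, and the `xmlns:fbc` declaration is added (assumed to be the FBC version 2 namespace) so that the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

# Namespace of the key-value block, taken from the exported XML above
KVD = "http://pysces.sourceforge.net/KeyValueData"

# Shortened copy of the exported gene entry; xmlns:fbc added (assumption)
EXPORTED = """<fbc:geneProduct xmlns:fbc="http://www.sbml.org/sbml/level3/version1/fbc/version2"
    metaid="meta_G_YLR189C" fbc:id="G_YLR189C" fbc:label="YLR189C">
  <annotation>
    <listOfKeyValueData xmlns="http://pysces.sourceforge.net/KeyValueData">
      <data id="arbitrary key" value="arbitrary value"/>
      <data id="last_modified_by" value="Vinz"/>
    </listOfKeyValueData>
  </annotation>
</fbc:geneProduct>"""


def key_value_data(xml_text, metaid):
    """Return the key-value pairs stored on the element with this metaid."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if element.get("metaid") == metaid:
            # collect every <data id=... value=.../> below the matching element
            return {
                data.get("id"): data.get("value")
                for data in element.iter("{%s}data" % KVD)
            }
    return {}


print(key_value_data(EXPORTED, "meta_G_YLR189C"))
```

Note that the exported metaid is `meta_G_YLR189C`, not `G_YLR189C` as in the original file, so that is the value to search for in the edited model.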