I have a genome-scale stoichiometric metabolic model, iMM904.xml,
and when I open it in a text editor I can see that certain genes have annotations attached to them, e.g.
<fbc:geneProduct fbc:id="G_YLR189C" fbc:label="YLR189C" metaid="G_YLR189C">
<annotation>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
<rdf:Description rdf:about="#G_YLR189C">
<bqbiol:isEncodedBy>
<rdf:Bag>
<rdf:li rdf:resource="http://identifiers.org/ncbigene/850886" />
<rdf:li rdf:resource="http://identifiers.org/sgd/S000004179" />
</rdf:Bag>
</bqbiol:isEncodedBy>
</rdf:Description>
</rdf:RDF>
</annotation>
</fbc:geneProduct>
How can I access and alter this annotation? When I try
import cbmpy as cbm
cmod = cbm.CBRead.readSBML3FBC('iMM904.xml')
gene = cmod.getGene('G_YLR189C')
print(gene.getAnnotations())
I only see an empty dictionary.
In addition, how could I add annotations such as "last modified by", as well as actual notes?
In CBMPy, you have three options for adding annotations to an SBML file:
1) MIRIAM annotation,
2) arbitrary key value pairs and
3) human-readable notes
which should cover all points you have mentioned in your question. I demonstrate how to use them for the gene entry, but the same commands can be used to annotate species (metabolites) and reactions.
1. MIRIAM annotation
To access the existing MIRIAM annotation - the one you show in your question - you can use:
import cbmpy as cbm
mod = cbm.CBRead.readSBML3FBC('iMM904.xml.gz')
# access gene directly by its locus tag which avoids dealing with the "G_" in the ID
gene = mod.getGeneByLabel('YLR189C')
gene.getMIRIAMannotations()
This will give:
{'encodes': (),
'hasPart': (),
'hasProperty': (),
'hasTaxon': (),
'hasVersion': (),
'is': (),
'isDerivedFrom': (),
'isDescribedBy': (),
'isEncodedBy': ('http://identifiers.org/ncbigene/850886',
'http://identifiers.org/sgd/S000004179'),
'isHomologTo': (),
'isPartOf': (),
'isPropertyOf': (),
'isVersionOf': (),
'occursIn': ()}
As you can see, it contains the entries you saw in the SBML file.
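If you want to double-check this outside of CBMPy, here is a minimal sketch that pulls the same URIs straight out of the XML using only the Python standard library. The function name `miriam_uris` and the `SNIPPET` string are my own for illustration (the snippet is a copy of the annotation block from your question); this is not part of the CBMPy API.

```python
import xml.etree.ElementTree as ET

# Namespaces copied from the annotation block in the question.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "bqbiol": "http://biomodels.net/biology-qualifiers/",
}

# Copy of the <annotation> element shown in the question.
SNIPPET = """
<annotation>
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
    <rdf:Description rdf:about="#G_YLR189C">
      <bqbiol:isEncodedBy>
        <rdf:Bag>
          <rdf:li rdf:resource="http://identifiers.org/ncbigene/850886" />
          <rdf:li rdf:resource="http://identifiers.org/sgd/S000004179" />
        </rdf:Bag>
      </bqbiol:isEncodedBy>
    </rdf:Description>
  </rdf:RDF>
</annotation>
"""


def miriam_uris(annotation_xml, qualifier):
    """Return the rdf:resource URIs listed under one bqbiol qualifier."""
    root = ET.fromstring(annotation_xml)
    resource_attr = "{%s}resource" % NS["rdf"]
    path = ".//bqbiol:%s/rdf:Bag/rdf:li" % qualifier
    return tuple(li.get(resource_attr) for li in root.findall(path, NS))


uris = miriam_uris(SNIPPET, "isEncodedBy")
print(uris)
```

A qualifier that is not present in the snippet, such as 'is', simply gives an empty tuple, which mirrors the empty entries in the dictionary above.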
If you now want to add MIRIAM annotation, you can use two approaches:
A) let CBMPy create the URL for you:
gene.addMIRIAMannotation('is', 'UniProt Knowledgebase', 'Q06321')
B) enter the URL yourself:
# made up protein!
gene.addMIRIAMuri('is', 'http://identifiers.org/uniprot/P12345')
If you now check gene.getMIRIAMannotations(), you will see (I cut off a few empty entries):
'is': ('http://identifiers.org/uniprot/Q06321',
'http://identifiers.org/uniprot/P12345'),
'isDerivedFrom': (),
'isDescribedBy': (),
'isEncodedBy': ('http://identifiers.org/ncbigene/850886',
'http://identifiers.org/sgd/S000004179'),
So, both of your entries have been added (again: the P12345 entry is just for demonstration, don't use it in your actual model!).
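If you go with approach B a lot, it can help to build the URIs in one place so typos are caught early. Below is a small sketch under my own assumptions: `identifiers_org_uri` is a hypothetical helper (not a CBMPy function), the qualifier names are copied from the keys of the dictionary shown above, and the URI pattern is simply the one visible in the entries above.

```python
# Qualifier names copied from the keys of the getMIRIAMannotations() output above.
QUALIFIERS = {
    "encodes", "hasPart", "hasProperty", "hasTaxon", "hasVersion",
    "is", "isDerivedFrom", "isDescribedBy", "isEncodedBy", "isHomologTo",
    "isPartOf", "isPropertyOf", "isVersionOf", "occursIn",
}


def identifiers_org_uri(qualifier, namespace, accession):
    """Hypothetical helper: check the qualifier, then build an identifiers.org URI."""
    if qualifier not in QUALIFIERS:
        raise ValueError("unknown MIRIAM qualifier: %r" % qualifier)
    # Same pattern as the URIs shown above: http://identifiers.org/<namespace>/<accession>
    return "http://identifiers.org/%s/%s" % (namespace, accession)


uri = identifiers_org_uri("is", "uniprot", "Q06321")
print(uri)
# The result could then be passed on as in approach B:
# gene.addMIRIAMuri('is', uri)
```

Note that this only checks the qualifier; it does not verify that the namespace or the accession actually exists.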
If you do not know the correct database identifier, CBMPy will also help you there, e.g. if you try:
gene.addMIRIAMannotation('is', 'uniprot', 'Q06321')
it will print
"uniprot" is not a valid entity were you looking for one of these:
UNII
UniGene
UniParc
UniPathway Compound
UniPathway Reaction
UniProt Isoform
UniProt Knowledgebase
UniSTS
Unimod
Unipathway
Unit Ontology
Unite
INFO: Invalid entity: "uniprot" MIRIAM entity NOT set
which contains 'UniProt Knowledgebase', the name we used above.
2. Adding arbitrary key value pairs.
Not everything can be annotated using the MIRIAM annotation scheme, but you can easily create your own key-value pairs. Using your example,
gene.setAnnotation('last_modified_by', 'Vinz')
The keys and values are fully arbitrary,
gene.setAnnotation('arbitrary key', 'arbitrary value')
If you now call
gene.getAnnotations()
you receive
{'arbitrary key': 'arbitrary value', 'last_modified_by': 'Vinz'}
If you want to access a certain key, you can use
gene.getAnnotation('last_modified_by')
which yields
'Vinz'
3. Adding notes
If you want to write actual comments, neither of the first two options is appropriate, but you can use:
gene.setNotes('This is my favorite gene')
You can access them using
gene.getNotes()
If you now export the model (make sure to use FBCV2!):
cbm.CBWrite.writeSBML3FBCV2(mod, 'iMM904_edited.xml')
and open the model in your text editor, you will see that all of the annotations have been added:
<fbc:geneProduct metaid="meta_G_YLR189C" fbc:id="G_YLR189C" fbc:label="YLR189C">
<notes>
<html:body>This is my favorite gene</html:body>
</notes>
<annotation>
<listOfKeyValueData xmlns="http://pysces.sourceforge.net/KeyValueData">
<data id="arbitrary key" value="arbitrary value"/>
<data id="last_modified_by" value="Vinz"/>
</listOfKeyValueData>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#" xmlns:vCard4="http://www.w3.org/2006/vcard/ns#" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/" xmlns:bqmodel="http://biomodels.net/model-qualifiers/">
<rdf:Description rdf:about="#meta_G_YLR189C">
<bqbiol:is>
<rdf:Bag>
<rdf:li rdf:resource="http://identifiers.org/uniprot/Q06321"/>
<rdf:li rdf:resource="http://identifiers.org/uniprot/P12345"/>
</rdf:Bag>
</bqbiol:is>
<bqbiol:isEncodedBy>
<rdf:Bag>
<rdf:li rdf:resource="http://identifiers.org/ncbigene/850886"/>
<rdf:li rdf:resource="http://identifiers.org/sgd/S000004179"/>
</rdf:Bag>
</bqbiol:isEncodedBy>
</rdf:Description>
</rdf:RDF>
</annotation>
</fbc:geneProduct>
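To confirm that the key-value pairs survived the export without opening the file by hand, you could read them back with the standard library. This is only a sketch: `key_value_data` is a name I made up, and `SNIPPET` is a trimmed copy of the listOfKeyValueData element from the exported XML above, so for a real file you would first locate the annotation of the gene you are interested in.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the listOfKeyValueData element in the exported file above.
KVD_NS = "http://pysces.sourceforge.net/KeyValueData"

# Trimmed copy of the key-value part of the exported annotation.
SNIPPET = """
<annotation>
  <listOfKeyValueData xmlns="http://pysces.sourceforge.net/KeyValueData">
    <data id="arbitrary key" value="arbitrary value"/>
    <data id="last_modified_by" value="Vinz"/>
  </listOfKeyValueData>
</annotation>
"""


def key_value_data(annotation_xml):
    """Collect the id/value attributes of every <data> element into a dict."""
    root = ET.fromstring(annotation_xml)
    data_tag = "{%s}data" % KVD_NS
    return {d.get("id"): d.get("value") for d in root.iter(data_tag)}


pairs = key_value_data(SNIPPET)
print(pairs)
```

The resulting dictionary should match what gene.getAnnotations() returned before the export.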