pythoncbmpysbml

How to create a genome scale metabolic model in cbmpy from scratch?


So the previous question was apparently not wisely asked. So here is the rephrased original question:

I am planing on building up a new GSMM from nothong (i.e. I need a empty model, 0 reaction/metabolite/compartment/GPR/annotation, but with a proper model structure that toolbox will accept).

I used to work with Matlab and COBRA toolbox but recently changed into Python and cbmpy toolbox. I know how to create a model structure in Matlab through COBRA but not in python with cbmpy. I am really not familiar with cmbpy's functions and I am here ask for help.

There are ways to meet my needs:

  1. The most stupid way: read an exist model and remove all its contents.

    import cbmpy as cbm
    mod = cbm.CBRead.readSBML3FBC('example.xml')
    for ri in mod.getSpeciesIds():
        mod.deleteSpecies(ri, also_delete = 'reaction')

Similarly, different functions start with 'mod.delete....' will clean all the stuff. Eventually you will have a clean empty model.

  1. As suggested by Cleb, I looked into the source code and found the function I need

    import cbmpy as cbm
    mod = cbm.CBModel.Model('Name')
    mod.createSpecies('M_a_c',boundary=False,value=float('nan'),name='AA',compartment='c',charge=None,chemFormula='C2H2O2')
    mod.createSpecies('M_b_c',boundary=False,value=float('nan'),name='BB',compartment='c',charge=None,chemFormula='CHO')
    mod.createReaction('R_AB',reversible=False)
    mod.createReactionReagent('R_AB','M_a_c',-1.0)
    mod.createReactionReagent('R_AB','M_b_c',2)

Then a model object with ID 'Name' is created. It is a very simple question and its answer very straightforward, if you have found the right function. Later you may gradually add reaction/metabolites/etc and assign the objective reaction. It is of course also possible to directly use cbm.CBModelTools.quickDefaultBuild but you need to prepare necessary arguments for adding reactions/species/boundary/objective.


Solution

  • While you can indeed build a model as you described, it might be easier (and cleaner) to use

    cbm.CBModelTools.quickDefaultBuild
    

    I will show how to use it for a minimal example which is quite self-explanatory:

    import cbmpy as cbm
    
    
    def define_new_model():
    
        model_name = 'my_great_model'
    
        reactions = {
            'R01': {'id': 'R01', 'reversible': True, 'reagents': [(-1., 'A'), (1., 'A_ext')], 'SUBSYSTEM': ''},
            'R02': {'id': 'R02', 'reversible': True, 'reagents': [(-1., 'B'), (1., 'B_ext')], 'SUBSYSTEM': '',
                    'GENE_ASSOCIATION': '(gene_1 or gene_2)'},
            'R_EX_A': {'id': 'R_EX_A', 'reversible': True, 'reagents': [(1., 'A_b'), (-1., 'A_ext')], 'SUBSYSTEM': ''},
            'R_EX_B': {'id': 'R_EX_B', 'reversible': True, 'reagents': [(1., 'B_b'), (-1., 'B_ext')], 'SUBSYSTEM': ''},
            'R_biomass': {'id': 'R_biomass', 'reversible': False, 'reagents': [(-1., 'A'), (-1., 'B')], 'SUBSYSTEM': ''}
        }
    
        species = {
            'A': {'id': 'A', 'boundary': False, 'SUBSYSTEM': 'C1'},
            'B': {'id': 'B', 'boundary': False, 'SUBSYSTEM': 'C1'},
            'A_ext': {'id': 'A_ext', 'boundary': False, 'SUBSYSTEM': 'C0'},
            'B_ext': {'id': 'B_ext', 'boundary': False, 'SUBSYSTEM': 'C0'},
            'A_b': {'id': 'A_b', 'boundary': True, 'SUBSYSTEM': ''},
            'B_b': {'id': 'B_b', 'boundary': True, 'SUBSYSTEM': ''},
        }
    
        bounds = {'R_EX_A': {'lower': -1., 'upper': 0.},
                  'R_EX_B': {'lower': -1., 'upper': 0.}}
    
        objective_function = {'objMaxJ1': {'id': 'biomass_obj1',
                                           'flux': 'R_biomass',
                                           'coefficient': 1,
                                           'sense': 'Maximize',
                                           'active': True}}
    
        return model_name, reactions, species, bounds, objective_function
    
    
    model_def = define_new_model()
    
    mod = cbm.CBModelTools.quickDefaultBuild(*model_def)
    

    Now you can call

    mod.getReactionIds()
    ['R01', 'R02', 'R_EX_A', 'R_EX_B', 'R_biomass']
    

    and

    mod.getSpeciesIds()
    ['A', 'A_b', 'A_ext', 'B', 'B_b', 'B_ext']
    

    and

    mod.getBoundarySpeciesIds()
    ['A_b', 'B_b']
    

    and you can also simulate the model

    cbm.doFBA(mod)
    analyzeModel objective value: 1.0
    

    When you look for the genes you defined for reaction R02, you will see an empty list:

    mod.getGeneIds()
    []
    

    You will first have to call

    mod.createGeneAssociationsFromAnnotations()
    

    which prints

    INFO: used key(s) '['GENE_ASSOCIATION']'
    INFO: Added 2 new genes and 1 associations to model
    

    And then

    mod.getGeneIds()
    

    will give the desired outcome

    ['gene_1', 'gene_2']
    

    as well as

    mod.getGPRforReaction('R02').getAssociationStr()
    

    which returns

    '((gene_1 or gene_2))'
    

    I highly recommend to use this approach as this is easier to read and edit later on if you want to change the model definition.

    If you want to add further annotation, you can check this question.