bioinformatics chemistry pymol

How to Iterate over molecules in PyMol?


I am working with GROMACS .gro files in PyMol and running into problems with multi-stranded molecules. .gro files do not have chain identifiers, which PyMol apparently needs to calculate cartoon representations, so I am trying to find a way to add chain identifiers to each strand in my scene.

PyMol seems to have some sort of internal molecule representation because I can click on Mouse > Selection Mode > Molecule and then can manually select the molecules in my scene. I can also do select bymolecule id 1 and get the first chain, but I can't find a way to get the other chains without also knowing the relevant atom ids a priori.

So what I need is a series of PyMol commands which will iterate over all the molecules in a scene and then run alter (sele), chain='A/B/whatever' on each one so I can use the cartoon representation.

Edit:
The following script works if you have a unique residue name which appears exactly once per chain (such as RX5 in RNA molecules). This is not a generally applicable solution because it requires there to be such a unique residue.

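from pymol import cmd

# One O3' atom per strand, taken from the residue that occurs once per chain
model = cmd.get_model("resn *5 & name O3'")
chainid = ord('A')
for a in model.atom:
    # Name the selection explicitly instead of relying on the default "sele"
    cmd.select("sele", f"bymolecule id {a.id}")
    cmd.alter("sele", f"chain='{chr(chainid)}'")
    chainid += 1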

Solution

  • Second attempt, using 2ms2_no_short.gro as input:

    Great Red Owns Many ACres of Sand 
      216
        1ALA      N    1  -0.114 -11.023   7.562
        1ALA     H1    2  -0.135 -11.077   7.482
        1ALA     H2    3  -0.021 -10.986   7.555
        1ALA     H3    4  -0.121 -11.080   7.644
        1ALA     CA    5  -0.208 -10.913   7.572
        1ALA     HA    6  -0.302 -10.948   7.581
        1ALA     CB    7  -0.183 -10.823   7.694
        1ALA    HB1    8  -0.251 -10.749   7.696
        1ALA    HB2    9  -0.191 -10.877   7.778
        1ALA    HB3   10  -0.091 -10.784   7.689
        1ALA      C   11  -0.174 -10.834   7.444
        1ALA      O   12  -0.065 -10.849   7.390
        2SER      N   13  -0.281 -10.779   7.390
        2SER      H   14  -0.368 -10.804   7.432
        2SER     CA   15  -0.289 -10.687   7.277
        2SER     HA   16  -0.223 -10.612   7.278
        2SER     CB   17  -0.266 -10.751   7.140
        2SER    HB1   18  -0.260 -10.850   7.152
        2SER    HB2   19  -0.180 -10.717   7.103
        2SER     OG   20  -0.360 -10.733   7.035
        2SER     HG   21  -0.329 -10.782   6.953
        2SER      C   22  -0.435 -10.665   7.301
        2SER      O   23  -0.507 -10.764   7.319
        3ASN      N   24  -0.484 -10.544   7.316
        3ASN      H   25  -0.427 -10.462   7.307
        3ASN     CA   26  -0.625 -10.536   7.345
        3ASN     HA   27  -0.655 -10.619   7.392
        3ASN     CB   28  -0.650 -10.418   7.436
        3ASN    HB1   29  -0.607 -10.439   7.524
        3ASN    HB2   30  -0.749 -10.410   7.448
        3ASN     CG   31  -0.599 -10.285   7.390
        3ASN    OD1   32  -0.559 -10.265   7.274
        3ASN    ND2   33  -0.595 -10.190   7.483
        3ASN   HD21   34  -0.625 -10.211   7.576
        3ASN   HD22   35  -0.562 -10.099   7.460
        3ASN      C   36  -0.696 -10.523   7.214
        3ASN      O   37  -0.817 -10.507   7.215
        4PHE      N   38  -0.630 -10.533   7.100
        4PHE      H   39  -0.531 -10.550   7.100
        4PHE     CA   40  -0.700 -10.519   6.976
        4PHE     HA   41  -0.773 -10.451   6.987
        4PHE     CB   42  -0.602 -10.471   6.873
        4PHE    HB1   43  -0.550 -10.549   6.840
        4PHE    HB2   44  -0.539 -10.406   6.917
        4PHE     CG   45  -0.668 -10.403   6.756
        4PHE    CD1   46  -0.805 -10.395   6.743
        4PHE    HD1   47  -0.864 -10.437   6.812
        4PHE    CE1   48  -0.860 -10.330   6.636
        4PHE    HE1   49  -0.959 -10.325   6.627
        4PHE     CZ   50  -0.779 -10.270   6.540
        4PHE     HZ   51  -0.819 -10.222   6.462
        4PHE    CE2   52  -0.643 -10.279   6.555
        4PHE    HE2   53  -0.584 -10.236   6.487
        4PHE    CD2   54  -0.587 -10.345   6.663
        4PHE    HD2   55  -0.488 -10.351   6.672
        4PHE      C   56  -0.761 -10.654   6.937
        4PHE      O   57  -0.707 -10.727   6.853
        5THR      N   58  -0.869 -10.692   7.002
        5THR      H   59  -0.904 -10.632   7.075
        5THR     CA   60  -0.940 -10.814   6.976
        5THR     HA   61  -0.889 -10.853   6.899
        5THR     CB   62  -0.938 -10.900   7.098
        5THR     HB   63  -1.002 -10.976   7.091
        5THR    CG2   64  -0.797 -10.951   7.118
        5THR   HG21   65  -0.794 -11.009   7.200
        5THR   HG22   66  -0.769 -11.005   7.039
        5THR   HG23   67  -0.735 -10.874   7.130
        5THR    OG1   68  -0.987 -10.824   7.209
        5THR    HG1   69  -0.986 -10.881   7.291
        5THR      C   70  -1.083 -10.797   6.932
        5THR    OC1   71  -1.132 -10.876   6.895
        5THR    OC2   72  -1.141 -10.690   6.940
        1ALA      N   73   0.988 -11.380   6.601
        1ALA     H1   74   0.924 -11.449   6.635
        1ALA     H2   75   0.951 -11.338   6.519
        1ALA     H3   76   1.076 -11.423   6.580
        1ALA     CA   77   1.007 -11.278   6.703
        1ALA     HA   78   1.046 -11.321   6.785
        1ALA     CB   79   1.106 -11.168   6.655
        1ALA    HB1   80   1.117 -11.099   6.727
        1ALA    HB2   81   1.194 -11.209   6.634
        1ALA    HB3   82   1.069 -11.124   6.572
        1ALA      C   83   0.867 -11.219   6.727
        1ALA      O   84   0.766 -11.278   6.683
        2SER      N   85   0.867 -11.115   6.809
        2SER      H   86   0.955 -11.086   6.844
        2SER     CA   87   0.750 -11.039   6.852
        2SER     HA   88   0.705 -10.987   6.780
        2SER     CB   89   0.641 -11.123   6.912
        2SER    HB1   90   0.658 -11.142   7.008
        2SER    HB2   91   0.630 -11.209   6.862
        2SER     OG   92   0.525 -11.046   6.899
        2SER     HG   93   0.448 -11.096   6.937
        2SER      C   94   0.825 -10.971   6.965
        2SER      O   95   0.869 -11.032   7.062
        3ASN      N   96   0.864 -10.850   6.935
        3ASN      H   97   0.840 -10.812   6.846
        3ASN     CA   98   0.942 -10.772   7.024
        3ASN     HA   99   1.011 -10.834   7.060
        3ASN     CB  100   1.004 -10.658   6.953
        3ASN    HB1  101   1.067 -10.613   7.017
        3ASN    HB2  102   0.932 -10.594   6.927
        3ASN     CG  103   1.079 -10.697   6.832
        3ASN    OD1  104   1.165 -10.784   6.833
        3ASN    ND2  105   1.050 -10.635   6.721
        3ASN   HD21  106   0.980 -10.564   6.720
        3ASN   HD22  107   1.099 -10.658   6.636
        3ASN      C  108   0.850 -10.710   7.126
        3ASN      O  109   0.893 -10.687   7.239
        4PHE      N  110   0.725 -10.677   7.093
        4PHE      H  111   0.684 -10.714   7.010
        4PHE     CA  112   0.651 -10.587   7.176
        4PHE     HA  113   0.718 -10.522   7.212
        4PHE     CB  114   0.546 -10.522   7.086
        4PHE    HB1  115   0.471 -10.588   7.075
        4PHE    HB2  116   0.588 -10.504   6.998
        4PHE     CG  117   0.486 -10.392   7.137
        4PHE    CD1  118   0.545 -10.317   7.236
        4PHE    HD1  119   0.629 -10.350   7.279
        4PHE    CE1  120   0.489 -10.199   7.278
        4PHE    HE1  121   0.533 -10.145   7.351
        4PHE     CZ  122   0.373 -10.153   7.220
        4PHE     HZ  123   0.332 -10.067   7.250
        4PHE    CE2  124   0.314 -10.227   7.121
        4PHE    HE2  125   0.229 -10.195   7.079
        4PHE    CD2  126   0.371 -10.345   7.080
        4PHE    HD2  127   0.327 -10.398   7.008
        4PHE      C  128   0.591 -10.647   7.301
        4PHE      O  129   0.472 -10.675   7.311
        5THR      N  130   0.676 -10.664   7.401
        5THR      H  131   0.772 -10.640   7.387
        5THR     CA  132   0.635 -10.715   7.530
        5THR     HA  133   0.535 -10.718   7.523
        5THR     CB  134   0.696 -10.850   7.555
        5THR     HB  135   0.667 -10.885   7.644
        5THR    CG2  136   0.658 -10.947   7.446
        5THR   HG21  137   0.700 -11.036   7.465
        5THR   HG22  138   0.559 -10.958   7.443
        5THR   HG23  139   0.691 -10.913   7.358
        5THR    OG1  140   0.836 -10.835   7.555
        5THR    HG1  141   0.879 -10.924   7.571
        5THR      C  142   0.675 -10.627   7.645
        5THR    OC1  143   0.637 -10.647   7.735
        5THR    OC2  144   0.750 -10.532   7.634
        1ALA      N  145  -0.495 -10.949   6.857
        1ALA     H1  146  -0.399 -10.961   6.883
        1ALA     H2  147  -0.519 -10.852   6.863
        1ALA     H3  148  -0.553 -11.002   6.918
        1ALA     CA  149  -0.514 -10.995   6.720
        1ALA     HA  150  -0.490 -11.092   6.715
        1ALA     CB  151  -0.659 -10.980   6.673
        1ALA    HB1  152  -0.667 -11.013   6.578
        1ALA    HB2  153  -0.720 -11.033   6.732
        1ALA    HB3  154  -0.686 -10.883   6.676
        1ALA      C  155  -0.425 -10.900   6.638
        1ALA      O  156  -0.416 -10.783   6.681
        2SER      N  157  -0.355 -10.946   6.535
        2SER      H  158  -0.361 -11.044   6.512
        2SER     CA  159  -0.271 -10.861   6.452
        2SER     HA  160  -0.320 -10.776   6.434
        2SER     CB  161  -0.141 -10.830   6.521
        2SER    HB1  162  -0.078 -10.907   6.511
        2SER    HB2  163  -0.157 -10.811   6.617
        2SER     OG  164  -0.080 -10.716   6.462
        2SER     HG  165   0.007 -10.697   6.509
        2SER      C  166  -0.239 -10.941   6.330
        2SER      O  167  -0.232 -11.065   6.339
        3ASN      N  168  -0.236 -10.877   6.215
        3ASN      H  169  -0.266 -10.782   6.210
        3ASN     CA  170  -0.189 -10.945   6.094
        3ASN     HA  171  -0.173 -11.041   6.115
        3ASN     CB  172  -0.292 -10.943   5.981
        3ASN    HB1  173  -0.374 -10.990   6.013
        3ASN    HB2  174  -0.253 -10.994   5.904
        3ASN     CG  175  -0.334 -10.808   5.928
        3ASN    OD1  176  -0.279 -10.706   5.964
        3ASN    ND2  177  -0.433 -10.794   5.843
        3ASN   HD21  178  -0.484 -10.874   5.811
        3ASN   HD22  179  -0.458 -10.703   5.811
        3ASN      C  180  -0.065 -10.868   6.056
        3ASN      O  181  -0.003 -10.903   5.957
        4PHE      N  182  -0.020 -10.762   6.128
        4PHE      H  183  -0.075 -10.724   6.203
        4PHE     CA  184   0.109 -10.703   6.095
        4PHE     HA  185   0.118 -10.703   5.996
        4PHE     CB  186   0.114 -10.563   6.148
        4PHE    HB1  187   0.124 -10.568   6.248
        4PHE    HB2  188   0.027 -10.518   6.126
        4PHE     CG  189   0.229 -10.480   6.091
        4PHE    CD1  190   0.275 -10.498   5.961
        4PHE    HD1  191   0.239 -10.572   5.905
        4PHE    CE1  192   0.372 -10.413   5.911
        4PHE    HE1  193   0.405 -10.426   5.818
        4PHE     CZ  194   0.424 -10.311   5.988
        4PHE     HZ  195   0.494 -10.250   5.951
        4PHE    CE2  196   0.379 -10.295   6.117
        4PHE    HE2  197   0.417 -10.221   6.174
        4PHE    CD2  198   0.282 -10.379   6.169
        4PHE    HD2  199   0.251 -10.366   6.263
        4PHE      C  200   0.217 -10.785   6.164
        4PHE      O  201   0.271 -10.749   6.267
        5THR      N  202   0.254 -10.900   6.108
        5THR      H  203   0.208 -10.931   6.025
        5THR     CA  204   0.359 -10.981   6.165
        5THR     HA  205   0.404 -10.916   6.227
        5THR     CB  206   0.304 -11.100   6.236
        5THR     HB  207   0.365 -11.180   6.238
        5THR    CG2  208   0.280 -11.070   6.383
        5THR   HG21  209   0.243 -11.151   6.428
        5THR   HG22  210   0.366 -11.044   6.426
        5THR   HG23  211   0.214 -10.995   6.391
        5THR    OG1  212   0.194 -11.145   6.158
        5THR    HG1  213   0.153 -11.225   6.201
        5THR      C  214   0.457 -11.033   6.066
        5THR    OC1  215   0.530 -11.093   6.098
        5THR    OC2  216   0.451 -11.006   5.947
       2.33514   1.38260   1.96702
    

    with code:

    
    import time
    
    import pymol
    
    from pymol import cmd
    
    pymol.finish_launching()
    # pymol.finish_launching(['pymol', '-q', '-W', '1200', '-H', '500'])
    
    time.sleep(5)
    
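    def count_mols_in_sel(sel="sele"):
        """
        Assigns a chain ID to each distinct molecule in a given selection
        and returns the number of distinct molecules found.

        Adapted from https://pymolwiki.org/index.php/Count_molecules_in_selection
        """
        sel_copy = "__selcopy"
        cmd.select(sel_copy, sel)

        num_objs = 0
        atoms_in_sel = cmd.count_atoms(sel_copy)

        # One letter per molecule; extend this list for more than 26 molecules
        chains = [chr(ord('A') + i) for i in range(26)]

        while atoms_in_sel > 0:
            chainid = chains[num_objs]

            # Label every atom still in the working selection. Atoms of later
            # molecules are relabelled on later passes, so each molecule keeps
            # the letter from the pass in which it was first in the selection.
            cmd.alter(sel_copy, f"chain='{chainid}'")
            num_objs += 1

            # Drop the molecule that contains the first remaining atom
            # see bm. bymolecule https://pymol.org/dokuwiki/doku.php?id=selection:bymolecule
            # see first https://pymol.org/dokuwiki/doku.php?id=selection:first
            cmd.select(sel_copy, "%s and not (bm. first %s)" % (sel_copy, sel_copy))

            atoms_in_sel = cmd.count_atoms(sel_copy)
            print(atoms_in_sel)

        # Remove the temporary working selection
        cmd.delete(sel_copy)

        print("There are %d distinct molecules in the selection '%s'." % (num_objs, sel))
        return num_objs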
    
    cmd.load('2ms2_no.gro' , '2ms2_no')
    print(cmd.get_names_of_type("object:molecule"))
    print(cmd.get_names("objects"))
    print(cmd.get_object_list('(all)'))
    print(cmd.get_object_list())
    print(count_mols_in_sel('all'))
    cmd.util.cbc(selection='(all)')
    cmd.save('2ms2_no_out.pdb' , '2ms2_no')
    print('should be finished by now hopefully')
    

    I can go from 2ms2_no_short.gro [image] to 2ms2_no_out.pdb [image].

    I am still not sure why it works. All my attempts to get the three chains as separate molecular objects (see the print statements in the code) were in vain, given the PyMOL definitions:

    Molecule Concept: A PyMOL Molecule is a set of atoms within a single molecular object joined by a connected graph of bonds. There are three molecules in the image below.

    Molecular Object Concept: A PyMOL Molecular Object is a special type of object that can contain atoms, bonds, and coordinate sets organized by state, and that can be shown in a variety of representations.

    The only way I found to identify the three chains is the script adapted from "Count molecules in selection"; see the count_mols_in_sel function in the code.***
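One possible reading of why the loop assigns the right letters, sketched in plain Python without PyMOL. The function name peel_and_label and the molecule_of_atom list are hypothetical stand-ins: the list plays the role of PyMOL's bond-graph membership, the inner for-loop mimics cmd.alter on the working selection, and the filtering step mimics "not (bm. first ...)". This is an illustration under those assumptions, not PyMOL's actual implementation.

```python
def peel_and_label(molecule_of_atom):
    """Label all remaining atoms, then drop the first molecule, and repeat.

    molecule_of_atom: for each atom (in file order), a hypothetical index of
    the bonded molecule it belongs to.
    Returns one chain letter per atom.
    """
    chain = [None] * len(molecule_of_atom)
    remaining = list(range(len(molecule_of_atom)))
    letter = ord("A")
    while remaining:
        # Equivalent of cmd.alter on the working selection:
        # every atom still selected gets the current letter.
        for i in remaining:
            chain[i] = chr(letter)
        # Equivalent of "bm. first": the molecule holding the first atom.
        first_mol = molecule_of_atom[remaining[0]]
        # Remove that molecule; its atoms keep the letter they just received.
        remaining = [i for i in remaining if molecule_of_atom[i] != first_mol]
        letter += 1
    return chain


# Six atoms in three molecules of sizes 2, 3 and 1
labels = peel_and_label([0, 0, 1, 1, 1, 2])
print(labels)  # ['A', 'A', 'B', 'B', 'B', 'C']
```

Each pass overwrites the labels of all atoms that are still selected, so only the molecule removed in that pass keeps the current letter.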

    It is worth noting (I am not sure how GROMACS handles broken chains) that a broken chain would result in a different number of chains, depending on the number of gaps.
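The effect of a gap can be illustrated with a small connected-components count. This is a minimal sketch, assuming a molecule is simply a connected component of the bond graph as in the PyMOL definition quoted above; count_molecules and the toy bond lists are hypothetical and do not come from PyMOL.

```python
def count_molecules(n_atoms, bonds):
    """Count connected components of a bond graph using union-find."""
    parent = list(range(n_atoms))

    def find(i):
        # Follow parent links to the root, shortening the path on the way
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in bonds:
        # Merge the components that contain the two bonded atoms
        parent[find(a)] = find(b)

    return len({find(i) for i in range(n_atoms)})


# A five-atom strand with all bonds present, and the same strand with
# the bond between atoms 2 and 3 missing (a gap)
intact = [(0, 1), (1, 2), (2, 3), (3, 4)]
broken = [(0, 1), (1, 2), (3, 4)]

print(count_molecules(5, intact))  # 1
print(count_molecules(5, broken))  # 2
```

Under this assumption each gap adds one extra component, so a strand with n gaps would be labelled as n + 1 chains by the bymolecule approach.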

    Give it a try and let me know.

    *** I think the same process is used in FilterByMol.