Tags: python, chemistry, pymol

How to extract a Cartesian coordinate file (.xyz format) from a PyMOL molecule?


This is a very basic question, but I can't find anything helpful on the internet: I created a molecule in PyMOL; is there a way to export its Cartesian coordinates as a file in XYZ format?

Thank you very much in advance!

XYZ file format from Wikipedia

<number of atoms>
comment line
<element> <X> <Y> <Z>
...

Solution

  • Edit: sorry, I just realized you were asking about the .xyz format, not just the coordinates.

    People suggest using Open Babel (openbabel on GitHub); see here:

    Converting a PDB file to XYZ file
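
If Open Babel is installed, the conversion suggested there can be scripted. This is a minimal sketch, assuming the obabel command-line tool is on the PATH; the helper names obabel_command and pdb_to_xyz are made up for illustration.

```python
import subprocess
from pathlib import Path


def obabel_command(src, dst):
    """Build the argument list for `obabel <src> -O <dst>`.

    Open Babel infers the input and output formats from the file extensions.
    """
    return ["obabel", str(src), "-O", str(dst)]


def pdb_to_xyz(pdb_path):
    """Convert a PDB file to an XYZ file next to it (hypothetical helper).

    Assumes the `obabel` executable is installed and on the PATH.
    """
    xyz_path = Path(pdb_path).with_suffix(".xyz")
    subprocess.run(obabel_command(pdb_path, xyz_path), check=True)
    return xyz_path


# Usage (not run here, since it needs Open Babel):
# pdb_to_xyz("test.pdb")  # would write test.xyz
```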

    According to the old PyMOL documentation (https://pymol.org/dokuwiki/doku.php?id=start), PyMOL supports loading the .xyz format but not saving to it.

    The latest PyMOL version, however, also offers .xyz for export: just select XYZ (*.xyz) as the file type when saving the exported molecule. The result for the file test.pdb is shown below:

    27        
    test
    N 32.504002 37.205002 19.346001
    C 33.806000 36.685001 19.868999
    C 35.000000 37.216000 19.097000
    O 34.831001 37.956001 18.112000
    C 33.779999 35.136002 19.858000
    C 33.676998 34.608002 18.436001
    O 34.995998 34.712002 20.476000
    N 36.222000 36.900002 19.493000
    C 37.466000 37.312000 18.870001
    C 37.889999 36.323002 17.771999
    O 37.108002 35.425999 17.476999
    C 38.573002 37.472000 19.903000
    C 38.854000 36.188000 20.687000
    N 39.763000 36.310001 21.648001
    O 38.219002 35.167999 20.445000
    N 39.077999 36.497002 17.205000
    C 39.536999 35.640999 16.112000
    C 39.734001 34.175999 16.518000
    O 39.174000 33.270000 15.869000
    C 40.806999 36.208000 15.498000
    N 40.123001 33.959999 17.768000
    C 40.236000 32.608002 18.357000
    C 38.868000 31.997000 18.531000
    O 38.570999 30.833000 18.173000
    C 40.984001 32.761002 19.690001
    C 41.111000 31.396000 20.389999
    O 42.296001 33.273998 19.424999
    

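    The PyMOL API call would be:

    cmd.save('selection' , 'output.xyz') , but it doesn't work. Note that the documented signature is cmd.save(filename, selection), with the file name first, so the call as written passes 'test' as the file name. A file name without a recognized extension is a likely cause of the error below; the documented order would be cmd.save('output.xyz' , 'test').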

    output

    Traceback (most recent call last):
      File "./xxx.py", line 79, in <module>
        cmd.save('test' , 'output.xyz')
      File "/.../.../.../pymol/exporting.py", line 876, in save
        raise pymol.CmdException('File format not supported for export')
    pymol.CmdException:  Error: File format not supported for export
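
For comparison, the PyMOL wiki documents the save signature as cmd.save(filename, selection, ...), with the file name first. Below is a minimal sketch of a call in that order. Whether .xyz is accepted depends on the PyMOL version (an assumption here, based on the GUI export described above), and save_object_as_xyz is a hypothetical helper that receives the cmd module as an argument.

```python
def save_object_as_xyz(cmd, obj_name, filename):
    """Save `obj_name` to `filename` using PyMOL's documented argument order.

    `cmd` is expected to be `pymol.cmd`; it is passed in so that this helper
    does not import PyMOL itself. PyMOL picks the export format from the
    file extension, so `filename` should end in ".xyz".
    """
    if not filename.lower().endswith(".xyz"):
        raise ValueError("filename should end in .xyz")
    cmd.save(filename, obj_name)  # file name first, then the selection


# Usage inside a PyMOL script (assumes an object named 'test' is loaded):
# from pymol import cmd
# save_object_as_xyz(cmd, 'test', 'output.xyz')
```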
    
    

    From the PyMOL Wiki, using the PyMOL API:

    try xyz = cmd.get_coords('sele', 1), as explained in

    Get Coordinates I

    or, from the PyMOL console, as described in

    Thread: [PyMOL] How do you show the coordinates of an atom?

    try iterate_state 1, <selection_name>, print (x,y,z)

    See Iterate for the variables exposed by the iterate family of commands.

    In more detail, using the PyMOL console and loading the file test.pdb in PyMOL:

    ATOM      1  N   THR A   1      32.504  37.205  19.346  1.00 35.93           N  
    ATOM      2  CA  THR A   1      33.806  36.685  19.869  1.00 33.83           C  
    ATOM      3  C   THR A   1      35.000  37.216  19.097  1.00 34.05           C  
    ATOM      4  O   THR A   1      34.831  37.956  18.112  1.00 37.45           O  
    ATOM      5  CB  THR A   1      33.780  35.136  19.858  1.00 35.93           C  
    ATOM      6  OG1 THR A   1      34.996  34.712  20.476  1.00 29.78           O  
    ATOM      7  CG2 THR A   1      33.677  34.608  18.436  1.00 28.68           C  
    ATOM      8  N   ASN A   2      36.222  36.900  19.493  1.00 27.70           N  
    ATOM      9  CA  ASN A   2      37.466  37.312  18.870  1.00 26.38           C  
    ATOM     10  C   ASN A   2      37.890  36.323  17.772  1.00 23.63           C  
    ATOM     11  O   ASN A   2      37.108  35.426  17.477  1.00 26.11           O  
    ATOM     12  CB  ASN A   2      38.573  37.472  19.903  1.00 26.18           C  
    ATOM     13  CG  ASN A   2      38.854  36.188  20.687  1.00 23.57           C  
    ATOM     14  OD1 ASN A   2      38.219  35.168  20.445  1.00 24.27           O  
    ATOM     15  ND2 ASN A   2      39.763  36.310  21.648  1.00 25.69           N  
    ATOM     16  N   ALA A   3      39.078  36.497  17.205  1.00 27.65           N  
    ATOM     17  CA  ALA A   3      39.537  35.641  16.112  1.00 27.70           C  
    ATOM     18  C   ALA A   3      39.734  34.176  16.518  1.00 27.15           C  
    ATOM     19  O   ALA A   3      39.174  33.270  15.869  1.00 26.97           O  
    ATOM     20  CB  ALA A   3      40.807  36.208  15.498  1.00 27.00           C  
    ATOM     21  N   THR A   4      40.123  33.960  17.768  1.00 24.83           N  
    ATOM     22  CA  THR A   4      40.236  32.608  18.357  1.00 21.65           C  
    ATOM     23  C   THR A   4      38.868  31.997  18.531  1.00 22.46           C  
    ATOM     24  O   THR A   4      38.571  30.833  18.173  1.00 22.18           O  
    ATOM     25  CB  THR A   4      40.984  32.761  19.690  1.00 19.96           C  
    ATOM     26  OG1 THR A   4      42.296  33.274  19.425  1.00 28.59           O  
    ATOM     27  CG2 THR A   4      41.111  31.396  20.390  1.00 23.74           C  
    
    
    

    Using:

    select pippo ,resi 1 AND name CA ---> Selector: selection "pippo" defined with 1 atoms.

    and then:

    iterate_state 1, pippo , print ( x,y,z) ---> 33.805999755859375 36.685001373291016 19.868999481201172 IterateState: iterated over 1 atom coordinate states.

    The same can be done with the PyMOL API, on the same file test.pdb, with this code:

    import pymol
    from pymol import cmd, stored

    print('########## PYMOL VERSION ##########################################')
    print('         ',  cmd.get_version() )
    print('###################################################################')

    # pymol.finish_launching()  # uncomment to launch PyMOL

    cmd.load('test.pdb' , 'test')
    cmd.select('pippo' , 'resi 1 AND name CA')
    cmd.color('yellow' , 'pippo' , 0)

    # get_coords works on selections
    xyz = cmd.get_coords('pippo', 1)
    print('\nxyz : ', xyz , '\n') # ----> xyz :  [[33.806 36.685 19.869]]

    # get_coordset operates on the object-state level, not on selections
    xyz = cmd.get_coordset('pippo', 1)
    print('\nxyz : ', xyz , '\n') # ---> xyz :  None

    # with the object name instead of the selection it returns all coordinates
    xyz = cmd.get_coordset('test', 1)
    print('\nxyz : ', xyz , '\n') # ---> xyz :   [[32.504 37.205 19.346] .........  [41.111 31.396 20.39 ]]

    # get_model also works on selections and gives plain Python lists
    xyz = cmd.get_model('pippo', 1).get_coord_list()
    print('\nxyz : ', xyz , '\n')  # -----> xyz :  [[33.805999755859375, 36.685001373291016, 19.868999481201172]]

    ### OR USING ITERATE :
    stored.coords = []
    cmd.iterate_state("-1" ,  "pippo"  , "stored.coords.append([x,y,z])")
    print('\nstored.coords : ' , stored.coords ,'\n')
    #### PRINTS --->  stored.coords :  [[33.805999755859375, 36.685001373291016, 19.868999481201172]]
    

    Output :

    PyMOL not running, entering library mode (experimental)
     Executive: Colored 1 atom.
    
    xyz :  [[33.806 36.685 19.869]] 
    
    
    xyz :  None 
    
    
    xyz :  [[32.504 37.205 19.346]
     [33.806 36.685 19.869]
     [35.    37.216 19.097]
     [34.831 37.956 18.112]
     [33.78  35.136 19.858]
     [34.996 34.712 20.476]
     [33.677 34.608 18.436]
     [36.222 36.9   19.493]
     [37.466 37.312 18.87 ]
     [37.89  36.323 17.772]
     [37.108 35.426 17.477]
     [38.573 37.472 19.903]
     [38.854 36.188 20.687]
     [38.219 35.168 20.445]
     [39.763 36.31  21.648]
     [39.078 36.497 17.205]
     [39.537 35.641 16.112]
     [39.734 34.176 16.518]
     [39.174 33.27  15.869]
     [40.807 36.208 15.498]
     [40.123 33.96  17.768]
     [40.236 32.608 18.357]
     [38.868 31.997 18.531]
     [38.571 30.833 18.173]
     [40.984 32.761 19.69 ]
     [42.296 33.274 19.425]
     [41.111 31.396 20.39 ]] 
    
    
    xyz :  [[33.805999755859375, 36.685001373291016, 19.868999481201172]] 
    
    
    stored.coords :  [[33.805999755859375, 36.685001373291016, 19.868999481201172]] 
    
    

    Adding this bit of code to the previous script:

    
    stored.xyz_full = []

    # collect the element symbol and the coordinates (15 decimal places) of every atom in 'test'
    cmd.iterate_state("-1" ,  "test"  , "stored.xyz_full.append([elem , f'{x:.15f}' , f'{y:.15f}' , f'{z:.15f}'] )")

    print('\nstored.xyz_full : ' , stored.xyz_full ,'\n')
    print('len(stored.xyz_full) : ', len(stored.xyz_full ))

    # write the .xyz file: atom count, comment line, then one line per atom
    with open('output_xyz.xyz' , 'w') as handle :
        handle.write(str(len(stored.xyz_full ))+'\n')
        handle.write('test'+'\n')
        for i in stored.xyz_full :
            handle.write(str(i[0]) +' ' + str(i[1]) +' ' +str(i[2]) +' ' + str(i[3])+'\n')

    ## SAVE molecule test in xyz :
    # cmd.save('test' , 'output.xyz') # !!! pymol.CmdException:  Error: File format not supported for export
    # (the documented argument order is cmd.save(filename, selection))
    
    

    The output will additionally contain:

    stored.xyz_full :  [['N', '32.504001617431641', '37.205001831054688', '19.346000671386719'], ['C', '33.805999755859375', '36.685001373291016', '19.868999481201172'], ['C', '35.000000000000000', '37.215999603271484', '19.097000122070312'], ['O', '34.831001281738281', '37.956001281738281', '18.111999511718750'], ['C', '33.779998779296875', '35.136001586914062', '19.857999801635742'], ['C', '33.676998138427734', '34.608001708984375', '18.436000823974609'], ['O', '34.995998382568359', '34.712001800537109', '20.475999832153320'], ['N', '36.222000122070312', '36.900001525878906', '19.493000030517578'], ['C', '37.465999603271484', '37.312000274658203', '18.870000839233398'], ['C', '37.889999389648438', '36.323001861572266', '17.771999359130859'], ['O', '37.108001708984375', '35.425998687744141', '17.476999282836914'], ['C', '38.573001861572266', '37.472000122070312', '19.902999877929688'], ['C', '38.854000091552734', '36.187999725341797', '20.687000274658203'], ['N', '39.763000488281250', '36.310001373291016', '21.648000717163086'], ['O', '38.219001770019531', '35.167999267578125', '20.444999694824219'], ['N', '39.077999114990234', '36.497001647949219', '17.204999923706055'], ['C', '39.536998748779297', '35.640998840332031', '16.111999511718750'], ['C', '39.734001159667969', '34.175998687744141', '16.517999649047852'], ['O', '39.173999786376953', '33.270000457763672', '15.869000434875488'], ['C', '40.806999206542969', '36.208000183105469', '15.498000144958496'], ['N', '40.123001098632812', '33.959999084472656', '17.767999649047852'], ['C', '40.236000061035156', '32.608001708984375', '18.357000350952148'], ['C', '38.868000030517578', '31.996999740600586', '18.531000137329102'], ['O', '38.570999145507812', '30.833000183105469', '18.173000335693359'], ['C', '40.984001159667969', '32.761001586914062', '19.690000534057617'], ['C', '41.111000061035156', '31.395999908447266', '20.389999389648438'], ['O', '42.296001434326172', '33.273998260498047', '19.424999237060547']] 
    
    len(stored.xyz_full) :  27
    
    

    and the file output_xyz.xyz will be written in .xyz*** format:

    27
    test
    N 32.504001617431641 37.205001831054688 19.346000671386719
    C 33.805999755859375 36.685001373291016 19.868999481201172
    C 35.000000000000000 37.215999603271484 19.097000122070312
    O 34.831001281738281 37.956001281738281 18.111999511718750
    ..............
    

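    This is funny, because the original test.pdb had only three decimal places in its Cartesian coordinates. The extra digits are not real precision: PyMOL stores coordinates as 32-bit floats, and printing them with 15 decimal places exposes the float32 rounding error.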
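
The long tails of digits are consistent with the coordinates being held as 32-bit floats. Here is a small sketch, independent of PyMOL, that rounds a value to single precision with the standard struct module (to_float32 is a name made up here):

```python
import struct


def to_float32(value):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]


x = to_float32(33.806)   # x coordinate of the CA atom of residue 1 in test.pdb
print(f"{x:.15f}")       # 15 decimals: shows the float32 rounding
print(f"{x:.3f}")        # 3 decimals: same precision as the PDB file
```

Formatting with as many decimals as the source file had (three for PDB) avoids carrying this noise into the .xyz file.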

    *** I still wasn't able to figure out whether the .xyz format uses space-delimited columns for the x, y, z values or whether there is a fixed column width for them.

    Second edit

    After a little bit of googling, I finally ended up at:

    XYZ cartesian coordinates format (xyz) Openbabel

    which states:

    On output, the first line written is the number of atoms in the molecule (warning - the number of digits is limited to three for some programs, e.g. Maestro). Line two is the title of the molecule or the filename if no title is defined. Remaining lines define the atoms in the file. The first column is the atomic symbol (right-aligned on the third character), followed by the XYZ coordinates in “10.5” format, in angstroms. This means that all coordinates are printed with five decimal places.

    Example :

    12
    benzene example
      C        0.00000        1.40272        0.00000
      H        0.00000        2.49029        0.00000
      C       -1.21479        0.70136        0.00000
      H       -2.15666        1.24515        0.00000
      C       -1.21479       -0.70136        0.00000
      H       -2.15666       -1.24515        0.00000
      C        0.00000       -1.40272        0.00000
      H        0.00000       -2.49029        0.00000
      C        1.21479       -0.70136        0.00000
      H        2.15666       -1.24515        0.00000
      C        1.21479        0.70136        0.00000
      H        2.15666        1.24515        0.00000
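
The column layout of this example can be reproduced with Python format specifiers. This is a minimal sketch, assuming the widths seen in the example itself (element right-aligned in 3 characters, each coordinate in a 15-character field with 5 decimals); format_xyz is a hypothetical helper, not part of PyMOL or Open Babel.

```python
def format_xyz(atoms, title=""):
    """Return XYZ text for `atoms`, an iterable of (element, x, y, z)."""
    atoms = list(atoms)
    lines = [str(len(atoms)), title]
    for elem, x, y, z in atoms:
        # element in 3 columns, then three 15-wide fields with 5 decimals
        lines.append(f"{elem:>3}{x:15.5f}{y:15.5f}{z:15.5f}")
    return "\n".join(lines) + "\n"


# First two atoms of the benzene example
print(format_xyz([("C", 0.0, 1.40272, 0.0), ("H", 0.0, 2.49029, 0.0)],
                 title="benzene example"), end="")
```

In a PyMOL script the tuples could be collected with cmd.iterate_state and an expression that appends (elem, x, y, z) to a list on the stored object.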
    

    So the part of my code that writes the output_xyz.xyz file has to be modified accordingly; the XYZ (*.xyz) file exported by PyMOL does not seem to follow this layout either.

    Use:

    cmd.iterate_state("-1" , "test" , "stored.xyz_full.append([elem.rjust(3 , ' ') , f'{x:15.5f}' , f'{y:15.5f}' , f'{z:15.5f}'])")

    to get output_xyz.xyz as:

    27
    test
      N        32.50400        37.20500        19.34600
      C        33.80600        36.68500        19.86900
      C        35.00000        37.21600        19.09700
      O        34.83100        37.95600        18.11200
      C        33.78000        35.13600        19.85800
    ........
    ......
    ....
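
To check a written file regardless of its column widths, it can be read back by splitting each line on whitespace. This is a minimal sketch, assuming the simple four-column layout shown in the question (atom count, comment line, then element and x, y, z per line); parse_xyz is an illustrative name, and other programs may be stricter about columns.

```python
def parse_xyz(text):
    """Parse XYZ text into (comment, [(element, x, y, z), ...])."""
    lines = text.splitlines()
    n_atoms = int(lines[0].split()[0])
    comment = lines[1].strip()
    atoms = []
    for line in lines[2:2 + n_atoms]:
        elem, x, y, z = line.split()[:4]
        atoms.append((elem, float(x), float(y), float(z)))
    if len(atoms) != n_atoms:
        raise ValueError(f"expected {n_atoms} atoms, found {len(atoms)}")
    return comment, atoms


fixed_width = (
    "2\ntest\n"
    "  N        32.50400        37.20500        19.34600\n"
    "  C        33.80600        36.68500        19.86900\n"
)
space_delimited = (
    "2\ntest\n"
    "N 32.504002 37.205002 19.346001\n"
    "C 33.806000 36.685001 19.868999\n"
)
print(parse_xyz(fixed_width))
print(parse_xyz(space_delimited))
```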