pythonrdkit

RDkit removes explicit Hydrogens while optimizing geometry


I'm trying to optimize a bunch of molecules from an SDF file using AllChem.MMFFOptimizeMolecule. My molecules already have explicit hydrogens, but the warning told me:

Molecule does not have explicit Hs. Consider calling AddHs()

So I decided that it won't hurt to add the AddHs() command just in case. But even after adding that, the warning persists. Not only that, but when I write out the optimized molecules to a new SDF file, the hydrogens are missing.

So it seems that RDKit is removing my explicit hydrogens when reading in the SDF file using Chem.SDMolSupplier(path), not adding hydrogens when I call Chem.AddHs(mol), doing the optimization without them and then writing the opt geometries without hydrogens to the new file and I do not understand what I did to deserve this...

Here is my code:

from rdkit import Chem
from rdkit.Chem import AllChem



def opt_sdf_coords(path='all_moles.sdf',outpath='all_moles_opt.sdf'):

    suppl = Chem.SDMolSupplier(path)
    w = Chem.SDWriter(outpath)

    for mol in suppl:
        Chem.AddHs(mol)
        AllChem.MMFFOptimizeMolecule(mol)

        w.write(mol)

    w.close()

This is an excerpt from the SDF file so that you can run the code yourself:


 OpenBabel06232316383D

  9  9  0  0  0  0  0  0  0  0999 V2000
    0.9772    0.0715   -0.0147 Br  0  0  0  0  0  0  0  0  0  0  0  0
    2.8573    0.0846   -0.0063 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.6483    1.2076   -0.0051 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.9310    0.7321    0.0017 N   0  0  0  0  0  0  0  0  0  0  0  0
    4.8622   -0.6331    0.0043 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.6184   -1.0534   -0.0005 N   0  0  0  0  0  0  0  0  0  0  0  0
    3.4204    2.2636   -0.0083 H   0  0  0  0  0  0  0  0  0  0  0  0
    5.7787    1.2827    0.0044 H   0  0  0  0  0  0  0  0  0  0  0  0
    5.7442   -1.2604    0.0095 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  2  0  0  0  0
  2  6  1  0  0  0  0
  3  4  1  0  0  0  0
  3  7  1  0  0  0  0
  4  5  1  0  0  0  0
  4  8  1  0  0  0  0
  5  6  2  0  0  0  0
  5  9  1  0  0  0  0
M  END
$$$$

 OpenBabel06232316383D

  9  9  0  0  0  0  0  0  0  0999 V2000
    0.9772    0.0715   -0.0147 Br  0  0  0  0  0  0  0  0  0  0  0  0
    2.8573    0.0846   -0.0063 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.6483    1.2076   -0.0051 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.9310    0.7321    0.0017 N   0  0  0  0  0  0  0  0  0  0  0  0
    4.8622   -0.6331    0.0043 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.6184   -1.0534   -0.0005 N   0  0  0  0  0  0  0  0  0  0  0  0
    3.4204    2.2636   -0.0083 H   0  0  0  0  0  0  0  0  0  0  0  0
    5.7787    1.2827    0.0044 H   0  0  0  0  0  0  0  0  0  0  0  0
    5.7442   -1.2604    0.0095 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  2  0  0  0  0
  2  6  1  0  0  0  0
  3  4  1  0  0  0  0
  3  7  1  0  0  0  0
  4  5  1  0  0  0  0
  4  8  1  0  0  0  0
  5  6  2  0  0  0  0
  5  9  1  0  0  0  0
M  END
$$$$

Solution

  • RDKit removes Hs when reading SDF files. That's why it calls for AddHs.

    There is no need to AddHs. Just set removeHs=False.

    from rdkit import Chem
    from rdkit.Chem import AllChem
    
    def opt_sdf_coords(path='all_moles.sdf',outpath='all_moles_opt.sdf'):
    
        suppl = Chem.SDMolSupplier(path, removeHs=False)
        w = Chem.SDWriter(outpath)
    
        for mol in suppl:
            AllChem.MMFFOptimizeMolecule(mol)
    
            w.write(mol)
    
        w.close()