rdkit cheminformatics

Compound classification using RDKit


How can I classify compounds computationally using RDKit or other libraries? For example, how can I tell whether a compound is a halide, an amine, or an alcohol? Does RDKit have built-in functions for this kind of task?


Solution

  • There's no single built-in function for this, but there is a workaround you can use to classify compounds. RDKit's rdkit.Chem.Fragments module provides functions that count how many times a given fragment occurs in a molecule, and many of those fragments are functional groups. As an example, say you want to find the number of aliphatic -OH groups in your molecule. You can call the following function to do that:

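    from rdkit import Chem
    from rdkit.Chem.Fragments import fr_Al_OH
    mol = Chem.MolFromSmiles("CCO")  # example input (ethanol); use your own molecule here
    fr_Al_OH(mol)  # number of aliphatic -OH groups in mol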
    

    Likewise, the following returns the number of aromatic -OH groups:

    from rdkit.Chem.Fragments import fr_Ar_OH
    fr_Ar_OH(mol)
    

    Similarly, there are 83 more functions available, and some of them will be useful for your task. You can iterate over all of them: whenever a function returns a value of 1 or more, the molecule contains that functional group. For example, if fr_Al_OH(mol) returns a value >= 1, the compound contains an aliphatic hydroxyl group and can be treated as an alcohol.
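The iteration described above can be sketched as follows. This is a minimal example, not an official RDKit recipe: the helper name `functional_groups` and the example SMILES strings are made up for illustration, and it assumes that every callable in `rdkit.Chem.Fragments` whose name starts with `fr_` is a fragment counter that accepts a molecule.

```python
from rdkit import Chem
from rdkit.Chem import Fragments


def functional_groups(smiles):
    """Return {fragment_function_name: count} for every fr_* counter that finds a match.

    Hypothetical helper for illustration; it is not part of RDKit.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        # MolFromSmiles returns None when the SMILES cannot be parsed
        raise ValueError(f"Could not parse SMILES: {smiles!r}")

    found = {}
    for name in dir(Fragments):
        # Assumption: fragment counters are the callables named fr_*
        if not name.startswith("fr_"):
            continue
        fn = getattr(Fragments, name)
        if not callable(fn):
            continue
        count = fn(mol)
        if count >= 1:
            found[name] = count
    return found


if __name__ == "__main__":
    # Ethanol, phenol, chloroethane, ethylamine
    for smiles in ("CCO", "c1ccccc1O", "CCCl", "CCN"):
        print(smiles, functional_groups(smiles))
```

A molecule can match several patterns at once, so the result is better read as a set of labels than as a single class. For the halide and amine cases in the question, counters such as fr_halogen, fr_alkyl_halide and fr_NH2 are likely the relevant ones.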