python chemistry rdkit cheminformatics

How to separate a list of molecules based on how many hydrogens are attached to a certain atom?


I have alkene molecules of formula C9H17B. How can I separate them into three classes: those containing C-BH2, those containing C2-BH, and those containing C3-B? I've tried working with the molecules both as SMILES strings and as mol objects, but my approaches aren't working.


Solution

  • To find specific substructures use SMARTS.

    https://www.daylight.com/dayhtml/doc/theory/theory.smarts.html

    If I understand correctly, these are the three types of boron you are looking for:

    from rdkit import Chem
    from rdkit.Chem import Draw
    
    smiles = ['CCB', 'CCBC', 'CCB(C)(C)']
    
    mols = [Chem.MolFromSmiles(s) for s in smiles]
    Draw.MolsToGridImage(mols)
    


    Write SMARTS patterns for a boron with three total connections (X3, which counts implicit hydrogens as well) and a total hydrogen count of H2, H1, or H0.

    smarts = ['[BX3;H2]', '[BX3;H1]', '[BX3;H0]']
    patts = [Chem.MolFromSmarts(s) for s in smarts]
    

    Now you can check each molecule for each substructure.

    for p in patts:
        for m in mols:
            print(m.HasSubstructMatch(p))
        print()
    
    True
    False
    False
    
    False
    True
    False
    
    False
    False
    True
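
To actually sort a list of molecules into the three classes, the same patterns can be applied per molecule and the first match used as the class label. The following is a minimal sketch under some assumptions: the helper name `classify_by_boron`, the class labels, and the example SMILES are made up for illustration, and each molecule is assumed to contain a single boron atom.

```python
from collections import defaultdict

from rdkit import Chem

# One SMARTS per class: three-connected boron with 2, 1 or 0 hydrogens.
# The labels are illustrative; the patterns only count hydrogens and do not
# check that the remaining neighbours are carbon.
BORON_CLASSES = {
    "C-BH2": Chem.MolFromSmarts("[BX3;H2]"),
    "C2-BH": Chem.MolFromSmarts("[BX3;H1]"),
    "C3-B": Chem.MolFromSmarts("[BX3;H0]"),
}


def classify_by_boron(smiles_list):
    """Group SMILES strings by the hydrogen count on their boron atom."""
    groups = defaultdict(list)
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            # SMILES could not be parsed
            groups["invalid"].append(smi)
            continue
        for label, patt in BORON_CLASSES.items():
            if mol.HasSubstructMatch(patt):
                groups[label].append(smi)
                break
        else:
            # No three-connected boron found
            groups["unmatched"].append(smi)
    return dict(groups)


# Hypothetical example inputs, one per class
example = ["C=CCCC=CCCCB", "C=CCCC=CCBCC", "C=CCB(CC=C)CCC"]
groups = classify_by_boron(example)
for label, members in groups.items():
    print(label, members)
```

Because the loop stops at the first matching pattern, a molecule with more than one boron atom would be assigned to only one class; for such inputs you would need to decide how to handle multiple matches.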