Tags: image, png, anaconda3, rdkit, file, splitting

How would you convert a large SDF file of chemical compounds into individual files containing molecular images?


A new image-based deep-learning algorithm for drug discovery requires splitting a file containing ~3000 chemical compounds into PNG files, each containing an individual 2D 200 x 200 pixel image (e.g. SN00001400.png, SN00002805.png, SN00002441.png, ...). I do not need any conformers or any other 3D information.

I can send an initial example file, f1.sdf, containing 9 compounds with their images, names and SMILES, one compound per row.

I am using rdkit 2017.09.1 in Anaconda3 with Python 3.6, 3.7 or 3.8, from Jupyter notebooks and/or the Python prompt, on 2 e7 64-bit computers running Windows 8 Professional. I am looking for simple Python code to split out the compounds, convert each one to a 200 x 200 pixel PNG file (with Cairo), name each file by its corresponding compound ID and save them all into a separate directory (e.g. images), ready to be tested.

I tried many different code snippets from the web, in many combinations, but despite intensive testing they did not work :-(.

Below are some of my best (?) attempts.

rdkit imports tested

from rdkit import Chem
from rdkit.Chem import AllChem 
from rdkit.Chem import Draw
from rdkit.Chem.Draw import rdMolDraw2D    
from rdkit.Chem.Draw.rdMolDraw2D import MolDraw2DSVG    
from rdkit.Chem.Draw.rdMolDraw2D import MolDraw2DCairo  # cannot import 
from rdkit.Chem.Draw import IPythonConsole  
from IPython.display import SVG # IPython not in module 
from rdkit.Chem import rdDepictor 
from rdkit.Chem import MolFromSmiles

Best test using a single SMILES

IPythonConsole.molSize = (200, 200)  
IPythonConsole.ipython_useSVG = True  #I would rather use Cairo but I could not make it to work!
mol = Chem.MolFromSmiles('N#Cc1cccc(-c2nc(-c3cccnc3)no2)c1')
display(mol)  # not working
AllChem.Compute2DCoords(mol)

I tried different SMILES with similarly negative results along these lines:

IMG_SIZE = 200
smiles="CCCC"
mol = Chem.MolFromSmiles(smiles)
drawer = rdMolDraw2D.MolDraw2DSVG(IMG_SIZE, IMG_SIZE)  #MolDraw2D has no attribute MolDraw2DCairo despite cairo being installed!   
drawer.drawOptions().bondLineWith = 1
drawer.DrawMolecule(mol)  # bad conformer id (?????)
drawer.FinishDrawing()
drawer.WriteDrawingText('comp_id.png')
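
For comparison, here is a minimal sketch of drawing a single SMILES to a real PNG, assuming an RDKit build where MolDraw2DCairo imports correctly. The attempt above uses the SVG drawer, so the file named comp_id.png would not hold PNG data even if the drawing succeeded. Computing 2D coordinates before drawing gives the molecule a conformer, which may be what the "bad conformer id" message was about. The names smiles_to_png and is_png are hypothetical helpers, not part of RDKit.

```python
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def is_png(data: bytes) -> bool:
    """Return True if the bytes start with the PNG file signature."""
    return data.startswith(PNG_SIGNATURE)


def smiles_to_png(smiles: str, out_path: str, size: int = 200) -> None:
    """Draw one SMILES string to a size x size PNG (hypothetical helper)."""
    # Imported here so is_png() stays usable without RDKit installed.
    from rdkit import Chem
    from rdkit.Chem import rdDepictor
    from rdkit.Chem.Draw import rdMolDraw2D

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")

    rdDepictor.Compute2DCoords(mol)  # give the molecule a 2D conformer
    drawer = rdMolDraw2D.MolDraw2DCairo(size, size)  # PNG backend, not SVG
    drawer.DrawMolecule(mol)
    drawer.FinishDrawing()

    png_bytes = drawer.GetDrawingText()
    if not is_png(png_bytes):
        raise RuntimeError("drawer did not produce PNG data")
    Path(out_path).write_bytes(png_bytes)
```

Usage would be something like smiles_to_png("CCCC", "butane.png"). The is_png check is only a sanity test on the first bytes of the output; it can also be pointed at any existing file to see whether it is really a PNG.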

Best attempts using 9 compounds in f1.sdf

suppl=Chem.SDMolSupplier('f1.sdf')
for mol in suppl:
    print(mol.GetName()) # AttributeError: 'Mol' object has no attribute 'GetMolecule_Name'
mols=[x for x in suppl]
Name(mols) 

suppl = Chem.SDMolSupplier('f1.sdf')
ms= [x for x in suppl if x is not None]
for m in ms: 
    tmp=AllChem.Compute2DCoords(m)

Draw.MolToFile(ms[0], 'images/mol1.png')  # cairo.IOError: error while writing to output stream
Draw.MolToFile(ms[1], 'images/mol2.png')

....................................................................

Hoping to get some help! Thanks for your attention. Sincerely, Julio

juliocollm@gmail.com


Solution

  • You were right!

    I ran "conda install -c conda-forge rdkit" in a newly created Anaconda3 environment, and most of the commands suddenly WORKED! THANK YOU VERY MUCH!

    I developed the code below, but I got stuck because I cannot find a way to use each compound's comp_id as the name of the png file that holds its image. Any ideas? THANKS!

    from rdkit import Chem
    from rdkit.Chem import AllChem
    from rdkit.Chem import Draw
    from rdkit.Chem.Draw import rdMolDraw2D
    from rdkit.Chem.Draw.rdMolDraw2D import MolDraw2DSVG
    from rdkit.Chem.Draw.rdMolDraw2D import MolDraw2DCairo
    from rdkit.Chem.Draw import MolToFile
    from rdkit.Chem import rdDepictor
    from rdkit.Chem import MolFromSmiles

    suppl = Chem.SDMolSupplier('f1.sdf')

    for mol in suppl:
        print(mol.GetProp("comp_id"))

    mols = [x for x in suppl]

    for m in mols:
        tmp = AllChem.Compute2DCoords(m)

    # did not get the comp_id but could transfer some attributes
    Draw.MolToFile(mols[0], 'images/3333.png', size=(200,200), kekulize=True, wedgeBonds=False, imageType=None, fitImage=False, options=None)

    # did not get the idea
    Draw.MolToFile(mols[1], 'images/'+"comp_id"+'a.png')
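
One possible way to carry each comp_id into the file name is to read the property inside the loop and build the path from its value, rather than from the literal string "comp_id". This is a minimal sketch, assuming every record in f1.sdf has a comp_id property (as the print loop above suggests). The functions png_path and sdf_to_pngs are hypothetical names, not RDKit functions. The sketch also creates the output directory first, since a missing images directory is one possible cause of a write error.

```python
from pathlib import Path


def png_path(out_dir: str, comp_id: str) -> Path:
    """Build the output path for one compound, e.g. images/SN00001400.png."""
    return Path(out_dir) / f"{comp_id.strip()}.png"


def sdf_to_pngs(sdf_path, out_dir="images", id_prop="comp_id", size=200):
    """Write one PNG per SDF record, named after its ID property.

    Hypothetical helper; id_prop defaults to the "comp_id" property
    used in the post. Returns the list of paths written.
    """
    # Imported here so png_path() stays usable without RDKit installed.
    from rdkit import Chem
    from rdkit.Chem import AllChem, Draw

    Path(out_dir).mkdir(parents=True, exist_ok=True)

    written = []
    for index, mol in enumerate(Chem.SDMolSupplier(sdf_path)):
        # The supplier yields None for records it cannot parse.
        if mol is None or not mol.HasProp(id_prop):
            print(f"skipping record {index}: unreadable or no {id_prop!r}")
            continue
        AllChem.Compute2DCoords(mol)  # 2D layout only, no 3D conformers
        target = png_path(out_dir, mol.GetProp(id_prop))
        Draw.MolToFile(mol, str(target), size=(size, size))
        written.append(target)
    return written
```

Calling sdf_to_pngs('f1.sdf') should then leave files such as images/SN00001400.png in the images directory, one per compound. If two records share the same comp_id, the later one overwrites the earlier one in this sketch.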