A new image-based deep-learning algorithm for drug discovery requires splitting a file containing ~3000 chemical compounds into PNG files, each containing an individual 2D 200 x 200 pixel image (e.g. SN00001400.png, SN00002805.png, SN00002441.png, ...). I do not need any conformers or any other 3D information.
I can send an example file, f1.sdf, containing 9 compounds with their images, names and SMILES, one compound per row.
I am using rdkit 2017.09.1 in Anaconda3 with Python 3.6, 3.7 or 3.8, from Jupyter notebooks and/or the Python prompt, on 2 e7 64 computers running Windows 8 Professional. I am looking for simple Python code to split the file into individual compounds, convert each one to a 200 x 200 pixel PNG file (Cairo), name it by its corresponding compound ID and save it into a separate directory (e.g. images), ready to be tested.
I tried many different code snippets from the web, and combinations of them, but despite intensive testing they did not work :-(.
Below are some of my best (?) attempts.
rdkit imports tested
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import rdMolDraw2D
from rdkit.Chem.Draw.rdMolDraw2D import MolDraw2DSVG
from rdkit.Chem.Draw.rdMolDraw2D import MolDraw2DCairo # cannot import
from rdkit.Chem.Draw import IPythonConsole
from IPython.display import SVG # IPython not in module
from rdkit.Chem import rdDepictor
from rdkit.Chem import MolFromSmiles
Best test using a single SMILES
IPythonConsole.molSize = (200, 200)
IPythonConsole.ipython_useSVG = True # I would rather use Cairo but I could not make it work!
mol = Chem.MolFromSmiles('N#Cc1cccc(-c2nc(-c3cccnc3)no2)c1')
display(mol) # not working
AllChem.Compute2DCoords(mol)
I tried different SMILES, with similarly negative results further down the code:
IMG_SIZE = 200
smiles="CCCC"
mol = Chem.MolFromSmiles(smiles)
drawer = rdMolDraw2D.MolDraw2DSVG(IMG_SIZE, IMG_SIZE) #MolDraw2D has no attribute MolDraw2DCairo despite cairo being installed!
drawer.drawOptions().bondLineWidth = 1
drawer.DrawMolecule(mol) # bad conformer id (?????)
drawer.FinishDrawing()
drawer.WriteDrawingText('comp_id.png')
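One possible reason the attempt above does not produce a usable PNG is that MolDraw2DSVG generates SVG text, so writing it to a file named .png does not make it a PNG image. The following is only a minimal sketch, assuming an RDKit build in which MolDraw2DCairo can be imported (which was not the case in the installation described above). The names smiles_to_png and is_png are hypothetical helpers introduced here for illustration, not RDKit functions.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def is_png(data):
    """Return True if the bytes start with the PNG signature (hypothetical helper)."""
    return data[:8] == PNG_SIGNATURE


def smiles_to_png(smiles, png_path, size=200):
    """Draw one SMILES as a size x size PNG using the Cairo drawer (hypothetical helper)."""
    # RDKit is imported here so that is_png() above stays usable without it.
    from rdkit import Chem
    from rdkit.Chem import rdDepictor
    from rdkit.Chem.Draw import rdMolDraw2D

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("could not parse SMILES: %r" % smiles)

    # 2D depiction coordinates only; no 3D conformers are generated.
    rdDepictor.Compute2DCoords(mol)

    drawer = rdMolDraw2D.MolDraw2DCairo(size, size)
    drawer.drawOptions().bondLineWidth = 1
    drawer.DrawMolecule(mol)
    drawer.FinishDrawing()

    # The Cairo drawer returns PNG bytes, so the file is opened in binary mode.
    data = drawer.GetDrawingText()
    with open(png_path, "wb") as handle:
        handle.write(data)
    return data
```

A call such as is_png(smiles_to_png('CCCC', 'butane.png')) would then check that the bytes written really are PNG data rather than SVG text; the output file name here is just an example.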
Best attempts using 9 compounds in f1.sdf
suppl=Chem.SDMolSupplier('f1.sdf')
for mol in suppl:
    print(mol.GetName()) # AttributeError: 'Mol' object has no attribute 'GetMolecule_Name'
mols=[x for x in suppl]
Name(mols)
suppl = Chem.SDMolSupplier('f1.sdf')
ms= [x for x in suppl if x is not None]
for m in ms:
    tmp=AllChem.Compute2DCoords(m)
Draw.MolToFile(ms[0], 'images/mol1.png') # cairo.IOError: error while writing to output stream
Draw.MolToFile(ms[1], 'images/mol2.png')
....................................................................
Hoping to get some help! Thanks for your attention, sincerely Julio
juliocollm@gmail.com
You were right!
I ran "conda install -c conda-forge rdkit" in a newly created Anaconda3 environment, and most of the commands suddenly WORKED! THANK YOU VERY MUCH!
I developed the code below, but I am stuck because I cannot find a way to use each compound's comp_id as the name of the PNG file holding its (beautiful) image. Any ideas? THANKS!
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import rdMolDraw2D
from rdkit.Chem.Draw.rdMolDraw2D import MolDraw2DSVG
from rdkit.Chem.Draw.rdMolDraw2D import MolDraw2DCairo
from rdkit.Chem.Draw import MolToFile
from rdkit.Chem import rdDepictor
from rdkit.Chem import MolFromSmiles
suppl = Chem.SDMolSupplier('f1.sdf')
for mol in suppl:
    print(mol.GetProp("comp_id"))
mols= [x for x in suppl]
for m in mols:
    tmp=AllChem.Compute2DCoords(m)
Draw.MolToFile(mols[0],'images/3333.png', size=(200,200), kekulize = True, wedgeBonds = False,imageType=None, fitImage=False, options=None) # did not get the comp_id but could transfer some attributes
Draw.MolToFile(mols[1], 'images/'+"comp_id"+'a.png') # did not get the idea
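In the last line above, "comp_id" is a literal string, so every file would get the same name. One way to name each file after its compound is to read the property inside the loop and build the path from its value. This is only a sketch, assuming that each record in f1.sdf carries a comp_id property (as the print loop above suggests) and that its values may need cleaning before being used as file names. The functions png_name and sdf_to_pngs are hypothetical helpers introduced for illustration, not part of RDKit.

```python
import os
import re


def png_name(comp_id):
    """Turn a compound ID into a safe PNG file name (hypothetical helper)."""
    # Replace anything other than letters, digits, '.', '_' or '-' with '_'.
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", str(comp_id).strip())
    return safe + ".png"


def sdf_to_pngs(sdf_path, out_dir="images", prop="comp_id", size=(200, 200)):
    """Write one PNG per SDF record, named after the record's `prop` value."""
    # RDKit is imported here so that png_name() above stays usable without it.
    from rdkit import Chem
    from rdkit.Chem import AllChem, Draw

    os.makedirs(out_dir, exist_ok=True)
    written = []
    for index, mol in enumerate(Chem.SDMolSupplier(sdf_path)):
        if mol is None:  # record that RDKit could not parse
            continue
        if mol.HasProp(prop):
            comp_id = mol.GetProp(prop)
        elif mol.HasProp("_Name") and mol.GetProp("_Name").strip():
            comp_id = mol.GetProp("_Name")  # title line of the SDF record
        else:
            comp_id = "mol_%d" % index  # last resort: position in the file
        AllChem.Compute2DCoords(mol)  # 2D depiction only, no conformers
        path = os.path.join(out_dir, png_name(comp_id))
        Draw.MolToFile(mol, path, size=size)
        written.append(path)
    return written
```

Calling sdf_to_pngs('f1.sdf') would then be expected to write files such as images/SN00001400.png, one per readable record. Note that two records with the same ID would overwrite each other in this sketch.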