I am trying to use RDKit to enumerate large libraries of compounds and output the result as a single column of SMILES strings in a CSV file. I was able to use the following code successfully:
import os
os.chdir('xxx')
from rdkit import Chem
from rdkit.Chem import rdChemReactions
from rdkit.Chem import AllChem
rxn = rdChemReactions.ReactionFromSmarts('xxx')
rct1 = Chem.SDMolSupplier('reactants_1.sdf')
rct2 = Chem.SDMolSupplier('reactants_2.sdf')
prods = AllChem.EnumerateLibraryFromReaction(rxn,[rct1,rct2])
prods2 = [Chem.MolToSmiles(x[0]) for x in list(prods)]
import csv
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for item in prods2:
        writer.writerow([item])
However, memory usage was very high. In an attempt to reduce memory usage, I tried to perform an iterative enumeration, where I would take one molecule at a time of "reactants_1", react it with all the molecules in "reactants_2", write the resulting compounds to the CSV file, then iterate:
import os
import csv
os.chdir('xxx')
from rdkit import Chem
from rdkit.Chem import rdChemReactions
from rdkit.Chem import AllChem
rxn = rdChemReactions.ReactionFromSmarts('xxx')
rct1 = Chem.SDMolSupplier('reactants_1.sdf')
rct2 = Chem.SDMolSupplier('reactants_2.sdf')
with open('output.csv', 'w', newline='') as f:
    for compound in rct1:
        prods = AllChem.EnumerateLibraryFromReaction(rxn,[compound,rct2])
        prods2 = [Chem.MolToSmiles(x[0]) for x in list(prods)]
        writer = csv.writer(f)
        for item in prods2:
            writer.writerow([item])
However, in this case I get the following error for the line "prods2 = [Chem.MolToSmiles(x[0]) for x in list(prods)]": "TypeError: 'Mol' object is not iterable". I was able to iterate over the 'Mol' object without issue in the first instance. Any ideas as to how I might solve this issue, or alternatively, any other ways I could drastically lower the RAM usage when enumerating a large compound set?
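EnumerateLibraryFromReaction expects each reactant set to be an iterable of molecules, such as a list, and loops over every set you pass in. In your second script the first set is a single Mol rather than a collection of them, which is why you get "'Mol' object is not iterable".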
So this should work:
import os
import csv
os.chdir('xxx')
from rdkit import Chem
from rdkit.Chem import rdChemReactions
from rdkit.Chem import AllChem
rxn = rdChemReactions.ReactionFromSmarts('xxx')
rct1 = Chem.SDMolSupplier('reactants_1.sdf')
rct2 = Chem.SDMolSupplier('reactants_2.sdf')
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)  # create the writer once, outside the loop
    for compound in rct1:
        compound = [compound] # put the mol into a list
        prods = AllChem.EnumerateLibraryFromReaction(rxn,[compound,rct2])
        prods2 = [Chem.MolToSmiles(x[0]) for x in list(prods)]
        for item in prods2:
            writer.writerow([item])
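On the RAM question: as far as I can tell, EnumerateLibraryFromReaction is written as a generator, so the high memory usage in your first script most likely comes from calling list(prods) and then building prods2, which holds every product in memory at once. Below is a minimal sketch that writes each SMILES as soon as it is produced. The names stream_first_products and enumerate_to_csv are my own, not part of RDKit, and the sketch assumes every record in both SD files parses to a valid molecule.

```python
import csv


def stream_first_products(product_sets, to_smiles, path):
    """Write the first product of each product set to a one-column CSV.

    product_sets is consumed lazily, so only one product set is held
    in memory at a time. Returns the number of rows written.
    """
    count = 0
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for products in product_sets:
            # Same choice as x[0] in the question: keep only the first product.
            writer.writerow([to_smiles(products[0])])
            count += 1
    return count


def enumerate_to_csv(rxn_smarts, sdf_1, sdf_2, path):
    """Hypothetical wrapper: enumerate the library and stream it to a CSV."""
    # RDKit is imported here so the helper above can be used without it.
    from rdkit import Chem
    from rdkit.Chem import AllChem, rdChemReactions

    rxn = rdChemReactions.ReactionFromSmarts(rxn_smarts)
    rct1 = Chem.SDMolSupplier(sdf_1)
    rct2 = Chem.SDMolSupplier(sdf_2)
    # No list(...) around the enumerator: products are pulled one set at a time.
    product_sets = AllChem.EnumerateLibraryFromReaction(rxn, [rct1, rct2])
    return stream_first_products(product_sets, Chem.MolToSmiles, path)
```

With your files, the call would be enumerate_to_csv('xxx', 'reactants_1.sdf', 'reactants_2.sdf', 'output.csv'), using your own reaction SMARTS in place of 'xxx'. If that works for you, the per-compound outer loop is not needed at all.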