
RDKit: "TypeError: 'Mol' object is not iterable" when attempting looped enumeration


I am trying to use RDKit to enumerate large libraries of compounds and output the result as a single column of SMILES strings in a CSV file. I was able to use the following code successfully:

import os
os.chdir('xxx')
from rdkit import Chem
from rdkit.Chem import rdChemReactions
from rdkit.Chem import AllChem
rxn = rdChemReactions.ReactionFromSmarts('xxx')
rct1 = Chem.SDMolSupplier('reactants_1.sdf')
rct2 = Chem.SDMolSupplier('reactants_2.sdf')
prods = AllChem.EnumerateLibraryFromReaction(rxn,[rct1,rct2])
prods2 = [Chem.MolToSmiles(x[0]) for x in list(prods)]
import csv
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for item in prods2:
        writer.writerow([item])

However, memory usage was very high. To reduce it, I tried an iterative enumeration: take one molecule at a time from "reactants_1", react it with all the molecules in "reactants_2", write the resulting compounds to the CSV file, then move on to the next molecule:

import os
import csv
os.chdir('xxx')
from rdkit import Chem
from rdkit.Chem import rdChemReactions
from rdkit.Chem import AllChem
rxn = rdChemReactions.ReactionFromSmarts('xxx')
rct1 = Chem.SDMolSupplier('reactants_1.sdf')
rct2 = Chem.SDMolSupplier('reactants_2.sdf')
with open('output.csv', 'w', newline='') as f:
    for compound in rct1:
        prods = AllChem.EnumerateLibraryFromReaction(rxn,[compound,rct2])
        prods2 = [Chem.MolToSmiles(x[0]) for x in list(prods)]
        writer = csv.writer(f)
        for item in prods2:
            writer.writerow([item])

However, in this case I get the following error on the line "prods2 = [Chem.MolToSmiles(x[0]) for x in list(prods)]": "TypeError: 'Mol' object is not iterable". The first version, where both reactant sets were passed as suppliers, ran without this error. Any ideas as to how I might solve this issue, or alternatively, any other ways I could drastically lower the RAM usage when enumerating a large compound set?


Solution

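  • EnumerateLibraryFromReaction expects a list of reactant sets, one iterable of molecules per reactant in the reaction. Inside the loop, compound is a single Mol rather than a collection, so the enumeration fails as soon as it tries to iterate over it.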

    So this should work:

    import os
    import csv
    os.chdir('xxx')
    from rdkit import Chem
    from rdkit.Chem import rdChemReactions
    from rdkit.Chem import AllChem
    rxn = rdChemReactions.ReactionFromSmarts('xxx')
    rct1 = Chem.SDMolSupplier('reactants_1.sdf')
    rct2 = Chem.SDMolSupplier('reactants_2.sdf')
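    with open('output.csv', 'w', newline='') as f:
        writer = csv.writer(f)  # create the writer once, outside the loop
        for compound in rct1:
            if compound is None:  # SDMolSupplier yields None for entries it cannot parse
                continue
            # wrap the single Mol in a list so it counts as a one-member reactant set
            prods = AllChem.EnumerateLibraryFromReaction(rxn, [[compound], rct2])
            # iterate the enumerator directly rather than building a list first
            for x in prods:
                writer.writerow([Chem.MolToSmiles(x[0])])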
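
On the memory question: RDKit's docstring describes EnumerateLibraryFromReaction as returning a generator, so much of the RAM cost in the question's code probably comes from list(prods), which holds every product molecule at once before anything is written. Below is a minimal sketch of writing each row as soon as its product set is produced. The helper name stream_smiles and its parameters are hypothetical, not part of RDKit, and the demo uses stand-in strings instead of molecules so it runs without RDKit installed.

```python
import csv
import io


def stream_smiles(product_sets, to_smiles, out):
    """Write the first product of each product set as one CSV row.

    product_sets: any iterable of product tuples (may be a lazy generator).
    to_smiles:    callable that turns one product into a string.
    out:          text file object, ideally opened with newline=''.
    Returns the number of rows written.
    """
    writer = csv.writer(out)  # one writer per call
    count = 0
    for products in product_sets:  # pulls one product set at a time
        writer.writerow([to_smiles(products[0])])
        count += 1
    return count


# Stand-in data (an assumption for illustration): each tuple plays the role
# of one product set, and str stands in for Chem.MolToSmiles.
fake_product_sets = ((name,) for name in ["CCO", "CCN", "CCC"])
buffer = io.StringIO(newline="")
rows_written = stream_smiles(fake_product_sets, str, buffer)
```

With RDKit, the call inside the loop would presumably be stream_smiles(AllChem.EnumerateLibraryFromReaction(rxn, [[compound], rct2]), Chem.MolToSmiles, f). Because nothing is accumulated, the same helper may also work on the original call with [rct1, rct2], without the outer loop.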