python-3.x  pandas  pickle  rdkit  sdf

RDKit PandasTools WriteSDF: RuntimeError: Bad pickle format: unexpected End-of-File while reading


I get the following error:

PandasTools.WriteSDF(pp, args.output_file, molColName='ID', properties=list(pp.columns))
  File "/scratch/micromamba/envs/biotools_py39/lib/python3.9/site-packages/rdkit/Chem/PandasTools.py", line 440, in WriteSDF
    mol = Chem.Mol(row[1][molColName])
RuntimeError: Bad pickle format: unexpected End-of-File while reading

I updated pandas to 2.0.0, as suggested here, but the error persists.

Please help me to solve it.

My code here:

import pandas as pd
from pprint import pprint
from rdkit.Chem import PandasTools
from rdkit import Chem
from rdkit.Chem import AllChem

pp = pd.read_csv(args.input_file)
PandasTools.AddMoleculeColumnToFrame(pp,'smiles')
pp["Mol_H"] = pp["ROMol"].apply(Chem.AddHs)
pp["Mol_H"].map(AllChem.EmbedMolecule)
pprint(pp)
PandasTools.WriteSDF(pp, args.output_file, molColName='ID', properties=list(pp.columns))

When I use pandas instead of dask, it throws another error:

PandasTools.WriteSDF(pp, args.output_file, molColName='ID', properties=list(pp.columns))
  File "/scratch/micromamba/envs/biotools_py39/lib/python3.9/site-packages/rdkit/Chem/PandasTools.py", line 440, in WriteSDF
    mol = Chem.Mol(row[1][molColName])
RuntimeError: Bad pickle format: bad endian ID or invalid file format


Additional context: my CSV file:

ID,smiles
1,O1CC[C@@H](NC(=O)[C@@H](Cc2cc3cc(ccc3nc2N)-c2ccccc2C)C)CC1(C)C
2,Fc1cc(cc(F)c1)C[C@H](NC(=O)[C@@H](N1CC[C@](NC(=O)C)(CC(C)C)C1=O)CCc1ccccc1)[C@H](O)[C@@H]1[NH2+]C[C@H](OCCC)C1
3,S1(=O)(=O)N(c2cc(cc3c2n(cc3CC)CC1)C(=O)N[C@H]([C@H](O)C[NH2+]Cc1cc(OC)ccc1)Cc1ccccc1)C
4,S1(=O)(=O)C[C@@H](Cc2cc(O[C@H](COCC)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1
5,S1(=O)(=O)N(c2cc(cc3c2n(cc3CC)CC1)C(=O)N[C@H]([C@H](O)C[NH2+]Cc1cc(ccc1)C(F)(F)F)Cc1ccccc1)C
6,S1(=O)C[C@@H](Cc2cc(OC(C(F)(F)F)C(F)(F)F)c(N)c(F)c2)[C@H](O)[C@@H]([NH2+]Cc2cc(ccc2)C(C)(C)C)C1
7,S(=O)(=O)(CCCCC)C[C@@H](NC(=O)c1cccnc1)C(=O)N[C@H]([C@H](O)C[NH2+]Cc1cc(ccc1)CC)Cc1cc(F)cc(F)c1
8,Fc1c2c(ccc1)[C@@]([NH+]=C2N)(C=1C=C(C)C(=O)N(C=1)CC)c1cc(ccc1)-c1cc(cnc1)C#CC
9,O1c2c(cc(cc2)CC)[C@@H]([NH2+]C[C@@H](O)[C@H]2NC(=O)C=3C=CC(=O)N(CCCCc4cc(C2)ccc4)C=3)CC12CCC2
10,O=C1N(CCCC1)C(C)(C)[C@@H]1C[C@@H](CCC1)C(=O)N[C@H]([C@H](O)C[NH2+]Cc1cc(ccc1)C(C)C)Cc1ccccc1

Solution

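  • The error comes from molColName='ID'. WriteSDF passes every value in the molColName column to Chem.Mol, but ID holds row identifiers, not molecules, so RDKit fails when it tries to read the value as a pickled molecule. Pass the identifier column as idName instead and let molColName keep its default, 'ROMol' (or point it at another column of RDKit molecules, such as 'Mol_H'). Change the call to PandasTools.WriteSDF(df, args.output_file, idName='AQID', properties=list(df.columns)) to fix this error. Use the name of your own identifier column for idName; for the CSV shown above that would be 'ID'.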
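As a rough end-to-end illustration of the fix, the sketch below builds a tiny frame, embeds 3D coordinates, and writes the SDF to an in-memory buffer. The two SMILES strings, the helper add_hs_and_embed, the fixed randomSeed, and the StringIO target are assumptions made for the example, not part of the original question; swap in your own CSV and output path.

```python
import io

import pandas as pd
from rdkit import Chem
from rdkit.Chem import AllChem, PandasTools

# Two short SMILES stand in for the CSV in the question (illustrative data).
df = pd.DataFrame({"ID": [1, 2], "smiles": ["CCO", "c1ccccc1"]})

# Adds a column of RDKit Mol objects, named "ROMol" by default.
PandasTools.AddMoleculeColumnToFrame(df, "smiles")


def add_hs_and_embed(mol):
    """Hypothetical helper: return a hydrogen-added copy of mol with 3D coordinates."""
    mol_h = Chem.AddHs(mol)
    # EmbedMolecule modifies mol_h in place; the seed only makes runs repeatable.
    AllChem.EmbedMolecule(mol_h, randomSeed=42)
    return mol_h


df["Mol_H"] = df["ROMol"].apply(add_hs_and_embed)

# molColName must name a column of Mol objects; the identifier goes in idName.
buffer = io.StringIO()
PandasTools.WriteSDF(
    df,
    buffer,
    molColName="Mol_H",
    idName="ID",
    properties=["smiles"],
)
sdf_text = buffer.getvalue()
```

Listing the properties explicitly, rather than passing list(df.columns), keeps the other molecule column from being written into the file as a stringified property.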