I'm using RDKit in Python to convert SMILES strings to ECFP4 fingerprints with the code below. It raises an error, even though my code appears to match the approach in a related question I checked. Why am I still getting an error?
Is there an alternative way to code this?
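import pandas as pd
from rdkit.Chem import AllChem, PandasTools
bits = 1024
# data is a DataFrame with 'SMILES' and 'TARGET' columns
PandasTools.AddMoleculeColumnToFrame(data, smilesCol='SMILES')
# ECFP4 corresponds to a Morgan radius of 2 (radius 3 would give ECFP6)
data_ECFP4 = [AllChem.GetMorganFingerprintAsBitVect(x, 2, nBits=bits) for x in data['ROMol']]
data_ecfp4_lists = [list(l) for l in data_ECFP4]
ecfp4_name = [f'B{i+1}' for i in range(bits)]
data_ecfp4_df = pd.DataFrame(data_ecfp4_lists, index=data.TARGET, columns=ecfp4_name)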
The error I got is:
ArgumentError: Python argument types in rdkit.Chem.rdMolDescriptors.GetMorganFingerprintAsBitVect(NoneType, int) did not match C++ signature: GetMorganFingerprintAsBitVect(class RDKit::ROMol mol, int radius, unsigned int nBits=2048, class boost::python::api::object invariants=[], class boost::python::api::object fromAtoms=[], bool useChirality=False, bool useBondTypes=True, bool useFeatures=False, class boost::python::api::object bitInfo=None, bool includeRedundantEnvironments=False)
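The NoneType in the error message means at least one SMILES string could not be parsed, so its ROMol entry is None. Drop those rows before computing the fingerprints:
import pandas as pd
from rdkit import Chem
from rdkit.Chem import PandasTools
df = pd.read_csv('file.csv')
# Adds an 'ROMol' column; SMILES that fail to parse are left as None
PandasTools.AddMoleculeColumnToFrame(df, "SMILES")
# Keep only the rows whose SMILES produced a valid molecule
df = df[~df['ROMol'].isnull()]
df.to_csv('new_file.csv')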
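As for an alternative way to code this, here is a minimal sketch that parses the SMILES directly with Chem.MolFromSmiles, sets aside the rows that fail, and builds the fingerprint table from the rest. The toy DataFrame, its deliberately invalid third entry, and the names failed and valid are assumptions made for illustration and are not taken from the original data. It uses radius 2, the conventional Morgan radius for ECFP4.

```python
import pandas as pd
from rdkit import Chem
from rdkit.Chem import AllChem

# Hypothetical toy data; the third SMILES is deliberately invalid.
data = pd.DataFrame({
    "SMILES": ["CCO", "c1ccccc1", "not_a_smiles"],
    "TARGET": ["ethanol", "benzene", "broken"],
})

bits = 1024
radius = 2  # Morgan radius 2 is the usual equivalent of ECFP4

# MolFromSmiles returns None (rather than raising) when parsing fails.
data["ROMol"] = [Chem.MolFromSmiles(s) for s in data["SMILES"]]

# Split the frame so the unparseable rows can be inspected separately.
failed = data[data["ROMol"].isnull()]
valid = data[data["ROMol"].notnull()]

# Only valid molecules reach the fingerprint call, so no NoneType is passed.
fps = [
    AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=bits)
    for mol in valid["ROMol"]
]

ecfp4_name = [f"B{i+1}" for i in range(bits)]
data_ecfp4_df = pd.DataFrame(
    [list(fp) for fp in fps],
    index=valid["TARGET"],
    columns=ecfp4_name,
)
```

Printing failed["SMILES"] shows which input strings need fixing, which is often more useful than silently dropping them.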