I have a dataframe that looks like this:
A B C D E
0 Orange Dad's X Eyes 3d. Navy
1 pink. Mum's Bored. ooo. NaN
2 Yellow NaN Sad Gray NaN
I'm trying to remove punctuation from every column in the dataframe using a for loop:
import string
string.punctuation
# defining the function to remove punctuation
def remove_punctuation(text):
    punctuationfree = "".join([i for i in text if i not in string.punctuation])
    return punctuationfree
# storing the punctuation-free text
col = ['A', 'B', 'C', 'D', 'E']
for i in col:
    df[i].apply(lambda x: remove_punctuation(x))
But I get
"TypeError Traceback (most recent call last)
/var/folders/jd/lln92nb4p01g8grr0000gn/T/ipykernel_24651/2417883.py in <module>
12
13 for i in col:
---> 14 df[i].apply(lambda x:remove_punctuation(x))
TypeError: 'float' object is not iterable"
Can anyone help me with this, please? Any help would be greatly appreciated!
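You are getting the error because of the NaN values. NaN is a float, so when remove_punctuation tries to iterate over it in the list comprehension, Python raises "'float' object is not iterable". Check for NaN up front and return it unchanged: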
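import string
import pandas as pd

def remove_punctuation(text):
    # NaN is a float and cannot be iterated, so return it unchanged
    if pd.isna(text):
        return text
    punctuationfree = "".join([i for i in text if i not in string.punctuation])
    return punctuationfree
# assign the result back, since apply returns a new Series
for c in df:
    df[c] = df[c].apply(remove_punctuation)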
OUTPUT
# df
A B C D E
0 Orange Dads X Eyes 3d Navy
1 pink Mums Bored ooo NaN
2 Yellow NaN Sad Gray NaN
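As an alternative to the element-wise function, the same cleanup can probably be done with pandas' vectorized string methods, which leave NaN untouched so no explicit check is needed. The snippet below is only a sketch: the sample frame is rebuilt by hand from the question, and placing "Eyes 3d." in column D is an assumption, since the original table does not make the cell boundaries clear. The names `pattern` and `cleaned` are just illustrative.

```python
import re
import string

import numpy as np
import pandas as pd

# Sample data rebuilt from the question (the cell split in row 0 is an assumption)
df = pd.DataFrame({
    "A": ["Orange", "pink.", "Yellow"],
    "B": ["Dad's", "Mum's", np.nan],
    "C": ["X", "Bored.", "Sad"],
    "D": ["Eyes 3d.", "ooo.", "Gray"],
    "E": ["Navy", np.nan, np.nan],
})

# Character class matching any single character from string.punctuation
pattern = f"[{re.escape(string.punctuation)}]"

# Apply the regex replacement column by column; missing values stay missing
cleaned = df.apply(lambda s: s.str.replace(pattern, "", regex=True))

print(cleaned)
```

One caveat: the `.str` accessor only works on columns that hold strings, so a column that is entirely NaN (and therefore stored as float) would need to be skipped or cast first.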