python string nlp punctuation

Removing punctuation in a dataframe using a for loop


I have a dataframe that looks like this:

        A      B       C     D     E
0  Orange  Dad's  X Eyes   3d.  Navy
1   pink.  Mum's  Bored.  ooo.   NaN
2  Yellow    NaN     Sad  Gray   NaN

I'm trying to remove punctuation from every column in the dataframe using a for loop:

import string
string.punctuation

#defining the function to remove punctuation
def remove_punctuation(text):
    punctuationfree="".join([i for i in text if i not in string.punctuation])
    return punctuationfree

#storing the punctuation free text
col=['A','B','C','D','E']

for i in col:
    df[i].apply(lambda x:remove_punctuation(x))

But I get:

    TypeError                                 Traceback (most recent call last)
    /var/folders/jd/lln92nb4p01g8grr0000gn/T/ipykernel_24651/2417883.py in <module>
         12 
         13 for i in col:
    ---> 14     df[i].apply(lambda x:remove_punctuation(x))

    TypeError: 'float' object is not iterable

Can anyone help me with this, please? Any help would be greatly appreciated!


Solution

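  • You are getting the error because of the NaN values: NaN is a float, so iterating over it inside remove_punctuation fails. Check for NaN up front and assign the result back to the column: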

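    import string

    import pandas as pd

    def remove_punctuation(text):
        # NaN is a float and cannot be iterated, so return it unchanged
        if pd.isna(text):
            return text
        punctuationfree="".join([i for i in text if i not in string.punctuation])
        return punctuationfree

    # assign the result back; apply() does not modify the column in place
    for c in df:
        df[c] = df[c].apply(remove_punctuation)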
    

    OUTPUT

    # df
            A     B       C     D     E
    0  Orange  Dads  X Eyes    3d  Navy
    1    pink  Mums   Bored   ooo   NaN
    2  Yellow   NaN     Sad  Gray   NaN
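
  • As an alternative to the explicit NaN check, pandas' string accessor can do the same job, since Series.str methods pass NaN through untouched. This is only a sketch, not the approach from the answer above: it rebuilds the question's frame by hand, assumes every column holds strings (or NaN), and the names table and cleaned are made up for illustration.

```python
import string

import numpy as np
import pandas as pd

# Sample frame rebuilt by hand from the data shown in the question.
df = pd.DataFrame({
    "A": ["Orange", "pink.", "Yellow"],
    "B": ["Dad's", "Mum's", np.nan],
    "C": ["X Eyes", "Bored.", "Sad"],
    "D": ["3d.", "ooo.", "Gray"],
    "E": ["Navy", np.nan, np.nan],
})

# Translation table that deletes every character in string.punctuation.
table = str.maketrans("", "", string.punctuation)

# Apply column by column; .str.translate leaves NaN entries as NaN,
# so no explicit NaN check is needed.
cleaned = df.apply(lambda s: s.str.translate(table))
print(cleaned)
```

    Note that the .str accessor raises an AttributeError on purely numeric columns, so in a mixed frame it may be safer to restrict this to the text columns first.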