Tags: python, regex, pandas, dataframe

How to remove non-alphanumeric characters from strings within a DataFrame column?


I have a DataFrame column containing many strings. I need to remove all non-alphanumeric characters from that column, e.g.:

df['strings'] = ["a#bc1!","a(b$c"]

After running the code:

print(df['strings'])  # expected: ['abc','abc']

I've tried:

df['strings'].replace([',','.','/','"',':',';','!','@','#','$','%',"'","*","(",")","&",],"")

But this didn't work, and I feel there should be a more efficient way to do this using regex. Any help would be much appreciated.
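Why the attempt above has no effect: given a list of plain strings, Series.replace only replaces cell values that are exactly equal to one of the list entries; it does not search inside each string. The minimal sketch below, assuming the question's two-row example DataFrame, reproduces that and then shows the regex-based form of the same method, Series.replace(..., regex=True), which substitutes matches inside each string.

```python
import pandas as pd

# The question's example data.
df = pd.DataFrame({"strings": ["a#bc1!", "a(b$c"]})

# The original attempt: without regex=True, each list entry must equal a
# whole cell value, so "a#bc1!" and "a(b$c" come back untouched.
chars = [",", ".", "/", '"', ":", ";", "!", "@", "#", "$", "%", "'", "*", "(", ")", "&"]
unchanged = df["strings"].replace(chars, "")
print(unchanged.tolist())  # ['a#bc1!', 'a(b$c']

# With regex=True the pattern is applied inside each string, removing every
# character that is not an ASCII letter or digit.
cleaned = df["strings"].replace(r"[^a-zA-Z0-9]", "", regex=True)
print(cleaned.tolist())  # ['abc1', 'abc']
```

A single character class also avoids listing every punctuation mark by hand, so characters missing from the list (such as "?" or "-") are handled too.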


Solution

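  • Use str.replace with a regular expression. Since pandas 2.0, str.replace treats the pattern as a literal string by default, so pass regex=True explicitly.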

    df
      strings
    0  a#bc1!
    1   a(b$c
    
    df.strings.str.replace('[^a-zA-Z]', '', regex=True)
    0    abc
    1    abc
    Name: strings, dtype: object
    

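    To retain all alphanumeric characters (not just letters, as your expected output suggests), remove non-word characters with \W instead. Note that \W treats the underscore as a word character, so underscores are kept: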

    df.strings.str.replace(r'\W', '', regex=True)
    0    abc1
    1     abc
    Name: strings, dtype: object
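Both answers can be checked in one self-contained script, sketched here assuming pandas 2.x. The patterns are written as raw strings so that Python does not warn about the invalid escape sequence in "\W". The third row, "x_y-1", is a hypothetical value added only to show how the patterns differ on underscores; it is not part of the question's data.

```python
import pandas as pd

# The question's two rows plus a hypothetical third row containing an underscore.
df = pd.DataFrame({"strings": ["a#bc1!", "a(b$c", "x_y-1"]})

# Letters only (matches the expected output in the question).
letters = df["strings"].str.replace(r"[^a-zA-Z]", "", regex=True)

# \W removes non-word characters; underscores (and Unicode letters/digits) survive.
word_chars = df["strings"].str.replace(r"\W", "", regex=True)

# Explicit ASCII alphanumeric class: underscores are removed as well.
alnum = df["strings"].str.replace(r"[^a-zA-Z0-9]", "", regex=True)

print(letters.tolist())     # ['abc', 'abc', 'xy']
print(word_chars.tolist())  # ['abc1', 'abc', 'x_y1']
print(alnum.tolist())       # ['abc1', 'abc', 'xy1']
```

If the column may contain underscores that should also be stripped, the explicit [^a-zA-Z0-9] class is the safer choice; \W is shorter but keeps them.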