python-3.x pandas replace non-alphanumeric

How to drop some non-alphanumeric characters and keep others?


I have a df that looks like this:

email                                    id
{'email': ['test@test.com']}           {'id': ['123abc_d456_789_fgh']}

When I drop non-alphanumeric characters like so:

df.email = df.email.str.replace('[^a-zA-Z0-9]', '', regex=True)
df.email = df.email.str.replace('email', '')

df.id = df.id.str.replace('[^a-zA-Z0-9]', '', regex=True)
df.id = df.id.str.replace('id', '')

The columns look like this:

email                    id
testtestcom              123abcd456789fgh

How do I tell the code to keep everything inside the square brackets but drop all non-alphanumeric characters outside them?

The new df should look like this:

email                        id
test@test.com                123abc_d456_789_fgh

Solution

  • This is hardcoded to the ['...'] wrapper, but it works:

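    # Strip everything up to and including [' and everything from '] onward.
    # Raw strings avoid the invalid "\[" escape; regex=True is required on pandas >= 2.0.
    df.email = df.email.str.replace(r".+\['|'\].+", '', regex=True)
    df.id = df.id.str.replace(r".+\['|'\].+", '', regex=True)

    # Resulting values:
    # 'test@test.com'
    # '123abc_d456_789_fgh'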
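
An alternative to deleting what surrounds the brackets is to capture what is inside them. The following is only a minimal sketch using the standard `re` module; the helper name `extract_bracketed` and the pattern constant `BRACKETED` are made up for illustration, and it assumes each cell is a string containing a single `['...']` value, as in the sample data above.

```python
import re

# Captures whatever sits between [' and '] (non-greedy, first match only).
BRACKETED = re.compile(r"\['(.*?)'\]")


def extract_bracketed(text):
    """Return the text inside ['...'], or the input unchanged if nothing matches."""
    match = BRACKETED.search(text)
    return match.group(1) if match else text


email = extract_bracketed("{'email': ['test@test.com']}")
user_id = extract_bracketed("{'id': ['123abc_d456_789_fgh']}")
print(email)    # test@test.com
print(user_id)  # 123abc_d456_789_fgh
```

On a DataFrame this could probably be applied per column with something like `df.email = df.email.map(extract_bracketed)`. Capturing the wanted part keeps characters such as `@`, `.` and `_` without having to list them, and cells that do not match the pattern are left untouched.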