python pandas dataframe numpy series

pandas rename multiple columns using regex pattern


I have a dataframe like the one shown below:

ID,US-Test1,US-Test2,US-Test3
1,11,12,13
2,13,16,18
3,15,19,21

I would like to remove the prefix "US-" from all my column names.

I tried the approach below, but there should be a better way to do this:

newNames = {
    'US-Test1':'Test1',
    'US-Test2':'Test2',
    'US-Test3':'Test3'
}
df.rename(columns=newNames,inplace=True)

But my real data has more than 70 columns, so listing every column by hand is not practical.

Is there a regex approach that renames the columns by stripping the pattern and keeping only the part I want?

I expect my output to look like this:

ID,Test1,Test2,Test3
1,11,12,13
2,13,16,18
3,15,19,21

Solution

  • You could use a regex that matches the "US-" at the beginning like this:

    df.columns = df.columns.str.replace("^US-", "", regex=True)
    

    It replaces the matching "US-" with an empty string.

    Alternatively, if you know that every column you want to transform starts with "US-", you could slice the names to drop their first 3 characters:

    df.columns = df.columns.str.slice(3)
    

    Note that this slices every column name, including those that do not begin with "US-". In the sample data, "ID" would become an empty string.
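
For reference, here is a minimal runnable sketch of the ideas above, using the sample data from the question. The names `csv_text`, `by_regex`, `by_rename`, `by_slice` and `prefix` are only illustrative. The second and third variants are not from the answer: they show one possible way to do the same thing with `rename` and a callable, and a guarded slice that leaves non-matching names such as "ID" alone.

```python
import io
import re

import pandas as pd

# Sample data from the question.
csv_text = """ID,US-Test1,US-Test2,US-Test3
1,11,12,13
2,13,16,18
3,15,19,21
"""
df = pd.read_csv(io.StringIO(csv_text))

# Variant 1: anchored regex on the column Index (the approach from the answer).
by_regex = df.copy()
by_regex.columns = by_regex.columns.str.replace(r"^US-", "", regex=True)

# Variant 2 (assumption, not from the answer): rename with a callable,
# which returns a new dataframe instead of assigning to df.columns.
by_rename = df.rename(columns=lambda name: re.sub(r"^US-", "", name))

# Variant 3 (assumption, not from the answer): slice only the names that
# actually start with the prefix, so "ID" is left untouched.
prefix = "US-"
by_slice = df.copy()
by_slice.columns = [
    name[len(prefix):] if name.startswith(prefix) else name
    for name in df.columns
]

print(list(by_regex.columns))  # ['ID', 'Test1', 'Test2', 'Test3']
```

All three variants should give the same column names here. The regex version is probably the most flexible if the real prefixes vary, while the guarded slice avoids regex entirely when the prefix is a fixed string.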