pandas dataframe suffix

How do I use df.add_suffix to add suffixes to duplicate column names in Pandas?


I have a large dataframe with 400 columns. 200 of the column names are duplicates of the first 200. How can I use df.add_suffix to add a suffix only to the duplicate column names?

Or is there a better way to do it automatically?


Solution

  • If I understand your question correctly, each name appears twice. In that case you can flag the repeats with df.columns.duplicated(), which marks every occurrence after the first as True, and then build a new list of names that appends your own suffix only to the flagged entries. This differs from the other posted solution, which modifies all entries.

    import pandas as pd

    df = pd.DataFrame(data=[[1, 2, 3, 4]], columns=list('aabb'))
    my_suffix = 'T'

    # Append the suffix only where duplicated() flags a repeated name
    df.columns = [name + my_suffix if is_dup else name for name, is_dup in zip(df.columns, df.columns.duplicated())]
    df
    >>>
       a  aT  b  bT
    0  1   2  3   4
    

    My answer has one disadvantage: if a name is used three or more times, the dataframe still ends up with duplicated column names. For example, the columns a, a, a become a, aT, aT.
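
    If names can repeat more than twice, one possible workaround is to number each repeat instead of using a fixed suffix. The sketch below is only an illustration: the helper name suffix_repeats and the "_" separator are my own assumptions, not part of pandas. It works on a plain list of names so it can be checked without a dataframe.

```python
from collections import Counter


def suffix_repeats(names, sep="_"):
    """Hypothetical helper: keep the first occurrence of each name unchanged
    and append sep + occurrence number to every later repeat."""
    seen = Counter()
    result = []
    for name in names:
        n = seen[name]  # how many times this name has appeared so far
        result.append(name if n == 0 else f"{name}{sep}{n}")
        seen[name] += 1
    return result


new_names = suffix_repeats(list("aaabb"))
print(new_names)  # ['a', 'a_1', 'a_2', 'b', 'b_1']
```

    To apply it to a dataframe, assign the result back with df.columns = suffix_repeats(df.columns). Note that this can still collide if the dataframe already has a column literally named, say, a_1.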