python pandas dataframe numpy

Distinguish repeating column names by adding an integer using pandas


I have some columns that share the same name. I would like to append a 1 to each repeated column name.

Data

Date        Type    hi  hello   stat    hi  hello   
1/1/2022    a       0   0       1       1   0

Desired

Date        Type    hi  hello   stat    hi1     hello1  
1/1/2022    a       0   0       1       1       0

Doing

mask = df['col2'].duplicated(keep=False)

I believe I can use a mask, but I am not sure how to do this efficiently without naming the specific columns. I would like to pass in the whole dataset and have the duplicate names updated automatically.

Any suggestions are appreciated.


Solution

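  • New in pandas 2.0

    Use the built-in io.common.dedup_names(). It is an internal helper rather than documented public API, so it may change between releases. Note that it appends ".1", ".2", ... to repeated names, not a bare "1":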

    df.columns = pd.io.common.dedup_names(df.columns, is_potential_multiindex=False)
    
    #        Date  Type  hi  hello  stat  hi.1  hello.1
    # 0  1/1/2022     a   0      0     1     1        0
    

    For pandas >= 1.3 and < 2.0

    The previous method was io.parsers.base_parser.ParserBase._maybe_dedup_names():

    df.columns = pd.io.parsers.base_parser.ParserBase({'usecols': None})._maybe_dedup_names(df.columns)
    

    For pandas < 1.3

    The original method was io.parsers.ParserBase._maybe_dedup_names():

    df.columns = pd.io.parsers.ParserBase({})._maybe_dedup_names(df.columns)
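
If you need exactly hi1 / hello1 as in the desired output, or would rather not depend on pandas internals, a small counter over the column names works on any pandas version. This is only a minimal sketch: dedup_columns is a hypothetical helper name (not part of pandas), and it assumes that a suffixed name such as hi1 does not already exist as a column.

```python
import pandas as pd


def dedup_columns(columns):
    """Return column names with an integer appended to each repeat.

    The first occurrence keeps its name; later ones get 1, 2, ...
    Hypothetical helper, not a pandas API. It does not check whether a
    generated name (e.g. "hi1") collides with an existing column.
    """
    seen = {}  # name -> number of times seen so far
    result = []
    for name in columns:
        count = seen.get(name, 0)
        result.append(name if count == 0 else f"{name}{count}")
        seen[name] = count + 1
    return result


# Sample frame matching the question's data
df = pd.DataFrame(
    [["1/1/2022", "a", 0, 0, 1, 1, 0]],
    columns=["Date", "Type", "hi", "hello", "stat", "hi", "hello"],
)

# Rename in one step without calling out any specific column
df.columns = dedup_columns(df.columns)
print(list(df.columns))
```

Assigning the whole list back to df.columns renames by position, so the data in the duplicated columns is left untouched.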