pythonpandasdataframereindex

Add different prefix to index by column value in pandas dataframe


I have a dataframe, with duplicated index, which I'm trying to reindex according to rowname of another dataframe. However, due to the duplicated index problem it's telling me cannot reindex on an axis with duplicate labels when I do df2 = df2.reindex(df.index).

Below is what my df2 looks like

index column1 column2
A sample_a 1
B sample_a 3
B sample_b 2
C sample_c 3

So I want to make my index unique by adding prefix to it by values from column1. If column1 is sample_a, I want to make index as a_index; if column1 is sample_b, I want to change index as b_index, to change my df2 as below:

index column1 column2
a_A sample_a 1
a_B sample_a 3
b_B sample_b 2
c_C sample_c 3

I do see ways of adding prefix to index but I'm not sure how to selectively add different prefix?

Thank you!


Solution

  • If your other DataFrame also has both "index" and "column1", why not use both as index?

    Alternatively, you can de-duplicate "index" using str.extract to get the id from "column1" that is after the underscore:

    df['index'] = (df['column1']
                     .str.extract('_([^_]+)$', expand=False)
                     .add('_'+df['index'])
                  )
    

    Updated dataframe:

      index   column1  column2
    0   a_A  sample_a        1
    1   a_B  sample_a        3
    2   b_B  sample_b        2
    3   c_C  sample_c        3
    

    regex:

    _        # match underscore
    ([^_]+)  # capture non-underscore characters
    $        # match end of line
    

    regex demo