pythonpandasdataframe

Explode Dataframe and add new columns with specific values based on a condition


I have a dataframe with 6 columns: 'Name', 'A', 'B', 'C', 'Val', 'Category'

It looks like this:

Name   A     B     C   Val   Category
 x    1.1   0     0.2  NA    NA
 y    0     0.1   0    NA    NA
 z    0.5   0.1   0.3  NA    NA

I want to expand the dataframe such that for each value that is not 0 in columns 'A', 'B', 'C' you get an extra row. The column 'Val' is assigned the non-zero value that led to the expansion and the 'Category' is arbitrarily based on where the value came from.

The result should look like this:

Name   A    B     C    Val   Category
 x    1.1   0     0.2  1.1   first
 x    1.1   0     0.2  0.2   third
 y    0     0.1   0    0.1   second
 z    0.5   0.1   0.3  0.5   fisrt
 z    0.5   0.1   0.3  0.1   second
 z    0.5   0.1   0.3  0.3   third

This is probably the wrong approach, but I thought since I only have three columns I should be repeating all the rows 3 times by using the repeat function on the index and then looping through the rows based on a for loop with a skip to apply 3 functions to assign the target and AICN all rows and then dropping rows where the target is 0.

def targeta(row):
    target = row
    val = 'first'
    return target, val

def targetb(row):
    target = row
    val = 'second'
    return target, val

def targetc(row):
    target = row
    val = 'third'
    return target, val

df_repeat = df.loc[df.index.repeat(3)]

for i in range(1,len(df_repeat)-3,3):
    df_repeat.iloc[i][['Target','Category']]=targeta(df_repeat.iloc[i]['A'])
    df_repeat.iloc[i+1][['Target','Category']]=targetb(df_repeat.iloc[i+1]['B'])
    df_repeat.iloc[i+2][['Target','Category']]=targetc(df_repeat.iloc[i+2]['C'])

I only got to this point and realized I am getting an empty dataframe. Any suggestions on what to do?


Solution

  • You could replace the 0s with NaNs, rename the columns to your categories, reshape to long with stack, and join back to the original to duplicate the rows:

    out = (df
           .drop(columns=['Val', 'Category'])
           .join(df[['A', 'B', 'C']]
                 .set_axis(['first', 'second', 'third'], axis=1)
                 .rename_axis(columns='Category')
                 .replace(0, pd.NA)
                 .stack()
                 .rename('Val')
                 .reset_index(-1)
                )
           )
    

    Output:

      Name    A    B    C Category  Val
    0    x  1.1  0.0  0.2    first  1.1
    0    x  1.1  0.0  0.2    third  0.2
    1    y  0.0  0.1  0.0   second  0.1
    2    z  0.5  0.1  0.3    first  0.5
    2    z  0.5  0.1  0.3   second  0.1
    2    z  0.5  0.1  0.3    third  0.3