I have a dataframe with 6 columns: 'Name', 'A', 'B', 'C', 'Val', 'Category'
It looks like this:
Name A B C Val Category
x 1.1 0 0.2 NA NA
y 0 0.1 0 NA NA
z 0.5 0.1 0.3 NA NA
I want to expand the dataframe such that for each value that is not 0 in columns 'A', 'B', 'C' you get an extra row. The column 'Val' is assigned the non-zero value that led to the expansion and the 'Category' is arbitrarily based on where the value came from.
The result should look like this:
Name A B C Val Category
x 1.1 0 0.2 1.1 first
x 1.1 0 0.2 0.2 third
y 0 0.1 0 0.1 second
z 0.5 0.1 0.3 0.5 fisrt
z 0.5 0.1 0.3 0.1 second
z 0.5 0.1 0.3 0.3 third
This is probably the wrong approach, but I thought since I only have three columns I should be repeating all the rows 3 times by using the repeat function on the index and then looping through the rows based on a for loop with a skip to apply 3 functions to assign the target and AICN all rows and then dropping rows where the target is 0.
def targeta(row):
target = row
val = 'first'
return target, val
def targetb(row):
target = row
val = 'second'
return target, val
def targetc(row):
target = row
val = 'third'
return target, val
df_repeat = df.loc[df.index.repeat(3)]
for i in range(1,len(df_repeat)-3,3):
df_repeat.iloc[i][['Target','Category']]=targeta(df_repeat.iloc[i]['A'])
df_repeat.iloc[i+1][['Target','Category']]=targetb(df_repeat.iloc[i+1]['B'])
df_repeat.iloc[i+2][['Target','Category']]=targetc(df_repeat.iloc[i+2]['C'])
I only got to this point and realized I am getting an empty dataframe. Any suggestions on what to do?
You could replace
the 0s with NaNs, rename
the columns to your categories, reshape to long with stack
, and join
back to the original to duplicate the rows:
out = (df
.drop(columns=['Val', 'Category'])
.join(df[['A', 'B', 'C']]
.set_axis(['first', 'second', 'third'], axis=1)
.rename_axis(columns='Category')
.replace(0, pd.NA)
.stack()
.rename('Val')
.reset_index(-1)
)
)
Output:
Name A B C Category Val
0 x 1.1 0.0 0.2 first 1.1
0 x 1.1 0.0 0.2 third 0.2
1 y 0.0 0.1 0.0 second 0.1
2 z 0.5 0.1 0.3 first 0.5
2 z 0.5 0.1 0.3 second 0.1
2 z 0.5 0.1 0.3 third 0.3