I have a table which has the same structure as the below simplified example:
I would like to convert Columns 1 and 2 into column headers with values in the columns being found in C1 value and C2 value. The rest of the table needs to remain unchanged. I expect the outcome to look like the below:
I found a way to do it by using pd.pivot table as shown below but I have ten of the table columns to convert so I want to do it in one go instead of like below which means I have to run the same line ten times, and each time it results in a new separate table which i then have to merge onto the previous output. Is this possible, I tried using a list in the columns and values of pd.pivot_table but it does not work right as it tries to combine the columns I give to the list instead of just transposing them. Is there a way I can transpose part of the table the way i described in one go when using pivot table or is there some other way that would allow me to do it ?
TABLE1=pd.DataFrame({'Name':['John','Niomi','Jack','William','Bob','Stephanie'],
'Date Added':['05/05/2020','05/05/2020','03/04/2020','01/03/2020','10/04/2020','10/04/2020'],
'Column 1':['A','B','C','C','A','B'],
'C1 Value':['1','2','2','3','5','1'],
'Column 2':['D','D','G','F','G','F'],
'C2 Value':['5','7','9','5','2','1'],
'Column 3':['VALUE1','VALUE2','VALUE3','VALUE4','VALUE5','VALUE6'],
'Column 4':['VALUE7','VALUE8','VALUE9','VALUE10','VALUE11','VALUE12']})
table2=pd.pivot_table(TABLE1,index=('Name','Date Added'),columns='Column1',values='C1 Value',aggfunc='max',fill_value=0)
table3=pd.pivot_table(TABLE1,index=('Name','Date Added'),columns='Column2',values='C2 Value',aggfunc='max',fill_value=0)
table1.merge(table2,on=['Name','Date Added'])
The assumption here is that the tables have the same exact columns, so you can morph this into a function and apply to each one: also, the speed is about half of your solution with the pivot table.
def reshape(df):
#get various variables that will be reused
other = ['Name','Date Added','Column 3','Column 4']
#contain column 1 and c1 value
var1 = df.columns[df.columns.str.contains('1')].tolist()
#contain column 2 and c2 value
var2 = df.columns[df.columns.str.contains('2')].tolist()
#we'll use this to replace column 1,2 and c1,2
#allows us to merge them
repl = ['header','vals']
#set indices on both var1 and 2, and concat
res = (pd.concat([df.set_index(var1).rename_axis(index=repl).filter(other),
df.set_index(var2).rename_axis(index=repl).filter(other)]
)
#add the remaining indices
.set_index(other,append=True)
#pull out the values
.reset_index(1)
#unstack the header column and fill nulls
.unstack(0,fill_value=0)
)
return res
df.pipe(reshape)
vals
header A B C D F G
Name Date Added Column 3 Column 4
Bob 10/04/2020 VALUE5 VALUE11 5 0 0 0 0 2
Jack 03/04/2020 VALUE3 VALUE9 0 0 2 0 0 9
John 05/05/2020 VALUE1 VALUE7 1 0 0 5 0 0
Niomi 05/05/2020 VALUE2 VALUE8 0 2 0 7 0 0
Stephanie 10/04/2020 VALUE6 VALUE12 0 1 0 0 1 0
William 01/03/2020 VALUE4 VALUE10 0 0 3 0 5 0