pythonpandaspivot-tabletranspose

How to convert rows into columns but only for part of a table in python?


I have a table which has the same structure as the below simplified example:

enter image description here

I would like to convert Columns 1 and 2 into column headers with values in the columns being found in C1 value and C2 value. The rest of the table needs to remain unchanged. I expect the outcome to look like the below:

enter image description here

I found a way to do it by using pd.pivot table as shown below but I have ten of the table columns to convert so I want to do it in one go instead of like below which means I have to run the same line ten times, and each time it results in a new separate table which i then have to merge onto the previous output. Is this possible, I tried using a list in the columns and values of pd.pivot_table but it does not work right as it tries to combine the columns I give to the list instead of just transposing them. Is there a way I can transpose part of the table the way i described in one go when using pivot table or is there some other way that would allow me to do it ?

TABLE1=pd.DataFrame({'Name':['John','Niomi','Jack','William','Bob','Stephanie'],
     'Date Added':['05/05/2020','05/05/2020','03/04/2020','01/03/2020','10/04/2020','10/04/2020'],
     'Column 1':['A','B','C','C','A','B'],
     'C1 Value':['1','2','2','3','5','1'],
     'Column 2':['D','D','G','F','G','F'],
     'C2 Value':['5','7','9','5','2','1'],
     'Column 3':['VALUE1','VALUE2','VALUE3','VALUE4','VALUE5','VALUE6'],
     'Column 4':['VALUE7','VALUE8','VALUE9','VALUE10','VALUE11','VALUE12']})

table2=pd.pivot_table(TABLE1,index=('Name','Date Added'),columns='Column1',values='C1 Value',aggfunc='max',fill_value=0)

table3=pd.pivot_table(TABLE1,index=('Name','Date Added'),columns='Column2',values='C2 Value',aggfunc='max',fill_value=0)

table1.merge(table2,on=['Name','Date Added'])

Solution

  • The assumption here is that the tables have the same exact columns, so you can morph this into a function and apply to each one: also, the speed is about half of your solution with the pivot table.

    def reshape(df):
            #get various variables that will be reused
            other = ['Name','Date Added','Column 3','Column 4']
            #contain column 1 and c1 value
            var1 = df.columns[df.columns.str.contains('1')].tolist()
            #contain column 2 and c2 value
            var2 = df.columns[df.columns.str.contains('2')].tolist()
            #we'll use this to replace column 1,2 and c1,2
            #allows us to merge them
            repl = ['header','vals']
                          #set indices on both var1 and 2, and concat
            res = (pd.concat([df.set_index(var1).rename_axis(index=repl).filter(other),
                              df.set_index(var2).rename_axis(index=repl).filter(other)]
                             )
                   #add the remaining indices
                   .set_index(other,append=True)
                   #pull out the values
                   .reset_index(1)
                   #unstack the header column and fill nulls
                   .unstack(0,fill_value=0)
                  )
    
            return res
    
    df.pipe(reshape)
    
                                                         vals
                                        header     A    B   C   D   F   G
    Name         Date Added Column 3    Column 4                        
    Bob         10/04/2020  VALUE5      VALUE11    5    0   0   0   0   2
    Jack        03/04/2020  VALUE3      VALUE9     0    0   2   0   0   9
    John        05/05/2020  VALUE1      VALUE7     1    0   0   5   0   0
    Niomi       05/05/2020  VALUE2      VALUE8     0    2   0   7   0   0
    Stephanie   10/04/2020  VALUE6      VALUE12    0    1   0   0   1   0
    William     01/03/2020  VALUE4      VALUE10    0    0   3   0   5   0