pythonpandaspivotpivot-tablelreshape

Pandas: Multiple columns containing Names and Values of Variables: How to use Pivot?


i am not sure if, and how i can do the following transformation:

I have a DataFrame looking like this:

Index   Name    detail1 detail1_value   detail2  detail2_value   detail3    detail3_value
1     Albert    Age      30             Group       A            Hometown   beautifulplace
2     Bea       Age      28             Hometown    anotherplace None       None
3     Celin     Age      45             Group       B            None       None
4     Dave      Group    A              None        None         None       None

But as you can imagine, my aim is:

Index   Name    Age Group   Hometown
1     Albert    30  A   beautifulplace
2     Bea       28      anotherplace
3     Celin     45  B   
4     Dave          A   

I am pretty sure that ech detail just appears once. To keep things complex: i am not sure if every detail is total identical (in some cases for example Hometowns instead of Hometown).

The only solution i can see so far is to produce singel pivot-tables out of each pair of columns (like detail1 and detail1_value). In a second step an new dataset is created and each of these pivot-tables is searched for example on the Information on the Age. But my trust in python tells me, that there must be a better way...

Thanks!

PS: May help:

dataset = pd.DataFrame({'Name': ['Albert', 'Bea', 'Celine', 'Dave'],
                        'detail1': ['Age', 'Age', 'Age', 'Group'],
                        'detail1_value': ['30', '28', '45', 'A'],
                        'detail2': ['Group', 'Hometown', 'Group', None],
                        'detail2_value': ['A', 'anotherplace', 'B', None],
                        'detail3': ['Hometown', None, None, None],
                        'detail3_value': ['beautifulplace', None, None, None]})

Solution

  • You can use lreshape with pivot:

    #get columns names dynamically 
    a = dataset.columns[dataset.columns.str.endswith('_value')]
    b = dataset.columns[dataset.columns.str.startswith('detail')].difference(a)
    
    df = pd.lreshape(dataset, {'detail':b, 'value':a})
    print (df)
         Name           value    detail
    0  Albert              30       Age
    1     Bea              28       Age
    2  Celine              45       Age
    3    Dave               A     Group
    4  Albert               A     Group
    5     Bea    anotherplace  Hometown
    6  Celine               B     Group
    7  Albert  beautifulplace  Hometown
    
    
    df = df.pivot(index='Name', columns='detail', values='value')
    print (df)
    detail   Age Group        Hometown
    Name                              
    Albert    30     A  beautifulplace
    Bea       28  None    anotherplace
    Celine    45     B            None
    Dave    None     A            None
    

    Some data cleaning last:

    df = df.reset_index().rename_axis(None, axis=1)
    print (df)
         Name   Age Group        Hometown
    0  Albert    30     A  beautifulplace
    1     Bea    28  None    anotherplace
    2  Celine    45     B            None
    3    Dave  None     A            None