i am not sure if, and how i can do the following transformation:
I have a DataFrame looking like this:
Index Name detail1 detail1_value detail2 detail2_value detail3 detail3_value
1 Albert Age 30 Group A Hometown beautifulplace
2 Bea Age 28 Hometown anotherplace None None
3 Celin Age 45 Group B None None
4 Dave Group A None None None None
But as you can imagine, my aim is:
Index Name Age Group Hometown
1 Albert 30 A beautifulplace
2 Bea 28 anotherplace
3 Celin 45 B
4 Dave A
I am pretty sure that ech detail just appears once. To keep things complex: i am not sure if every detail is total identical (in some cases for example Hometowns instead of Hometown).
The only solution i can see so far is to produce singel pivot-tables out of each pair of columns (like detail1 and detail1_value). In a second step an new dataset is created and each of these pivot-tables is searched for example on the Information on the Age. But my trust in python tells me, that there must be a better way...
Thanks!
PS: May help:
dataset = pd.DataFrame({'Name': ['Albert', 'Bea', 'Celine', 'Dave'],
'detail1': ['Age', 'Age', 'Age', 'Group'],
'detail1_value': ['30', '28', '45', 'A'],
'detail2': ['Group', 'Hometown', 'Group', None],
'detail2_value': ['A', 'anotherplace', 'B', None],
'detail3': ['Hometown', None, None, None],
'detail3_value': ['beautifulplace', None, None, None]})
You can use lreshape
with pivot
:
#get columns names dynamically
a = dataset.columns[dataset.columns.str.endswith('_value')]
b = dataset.columns[dataset.columns.str.startswith('detail')].difference(a)
df = pd.lreshape(dataset, {'detail':b, 'value':a})
print (df)
Name value detail
0 Albert 30 Age
1 Bea 28 Age
2 Celine 45 Age
3 Dave A Group
4 Albert A Group
5 Bea anotherplace Hometown
6 Celine B Group
7 Albert beautifulplace Hometown
df = df.pivot(index='Name', columns='detail', values='value')
print (df)
detail Age Group Hometown
Name
Albert 30 A beautifulplace
Bea 28 None anotherplace
Celine 45 B None
Dave None A None
Some data cleaning last:
df = df.reset_index().rename_axis(None, axis=1)
print (df)
Name Age Group Hometown
0 Albert 30 A beautifulplace
1 Bea 28 None anotherplace
2 Celine 45 B None
3 Dave None A None