I have a dataframe:
col1 col2 col3 val col4
a1 b1 c1 10 dd
a1 b1 c1 15 kk
a2 b2 c2 20 ff
a2 b2 c2 35 mm
a3 b3 c3 9 sd
I want to put each value in column "val" from each group col1, col2, col3 into each column. So desired result is:
col1 col2 col3 val_1 col4_1 val_2 col4_2
a1 b1 c1 10 dd 15 kk
a2 b2 c2 20 ff 35 mm
a3 b3 c3 9 sd NA NA
How to do that? Is there any function to turn those rows into columns within group?
There can only be one or two rows per group.
If there exist at most two rows per group, then you can make use of the first
and last
functions in combination with a groupby
.
Define your own last
function that returns the last element of a group (for your example it would be the second value) if it exists. If it does not exist, meaning the group length equals 1
, it would return you nan
.
Code:
def last_check(x):
return pd.NA if len(x) == 1 else x.iloc[-1]
(df.groupby(["col1", "col2", "col3"], as_index=False)
.agg(val_1=("val", "first"),
col4_1=("col4", "first"),
val_2=("val", last_check),
col4_2=("col4", last_check)))
Output:
col1 col2 col3 val_1 col4_1 val_2 col4_2
a1 b1 c1 10 dd 15 kk
a2 b2 c2 20 ff 35 mm
a3 b3 c3 9 sd <NA> <NA>
Setup for reference:
df = pd.DataFrame(
{"col1": ["a1", "a1", "a2", "a2", "a3"],
"col2": ["b1", "b1", "b2", "b2", "b3"],
"col3": ["c1", "c1", "c2", "c2", "c3"],
"val": [10, 15, 20, 35, 9],
"col4": ["dd", "kk", "ff", "mm", "sd"]})