[SOLVED] How to turn rows in each group in dataframe into columns?

How to turn rows in each group in dataframe into columns?

I have a dataframe:

col1  col2  col3   val  col4
a1    b1     c1    10    dd
a1    b1     c1    15    kk
a2    b2     c2    20    ff
a2    b2     c2    35    mm
a3    b3     c3    9     sd

I want to put each value in column "val" from each group col1, col2, col3 into each column. So desired result is:

col1  col2  col3   val_1  col4_1  val_2  col4_2
a1    b1     c1     10     dd      15      kk
a2    b2     c2     20     ff      35      mm
a3    b3     c3     9      sd      NA      NA

How to do that? Is there any function to turn those rows into columns within group?

There can only be one or two rows per group.

Solution

If there exist at most two rows per group, then you can make use of the first and last functions in combination with a groupby.

Define your own last function that returns the last element of a group (for your example it would be the second value) if it exists. If it does not exist, meaning the group length equals 1, it would return you nan.

Code:

def last_check(x):
    return pd.NA if len(x) == 1 else x.iloc[-1]

(df.groupby(["col1", "col2", "col3"], as_index=False)
   .agg(val_1=("val", "first"),
        col4_1=("col4", "first"),
        val_2=("val", last_check),
        col4_2=("col4", last_check)))

Output:

col1  col2  col3  val_1  col4_1  val_2  col4_2          
a1    b1    c1    10     dd      15      kk
a2    b2    c2    20     ff      35      mm
a3    b3    c3    9      sd      <NA>   <NA>

Setup for reference:

df = pd.DataFrame(
    {"col1": ["a1", "a1", "a2", "a2", "a3"],
     "col2": ["b1", "b1", "b2", "b2", "b3"],  
     "col3": ["c1", "c1", "c2", "c2", "c3"],
     "val": [10, 15, 20, 35, 9],
     "col4": ["dd", "kk", "ff", "mm", "sd"]})