pythonpandasdataframe

How to turn rows in each group in dataframe into columns?


I have a dataframe:

col1  col2  col3   val  col4
a1    b1     c1    10    dd
a1    b1     c1    15    kk
a2    b2     c2    20    ff
a2    b2     c2    35    mm
a3    b3     c3    9     sd

I want to put each value in column "val" from each group col1, col2, col3 into each column. So desired result is:

col1  col2  col3   val_1  col4_1  val_2  col4_2
a1    b1     c1     10     dd      15      kk
a2    b2     c2     20     ff      35      mm
a3    b3     c3     9      sd      NA      NA

How to do that? Is there any function to turn those rows into columns within group?

There can only be one or two rows per group.


Solution

  • If there exist at most two rows per group, then you can make use of the first and last functions in combination with a groupby.

    Define your own last function that returns the last element of a group (for your example it would be the second value) if it exists. If it does not exist, meaning the group length equals 1, it would return you nan.

    Code:

    def last_check(x):
        return pd.NA if len(x) == 1 else x.iloc[-1]
    
    (df.groupby(["col1", "col2", "col3"], as_index=False)
       .agg(val_1=("val", "first"),
            col4_1=("col4", "first"),
            val_2=("val", last_check),
            col4_2=("col4", last_check)))
    

    Output:

    col1  col2  col3  val_1  col4_1  val_2  col4_2          
    a1    b1    c1    10     dd      15      kk
    a2    b2    c2    20     ff      35      mm
    a3    b3    c3    9      sd      <NA>   <NA>
    

    Setup for reference:

    df = pd.DataFrame(
        {"col1": ["a1", "a1", "a2", "a2", "a3"],
         "col2": ["b1", "b1", "b2", "b2", "b3"],  
         "col3": ["c1", "c1", "c2", "c2", "c3"],
         "val": [10, 15, 20, 35, 9],
         "col4": ["dd", "kk", "ff", "mm", "sd"]})