Tags: python, pandas, numpy, pandas-explode

How to implode (reverse of pandas explode) based on a column


I have a dataframe df like the one below:

  NETWORK       config_id       APPLICABLE_DAYS  Case    Delivery  
0   Grocery     5399            SUN               10       1        
1   Grocery     5399            MON               20       2       
2   Grocery     5399            TUE               30       3        
3   Grocery     5399            WED               40       4       

I want to implode it (combine APPLICABLE_DAYS from multiple rows into a single row, as shown below) and get the average Case and Delivery per config_id:

  NETWORK       config_id       APPLICABLE_DAYS      Avg_Cases    Avg_Delivery 
0   Grocery     5399            SUN,MON,TUE,WED         90           10

Using groupby on NETWORK and config_id, I can get Avg_Cases and Avg_Delivery like this:

df.groupby(['NETWORK','config_id']).agg({'Case':'mean','Delivery':'mean'})

But how can I also join APPLICABLE_DAYS while performing this aggregation?


Solution

  • The "opposite" of explode is collecting the values of each group back into a list (Solution #1). You can also join them into a single string (Solution #2):

    Solution #1: use lambda x: x.tolist() for the 'APPLICABLE_DAYS' column in the dictionary passed to .agg after the groupby:

    df = (df.groupby(['NETWORK','config_id'])
          .agg({'APPLICABLE_DAYS': lambda x: x.tolist(),'Case':'mean','Delivery':'mean'})
          .rename({'Case' : 'Avg_Cases','Delivery' : 'Avg_Delivery'},axis=1)
          .reset_index())
    df
    Out[1]: 
       NETWORK  config_id       APPLICABLE_DAYS  Avg_Cases  Avg_Delivery
    0  Grocery       5399  [SUN, MON, TUE, WED]         25           2.5
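To check that the list form really is the reverse of explode, here is a minimal round-trip sketch. It rebuilds the sample frame from the question, applies the same aggregation, and then calls DataFrame.explode on the result. The names `imploded` and `round_trip` are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "NETWORK": ["Grocery"] * 4,
    "config_id": [5399] * 4,
    "APPLICABLE_DAYS": ["SUN", "MON", "TUE", "WED"],
    "Case": [10, 20, 30, 40],
    "Delivery": [1, 2, 3, 4],
})

# "Implode": collect the days of each group into a list and average the rest.
imploded = (
    df.groupby(["NETWORK", "config_id"])
      .agg({"APPLICABLE_DAYS": lambda x: x.tolist(),
            "Case": "mean",
            "Delivery": "mean"})
      .rename({"Case": "Avg_Cases", "Delivery": "Avg_Delivery"}, axis=1)
      .reset_index()
)

# explode() gives each list element its own row again and repeats
# the other columns (here the group averages) on every row.
round_trip = imploded.explode("APPLICABLE_DAYS").reset_index(drop=True)
print(round_trip)
```

Only APPLICABLE_DAYS is restored by the round trip. The per-row Case and Delivery values are not recoverable from the averages, so each exploded row carries the group mean instead.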
    

    Solution #2: use lambda x: ",".join(x) for the 'APPLICABLE_DAYS' column in the dictionary passed to .agg after the groupby:

    df = (df.groupby(['NETWORK','config_id'])
          .agg({'APPLICABLE_DAYS': lambda x: ",".join(x),'Case':'mean','Delivery':'mean'})
          .rename({'Case' : 'Avg_Cases','Delivery' : 'Avg_Delivery'},axis=1)
          .reset_index())
    df
    Out[1]: 
       NETWORK  config_id       APPLICABLE_DAYS  Avg_Cases  Avg_Delivery
    0  Grocery       5399       SUN,MON,TUE,WED         25           2.5
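One caveat with the string version: ",".join(x) raises a TypeError if the column contains anything that is not a string, such as NaN or an integer. The sketch below shows one possible way to guard against that. The data and the helper name `join_days` are hypothetical and only serve to illustrate the idea.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing day and one numeric label.
df = pd.DataFrame({
    "config_id": [1, 1, 1, 2],
    "APPLICABLE_DAYS": ["SUN", np.nan, "TUE", 7],
    "Case": [10, 20, 30, 40],
})

def join_days(days):
    # Drop missing values and cast to str so str.join only sees strings.
    return ",".join(days.dropna().astype(str))

out = (
    df.groupby("config_id")
      .agg({"APPLICABLE_DAYS": join_days, "Case": "mean"})
      .reset_index()
)
print(out)
```

Whether missing days should be dropped or shown with a placeholder depends on the data, so treat the dropna() call as one option rather than the required behavior.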
    

    If you want the sum instead of the average, change 'mean' to 'sum' for the Case and Delivery columns.
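For the sum variant, a self-contained sketch might look like the following. It uses named aggregation (available in pandas 0.25 and later) as an alternative to the dictionary plus rename used above. The output column names Total_Cases and Total_Delivery are assumptions chosen for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "NETWORK": ["Grocery"] * 4,
    "config_id": [5399] * 4,
    "APPLICABLE_DAYS": ["SUN", "MON", "TUE", "WED"],
    "Case": [10, 20, 30, 40],
    "Delivery": [1, 2, 3, 4],
})

# Named aggregation: output_column=(input_column, aggregation).
# This sets the output names directly, so no rename step is needed.
totals = (
    df.groupby(["NETWORK", "config_id"])
      .agg(
          APPLICABLE_DAYS=("APPLICABLE_DAYS", ",".join),
          Total_Cases=("Case", "sum"),
          Total_Delivery=("Delivery", "sum"),
      )
      .reset_index()
)
print(totals)
```

With the sample data this gives a single row for config_id 5399 with the days joined as SUN,MON,TUE,WED, a case total of 100, and a delivery total of 10.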