pythonpandasdataframesummarization

Value counts then sum of a different column


I have a dataframe that looks like this:

 Index Store_Nbr    SKU                                             Str_OH_Units_Wkly
    0    0105       1002075704-SMPL GRN OUTDR ODOR ELIMINATOR 32OZ       4
    1    0105       1002772687-SG APC LAVENDER 128OZ                     6
    2    0105       853534-SG PRO HEAVY DUTY 128OZ                       25
    3    0105       854029-SG PRO3PLUS ANTIBAC&DISINFECT 128OZ           4
    4    0106       1002772687-SG APC LAVENDER 128OZ                     7
    5    0106       310408-SG APC SPY LEMON 32OZ                         12
    6    0106       829706-SG HD BBQ AERO 20OZ                           11

I am trying to figure out how many times each SKU is presented in the dataframe, as well as how many Str_OH_Units_Wkly there are for each after getting the value counts.

Starting off by using value_counts():

inv_no_sales_sum['SKU'].value_counts()

Output:

1002075704-SMPL GRN OUTDR ODOR ELIMINATOR 32OZ    1266
1002772687-SG APC LAVENDER 128OZ                   945
854029-SG PRO3PLUS ANTIBAC&DISINFECT 128OZ         891
829706-SG HD BBQ AERO 20OZ                         837
853534-SG PRO HEAVY DUTY 128OZ                     772
309012-SG APC LEMON 128OZ                          726
310408-SG APC SPY LEMON 32OZ                       605
883387-SIMPLE GREEN APC 320OZ                      422
435909-SG APC CONCEN SPY 32OZ                      276
431429-SG APC 128OZ                                 53
Name: SKU, dtype: int64

I would then like sum Str_OH_Units_Wkly next to the value counts. How would I go about appending that summation to the unique value_counts()? To look like the following:

1002075704-SMPL GRN OUTDR ODOR ELIMINATOR 32OZ    1266  [sum of Str_OH_Units_Wkly]
1002772687-SG APC LAVENDER 128OZ                   945  [sum of Str_OH_Units_Wkly]
854029-SG PRO3PLUS ANTIBAC&DISINFECT 128OZ         891  [sum of Str_OH_Units_Wkly]
829706-SG HD BBQ AERO 20OZ                         837  [sum of Str_OH_Units_Wkly]
853534-SG PRO HEAVY DUTY 128OZ                     772  [sum of Str_OH_Units_Wkly]
309012-SG APC LEMON 128OZ                          726  [sum of Str_OH_Units_Wkly]
310408-SG APC SPY LEMON 32OZ                       605  [sum of Str_OH_Units_Wkly]
883387-SIMPLE GREEN APC 320OZ                      422  [sum of Str_OH_Units_Wkly]
435909-SG APC CONCEN SPY 32OZ                      276  [sum of Str_OH_Units_Wkly]
431429-SG APC 128OZ                                 53  [sum of Str_OH_Units_Wkly]
Name: SKU, dtype: int64

Any help would be greatly appreciated!

Thanks.


Solution

  • # use agg to get both the count and sum
    (df.groupby('SKU' )['Str_OH_Units_Wkly']
      .agg(count='count', 
           sum='sum')
      .reset_index()
    )
    

    OR

    (df.groupby('SKU', as_index=False)
       .agg(count=('SKU','count'), 
            sum=('Str_OH_Units_Wkly','sum'))
       )
    
        SKU                                            count    sum
    0   1002075704-SMPL GRN OUTDR ODOR ELIMINATOR 32OZ    1      4
    1   1002772687-SG APC LAVENDER 128OZ                  2     13
    2   310408-SG APC SPY LEMON 32OZ                      1     12
    3   829706-SG HD BBQ AERO 20OZ                        1     11
    4   853534-SG PRO HEAVY DUTY 128OZ                    1     25
    5   854029-SG PRO3PLUS ANTIBAC&DISINFECT 128OZ        1      4