pythonpandas

add specific values in dataframe with groupby.agg


I am processing files that contain data in an unusable format. After processing one of the files I am left with a dataframe and a singular value.

The dataframe looks like this:

 df = pd.DataFrame({
        'A': ['foo', 'foo', 'foo', 'fizz', 'fizz', 'fizz', 'fizz'],
        'B': ['bar', 'bar', 'bar', 'buzz', 'buzz', 'buzz', 'baz'],
        'C': [10,10,10,10,10,10,10]
        })

 val = 20.0

The value is not supposed to be apart of my dataframe, but needs to be included in my TOTAL calculation. This is how I extract it from the file after I read it into a dataframe (it returns a string so I am casting to float):

if len(df.loc[df['ID'].eq("Settle"), 'C'].values) > 0:
    temp = df.loc[df['ID'].eq("Settle"), 'C']
    if temp.values[0].isnumeric():
        num = float(temp.values[0])
    else:
        num = 0.0
else:
    num = 0.0

now I need to do groupby.agg() with the following condition:

This is the basic code I would use to get the sums for each value in A, but I cannot figure out how factor in my condition. I'm assuming I can use np.where or lambda but unsure how to use that with .agg() and achieve my output.

out = df.groupby(['A'], sort=False, as_index=False).agg({"C":"sum"})

Expected Output:

A     C
foo   50
fizz  40

Solution

  • You could just append a row with the extra value (e.g. with concat) prior to performing the aggregation:

    out = (pd.concat([df, pd.DataFrame([{'A': 'foo', 'C': 20}])])
             .groupby('A', as_index=False, sort=False)['C'].sum()
          )
    

    Output:

          A   C
    0   foo  50
    1  fizz  40
    

    If you want to add different values to several groups:

    extra = [('foo', 20), ('fizz', 100)]
    
    out = (pd.concat([df, pd.DataFrame(extra, columns=['A', 'C'])])
             .groupby('A', as_index=False, sort=False)['C'].sum()
          )