pythonpython-3.xpandasdataframeconceptual

Efficient Subtotaling of columns in a pandas dataframe


I've been toying around with an idea for a program at work that automates our end of the month reports. Currently, it creates all the reports for us in Excel format and then we manually use Excel's subtotal feature to subtotal its columns and format the data into a table.

My idea is to subtotal each of the columns by customer, like so:

Patient Date Rx# Description Qty Price
EXAMPLE, JOHN 2/1/2021 357649 Aspirin 30 6.99
EXAMPLE, JOHN 2/1/2021 357650 Drug 30 13.99
EXAMPLE, JOHN 2/1/2021 357651 Tylenol 30 7.99
EXAMPLE, JOHN Subtotal 28.97
EXAMPLE, SUSAN 2/12/2021 357652 Expensive Drug 30 51.99
EXAMPLE, SUSAN 2/12/2021 357653 Drug 30 13.99
EXAMPLE, SUSAN 2/12/2021 357654 Tylenol 30 7.99
EXAMPLE, SUSAN Subtotal 73.97

With the exisiting dataframe looking like:

Patient Date Rx# Description Qty Price
EXAMPLE, JOHN 2/1/2021 357649 Aspirin 30 6.99
EXAMPLE, JOHN 2/1/2021 357650 Drug 30 13.99
EXAMPLE, JOHN 2/1/2021 357651 Tylenol 30 7.99
EXAMPLE, SUSAN 2/12/2021 357652 Expensive Drug 30 51.99
EXAMPLE, SUSAN 2/12/2021 357653 Drug 30 13.99
EXAMPLE, SUSAN 2/12/2021 357654 Tylenol 30 7.99

Is this possible with groupby()? It seems to have an option to group by rows rather than columns. The bigger issue I see is inserting into the existing dataframe, as it seems that pandas is more designed for manipulating/performing operations on large datasets rather than inserting/adding information.


Solution

  • # Calculate sums
    df_subtotal = df.groupby('Patient', as_index=False)[['Price']].agg('sum')
    # Manipulate string Patient
    df_subtotal['Patient'] = df_subtotal['Patient'] + ' subtotal'
    # Join dataframes
    df_new = pd.concat([df, df_subtotal], axis=0, ignore_index=True)
    # Sort
    df_new = df_new.sort_values(['Patient', 'Date'])