[SOLVED] Efficient Subtotaling of columns in a pandas dataframe

Efficient Subtotaling of columns in a pandas dataframe

I've been toying around with an idea for a program at work that automates our end of the month reports. Currently, it creates all the reports for us in Excel format and then we manually use Excel's subtotal feature to subtotal its columns and format the data into a table.

My idea is to subtotal each of the columns by customer, like so:

Patient	Date	Rx#	Description	Qty	Price
EXAMPLE, JOHN	2/1/2021	357649	Aspirin	30	6.99
EXAMPLE, JOHN	2/1/2021	357650	Drug	30	13.99
EXAMPLE, JOHN	2/1/2021	357651	Tylenol	30	7.99
EXAMPLE, JOHN Subtotal					28.97
EXAMPLE, SUSAN	2/12/2021	357652	Expensive Drug	30	51.99
EXAMPLE, SUSAN	2/12/2021	357653	Drug	30	13.99
EXAMPLE, SUSAN	2/12/2021	357654	Tylenol	30	7.99
EXAMPLE, SUSAN Subtotal					73.97

With the exisiting dataframe looking like:

Patient	Date	Rx#	Description	Qty	Price
EXAMPLE, JOHN	2/1/2021	357649	Aspirin	30	6.99
EXAMPLE, JOHN	2/1/2021	357650	Drug	30	13.99
EXAMPLE, JOHN	2/1/2021	357651	Tylenol	30	7.99
EXAMPLE, SUSAN	2/12/2021	357652	Expensive Drug	30	51.99
EXAMPLE, SUSAN	2/12/2021	357653	Drug	30	13.99
EXAMPLE, SUSAN	2/12/2021	357654	Tylenol	30	7.99

Is this possible with groupby()? It seems to have an option to group by rows rather than columns. The bigger issue I see is inserting into the existing dataframe, as it seems that pandas is more designed for manipulating/performing operations on large datasets rather than inserting/adding information.

Solution

# Calculate sums
df_subtotal = df.groupby('Patient', as_index=False)[['Price']].agg('sum')
# Manipulate string Patient
df_subtotal['Patient'] = df_subtotal['Patient'] + ' subtotal'
# Join dataframes
df_new = pd.concat([df, df_subtotal], axis=0, ignore_index=True)
# Sort
df_new = df_new.sort_values(['Patient', 'Date'])