I've been toying around with an idea for a program at work that automates our end of the month reports. Currently, it creates all the reports for us in Excel format and then we manually use Excel's subtotal feature to subtotal its columns and format the data into a table.
My idea is to subtotal each of the columns by customer, like so:
Patient | Date | Rx# | Description | Qty | Price |
---|---|---|---|---|---|
EXAMPLE, JOHN | 2/1/2021 | 357649 | Aspirin | 30 | 6.99 |
EXAMPLE, JOHN | 2/1/2021 | 357650 | Drug | 30 | 13.99 |
EXAMPLE, JOHN | 2/1/2021 | 357651 | Tylenol | 30 | 7.99 |
EXAMPLE, JOHN Subtotal | 28.97 | ||||
EXAMPLE, SUSAN | 2/12/2021 | 357652 | Expensive Drug | 30 | 51.99 |
EXAMPLE, SUSAN | 2/12/2021 | 357653 | Drug | 30 | 13.99 |
EXAMPLE, SUSAN | 2/12/2021 | 357654 | Tylenol | 30 | 7.99 |
EXAMPLE, SUSAN Subtotal | 73.97 |
With the exisiting dataframe looking like:
Patient | Date | Rx# | Description | Qty | Price |
---|---|---|---|---|---|
EXAMPLE, JOHN | 2/1/2021 | 357649 | Aspirin | 30 | 6.99 |
EXAMPLE, JOHN | 2/1/2021 | 357650 | Drug | 30 | 13.99 |
EXAMPLE, JOHN | 2/1/2021 | 357651 | Tylenol | 30 | 7.99 |
EXAMPLE, SUSAN | 2/12/2021 | 357652 | Expensive Drug | 30 | 51.99 |
EXAMPLE, SUSAN | 2/12/2021 | 357653 | Drug | 30 | 13.99 |
EXAMPLE, SUSAN | 2/12/2021 | 357654 | Tylenol | 30 | 7.99 |
Is this possible with groupby()? It seems to have an option to group by rows rather than columns. The bigger issue I see is inserting into the existing dataframe, as it seems that pandas is more designed for manipulating/performing operations on large datasets rather than inserting/adding information.
# Calculate sums
df_subtotal = df.groupby('Patient', as_index=False)[['Price']].agg('sum')
# Manipulate string Patient
df_subtotal['Patient'] = df_subtotal['Patient'] + ' subtotal'
# Join dataframes
df_new = pd.concat([df, df_subtotal], axis=0, ignore_index=True)
# Sort
df_new = df_new.sort_values(['Patient', 'Date'])