I have the following table in a csv file:
wi_document_id | wir_rejected_by | wir_reason | wir_sys_created_on |
---|---|---|---|
Int0002277 | Agent_1 | Time out | 3/8/2024 11:18:10 AM |
Int0002278 | Agent_1 | Time out | 2/26/2024 12:18:16 AM |
Int0002279 | Agent_2 | Busy | 3/11/2024 09:18:31 AM |
Int0002280 | Agent_2 | Time out | 3/18/2024 10:45:08 AM |
Int0002281 | Agent_2 | Time out | 3/4/2024 10:18:22 AM |
Int0002282 | Agent_3 | Time out | 3/18/2024 11:20:51 AM |
Int0002283 | Agent_3 | Busy | 2/29/2024 08:13:04 AM |
Int0002284 | Agent_4 | Time out | 3/4/2024 09:30:45 AM |
Int0002285 | Agent_4 | Busy | 3/12/2024 10:18:34 AM |
And I have the below script to calculate:
Script:
import pandas as pd
# Load the CSV file into a DataFrame
df = pd.read_csv('Rejection Report.csv')
# Convert 'wir_sys_created_on' column to datetime
df['wir_sys_created_on'] = pd.to_datetime(df['wir_sys_created_on'])
# Extract week numbers from the datetime column starting from 1 and format with ISO week number and the date of the Monday
df['week_number'] = df['wir_sys_created_on'] - pd.to_timedelta(df['wir_sys_created_on'].dt.dayofweek, unit='d')
df['week_number'] = 'Week ' + df['week_number'].dt.strftime('%V') + ' (' + df['week_number'].dt.strftime('%Y-%m-%d') + ')'
# Group by agent, week number, and rejection reason
grouped = df.groupby(['wir_rejected_by', 'week_number', 'wir_reason'])
# Calculate rejection count by reason per week
rejection_by_reason = grouped.size().unstack(fill_value=0)
# Calculate total rejection count per week
weekly_rejection_count = df.groupby(['wir_rejected_by', 'week_number']).size().unstack(fill_value=0)
# Filter rejection counts based on reasons 'Time out' and 'Busy'
rejection_timeout = rejection_by_reason['Time out'].unstack(fill_value=0)
rejection_busy = rejection_by_reason['Busy'].unstack(fill_value=0)
# Concatenate DataFrames with a multi-level column index
df_with_multiindex = pd.concat(
[weekly_rejection_count, rejection_timeout, rejection_busy],
axis=1,
keys=['Total Rejections', 'Rejections due to Time out', 'Rejections due to Busy'],
names=['', '']
)
# Ensure weeks are ordered chronologically
df_with_multiindex = df_with_multiindex.reindex(sorted(df_with_multiindex.columns), axis=1)
# Apply some formatting
styled_df = df_with_multiindex.style.format("{:.0f}")
styled_df = styled_df.set_table_styles([
{'selector': 'th', 'props': [('text-align', 'center')]},
{'selector': 'td', 'props': [('text-align', 'center')]},
{'selector': 'caption', 'props': [('caption-side', 'bottom')]}
])
# Set the caption
styled_df = styled_df.set_caption('Rejections Report')
# Display the styled DataFrame
styled_df.set_properties(**{'border-collapse': 'collapse', 'border': '1px solid black'})
The calculation part is good, but the multiple level column headers are set incorrectly:
The rejection reasons and total rejection headers are on top of the week numbers which resulted in the week numbers being repeated.
I need the table headers to look like this and have columns and cells borders:
The week numbers should be on the top level header and nested below it the calculated columns without having the week numbers repeated for each calculated column.
Any tips on how to accomplish the desired structure?
The calculation part is good, but the multiple level column headers are set incorrectly..
I would proceed this way to do the styling part :
# to be adjusted
TCOLOR, BGCOLOR = "black", "lightcyan"
CSS = [
{
"selector": "td, th[class^='col'], "
"th[class^='row'], .index_name.level1",
"props": [
("text-align", "center"), ("width", "100px"),
("color", TCOLOR), ("background-color", BGCOLOR),
("border", "1px solid black"),
],
},
{"selector": "caption", "props": [("caption-side", "bottom")]},
]
df_styled = (
df_with_multiindex.rename_axis(
index=None, columns=("wir_rejected_by", None)
).swaplevel(axis=1).sort_index(axis=1, level=0)
.style.set_caption("Rejections Report")
.set_table_styles(CSS)
)
Output (in Notebook) :