[SOLVED] Multi-Level Header in Pandas DataFrame

Multi-Level Header in Pandas DataFrame

I have the following table in a csv file:

wi_document_id	wir_rejected_by	wir_reason	wir_sys_created_on
Int0002277	Agent_1	Time out	3/8/2024 11:18:10 AM
Int0002278	Agent_1	Time out	2/26/2024 12:18:16 AM
Int0002279	Agent_2	Busy	3/11/2024 09:18:31 AM
Int0002280	Agent_2	Time out	3/18/2024 10:45:08 AM
Int0002281	Agent_2	Time out	3/4/2024 10:18:22 AM
Int0002282	Agent_3	Time out	3/18/2024 11:20:51 AM
Int0002283	Agent_3	Busy	2/29/2024 08:13:04 AM
Int0002284	Agent_4	Time out	3/4/2024 09:30:45 AM
Int0002285	Agent_4	Busy	3/12/2024 10:18:34 AM

And I have the below script to calculate:

The 'rejection count' by each agent on weekly basis.
The 'rejection count' with reason = 'Time out' each agent on weekly basis.
The 'rejection count' with reason = 'Busy' each agent on weekly basis.

Script:

import pandas as pd

# Load the CSV file into a DataFrame
df = pd.read_csv('Rejection Report.csv')

# Convert 'wir_sys_created_on' column to datetime
df['wir_sys_created_on'] = pd.to_datetime(df['wir_sys_created_on'])

# Extract week numbers from the datetime column starting from 1 and format with ISO week number and the date of the Monday
df['week_number'] = df['wir_sys_created_on'] - pd.to_timedelta(df['wir_sys_created_on'].dt.dayofweek, unit='d')
df['week_number'] = 'Week ' + df['week_number'].dt.strftime('%V') + ' (' + df['week_number'].dt.strftime('%Y-%m-%d') + ')'

# Group by agent, week number, and rejection reason
grouped = df.groupby(['wir_rejected_by', 'week_number', 'wir_reason'])

# Calculate rejection count by reason per week
rejection_by_reason = grouped.size().unstack(fill_value=0)

# Calculate total rejection count per week
weekly_rejection_count = df.groupby(['wir_rejected_by', 'week_number']).size().unstack(fill_value=0)

# Filter rejection counts based on reasons 'Time out' and 'Busy'
rejection_timeout = rejection_by_reason['Time out'].unstack(fill_value=0)
rejection_busy = rejection_by_reason['Busy'].unstack(fill_value=0)

# Concatenate DataFrames with a multi-level column index
df_with_multiindex = pd.concat(
    [weekly_rejection_count, rejection_timeout, rejection_busy],
    axis=1,
    keys=['Total Rejections', 'Rejections due to Time out', 'Rejections due to Busy'],
    names=['', '']
)

# Ensure weeks are ordered chronologically
df_with_multiindex = df_with_multiindex.reindex(sorted(df_with_multiindex.columns), axis=1)

# Apply some formatting
styled_df = df_with_multiindex.style.format("{:.0f}")
styled_df = styled_df.set_table_styles([
    {'selector': 'th', 'props': [('text-align', 'center')]},
    {'selector': 'td', 'props': [('text-align', 'center')]},
    {'selector': 'caption', 'props': [('caption-side', 'bottom')]}
])

# Set the caption
styled_df = styled_df.set_caption('Rejections Report')

# Display the styled DataFrame
styled_df.set_properties(**{'border-collapse': 'collapse', 'border': '1px solid black'})

The calculation part is good, but the multiple level column headers are set incorrectly:

The rejection reasons and total rejection headers are on top of the week numbers which resulted in the week numbers being repeated.

I need the table headers to look like this and have columns and cells borders:

The week numbers should be on the top level header and nested below it the calculated columns without having the week numbers repeated for each calculated column.

Any tips on how to accomplish the desired structure?

Solution

The calculation part is good, but the multiple level column headers are set incorrectly..

I would proceed this way to do the styling part :

# to be adjusted
TCOLOR, BGCOLOR = "black", "lightcyan"

CSS = [
    {
        "selector": "td, th[class^='col'], "
        "th[class^='row'], .index_name.level1",
        "props": [
            ("text-align", "center"), ("width", "100px"),
            ("color", TCOLOR), ("background-color", BGCOLOR),
            ("border", "1px solid black"),
        ],
    },
    {"selector": "caption", "props": [("caption-side", "bottom")]},
]

df_styled = (
    df_with_multiindex.rename_axis(
        index=None, columns=("wir_rejected_by", None)
    ).swaplevel(axis=1).sort_index(axis=1, level=0)
    .style.set_caption("Rejections Report")
    .set_table_styles(CSS)
)

Output (in Notebook) :