pandasheadermulti-level

Multi-Level Header in Pandas DataFrame


I have the following table in a csv file:

wi_document_id wir_rejected_by wir_reason wir_sys_created_on
Int0002277 Agent_1 Time out 3/8/2024 11:18:10 AM
Int0002278 Agent_1 Time out 2/26/2024 12:18:16 AM
Int0002279 Agent_2 Busy 3/11/2024 09:18:31 AM
Int0002280 Agent_2 Time out 3/18/2024 10:45:08 AM
Int0002281 Agent_2 Time out 3/4/2024 10:18:22 AM
Int0002282 Agent_3 Time out 3/18/2024 11:20:51 AM
Int0002283 Agent_3 Busy 2/29/2024 08:13:04 AM
Int0002284 Agent_4 Time out 3/4/2024 09:30:45 AM
Int0002285 Agent_4 Busy 3/12/2024 10:18:34 AM

And I have the below script to calculate:

  1. The 'rejection count' by each agent on weekly basis.
  2. The 'rejection count' with reason = 'Time out' each agent on weekly basis.
  3. The 'rejection count' with reason = 'Busy' each agent on weekly basis.

Script:

import pandas as pd

# Load the CSV file into a DataFrame
df = pd.read_csv('Rejection Report.csv')

# Convert 'wir_sys_created_on' column to datetime
df['wir_sys_created_on'] = pd.to_datetime(df['wir_sys_created_on'])

# Extract week numbers from the datetime column starting from 1 and format with ISO week number and the date of the Monday
df['week_number'] = df['wir_sys_created_on'] - pd.to_timedelta(df['wir_sys_created_on'].dt.dayofweek, unit='d')
df['week_number'] = 'Week ' + df['week_number'].dt.strftime('%V') + ' (' + df['week_number'].dt.strftime('%Y-%m-%d') + ')'

# Group by agent, week number, and rejection reason
grouped = df.groupby(['wir_rejected_by', 'week_number', 'wir_reason'])

# Calculate rejection count by reason per week
rejection_by_reason = grouped.size().unstack(fill_value=0)

# Calculate total rejection count per week
weekly_rejection_count = df.groupby(['wir_rejected_by', 'week_number']).size().unstack(fill_value=0)

# Filter rejection counts based on reasons 'Time out' and 'Busy'
rejection_timeout = rejection_by_reason['Time out'].unstack(fill_value=0)
rejection_busy = rejection_by_reason['Busy'].unstack(fill_value=0)

# Concatenate DataFrames with a multi-level column index
df_with_multiindex = pd.concat(
    [weekly_rejection_count, rejection_timeout, rejection_busy],
    axis=1,
    keys=['Total Rejections', 'Rejections due to Time out', 'Rejections due to Busy'],
    names=['', '']
)

# Ensure weeks are ordered chronologically
df_with_multiindex = df_with_multiindex.reindex(sorted(df_with_multiindex.columns), axis=1)

# Apply some formatting
styled_df = df_with_multiindex.style.format("{:.0f}")
styled_df = styled_df.set_table_styles([
    {'selector': 'th', 'props': [('text-align', 'center')]},
    {'selector': 'td', 'props': [('text-align', 'center')]},
    {'selector': 'caption', 'props': [('caption-side', 'bottom')]}
])

# Set the caption
styled_df = styled_df.set_caption('Rejections Report')

# Display the styled DataFrame
styled_df.set_properties(**{'border-collapse': 'collapse', 'border': '1px solid black'})

The calculation part is good, but the multiple level column headers are set incorrectly:

enter image description here

The rejection reasons and total rejection headers are on top of the week numbers which resulted in the week numbers being repeated.

I need the table headers to look like this and have columns and cells borders:

enter image description here

The week numbers should be on the top level header and nested below it the calculated columns without having the week numbers repeated for each calculated column.

Any tips on how to accomplish the desired structure?


Solution

  • The calculation part is good, but the multiple level column headers are set incorrectly..

    I would proceed this way to do the styling part :

    # to be adjusted
    TCOLOR, BGCOLOR = "black", "lightcyan"
    
    CSS = [
        {
            "selector": "td, th[class^='col'], "
            "th[class^='row'], .index_name.level1",
            "props": [
                ("text-align", "center"), ("width", "100px"),
                ("color", TCOLOR), ("background-color", BGCOLOR),
                ("border", "1px solid black"),
            ],
        },
        {"selector": "caption", "props": [("caption-side", "bottom")]},
    ]
    
    df_styled = (
        df_with_multiindex.rename_axis(
            index=None, columns=("wir_rejected_by", None)
        ).swaplevel(axis=1).sort_index(axis=1, level=0)
        .style.set_caption("Rejections Report")
        .set_table_styles(CSS)
    )
    

    Output (in Notebook) :

    enter image description here