pythonpandasexceldataframemulti-index

How to melt/stack multiindex headers with pandas?


I have an Excel file containing credit spread calculations for different credit ratings. The structure includes merged cells which create multi-level column headers. Here is an example of the structure:

USD All Sectors A  USD All Sectors A  USD All Sectors A
Level  Date        1D                7D                14D
0      05/30/2023  66.42             66.42             66.42
0      05/31/2023  65.95             65.95             65.95

I need to transform this data into a long format using pandas so that it looks like this:

DATO        INDEKS            TENOR  NIVEAU
05/30/2023  USD All Sectors A  1D     66.42
05/31/2023  USD All Sectors A  1D     65.95
05/30/2023  USD All Sectors A  7D     66.42
05/31/2023  USD All Sectors A  7D     65.95
05/30/2023  USD All Sectors A  14D    66.42
05/31/2023  USD All Sectors A  14D    65.95
import pandas as pd

Load the data from the Excel file
data = pd.read_excel('credit_spreads.xlsx', header=[0, 1], skiprows=12)

df_stacked = df.stack(level=1).reset_index()
print(df_stacked.head(10))

Which gives the following output (a small part of the output):

  level_0 level_1  USD All Sectors A  USD All Sectors B  USD All Sectors BB  \
0        0    108M           129.6630           585.9322            335.4587   
1        0    120M           133.1738           585.9688            335.4797   
2        0     12M            66.4248           581.6899            333.0146   

However, I am having trouble getting the multi-index headers to work correctly, and the reshaped data is not in the desired format. The main issue seems to be handling the correct multiindex headers.

Is the stack function the correct function to use here? Or should i go with melt or pivot instead?

example of excel input attached.Excel Input


Solution

  • If I understand correctly, you should set the first two columns as index while reading your file:

    df = (pd.read_excel('credit_spreads.xlsx', skiprows=12,
                        header=[0, 1], index_col=[0, 1])
            .droplevel(0)
            .rename_axis(index='DATO', columns=['INDEKS', 'TENOR'])
            .stack([0, 1]).reset_index(name='NIVEAU')
         )
    

    Output:

            DATO             INDEKS TENOR  NIVEAU
    0 2023-05-30  USD All Sectors A    1D   66.42
    1 2023-05-30  USD All Sectors A    7D   66.42
    2 2023-05-30  USD All Sectors A   14D   66.42
    3 2023-05-31  USD All Sectors A    1D   65.95
    4 2023-05-31  USD All Sectors A    7D   65.95
    5 2023-05-31  USD All Sectors A   14D   65.95
    

    Alternatively, with melt:

    df = (pd.read_excel('credit_spreads.xlsx', header=[0, 1], skiprows=12)
            .melt([('Unnamed: 0_level_0', 'Level'),
                   ('Unnamed: 1_level_0', 'Date')], value_name='NIVEAU')
            .drop(columns=[('Unnamed: 0_level_0', 'Level')])
            .rename(columns={'variable_0': 'INDEKS', 'variable_1': 'TENOR',
                             ('Unnamed: 1_level_0', 'Date'): 'DATO'})
         )
    

    Output:

            DATO             INDEKS TENOR  NIVEAU
    0 2023-05-30  USD All Sectors A    1D   66.42
    1 2023-05-31  USD All Sectors A    1D   65.95
    2 2023-05-30  USD All Sectors A    7D   66.42
    3 2023-05-31  USD All Sectors A    7D   65.95
    4 2023-05-30  USD All Sectors A   14D   66.42
    5 2023-05-31  USD All Sectors A   14D   65.95