I have an Excel file containing credit spread calculations for different credit ratings. The structure includes merged cells which create multi-level column headers. Here is an example of the structure:
USD All Sectors A USD All Sectors A USD All Sectors A
Level Date 1D 7D 14D
0 05/30/2023 66.42 66.42 66.42
0 05/31/2023 65.95 65.95 65.95
I need to transform this data into a long format using pandas so that it looks like this:
DATO INDEKS TENOR NIVEAU
05/30/2023 USD All Sectors A 1D 66.42
05/31/2023 USD All Sectors A 1D 65.95
05/30/2023 USD All Sectors A 7D 66.42
05/31/2023 USD All Sectors A 7D 65.95
05/30/2023 USD All Sectors A 14D 66.42
05/31/2023 USD All Sectors A 14D 65.95
import pandas as pd
Load the data from the Excel file
data = pd.read_excel('credit_spreads.xlsx', header=[0, 1], skiprows=12)
df_stacked = df.stack(level=1).reset_index()
print(df_stacked.head(10))
Which gives the following output (a small part of the output):
level_0 level_1 USD All Sectors A USD All Sectors B USD All Sectors BB \
0 0 108M 129.6630 585.9322 335.4587
1 0 120M 133.1738 585.9688 335.4797
2 0 12M 66.4248 581.6899 333.0146
However, I am having trouble getting the multi-index headers to work correctly, and the reshaped data is not in the desired format. The main issue seems to be handling the correct multiindex headers.
Is the stack function the correct function to use here? Or should i go with melt or pivot instead?
If I understand correctly, you should set the first two columns as index while reading your file:
df = (pd.read_excel('credit_spreads.xlsx', skiprows=12,
header=[0, 1], index_col=[0, 1])
.droplevel(0)
.rename_axis(index='DATO', columns=['INDEKS', 'TENOR'])
.stack([0, 1]).reset_index(name='NIVEAU')
)
Output:
DATO INDEKS TENOR NIVEAU
0 2023-05-30 USD All Sectors A 1D 66.42
1 2023-05-30 USD All Sectors A 7D 66.42
2 2023-05-30 USD All Sectors A 14D 66.42
3 2023-05-31 USD All Sectors A 1D 65.95
4 2023-05-31 USD All Sectors A 7D 65.95
5 2023-05-31 USD All Sectors A 14D 65.95
Alternatively, with melt
:
df = (pd.read_excel('credit_spreads.xlsx', header=[0, 1], skiprows=12)
.melt([('Unnamed: 0_level_0', 'Level'),
('Unnamed: 1_level_0', 'Date')], value_name='NIVEAU')
.drop(columns=[('Unnamed: 0_level_0', 'Level')])
.rename(columns={'variable_0': 'INDEKS', 'variable_1': 'TENOR',
('Unnamed: 1_level_0', 'Date'): 'DATO'})
)
Output:
DATO INDEKS TENOR NIVEAU
0 2023-05-30 USD All Sectors A 1D 66.42
1 2023-05-31 USD All Sectors A 1D 65.95
2 2023-05-30 USD All Sectors A 7D 66.42
3 2023-05-31 USD All Sectors A 7D 65.95
4 2023-05-30 USD All Sectors A 14D 66.42
5 2023-05-31 USD All Sectors A 14D 65.95