I have a pandas dataframe which looks like the following:
site pay delta over under
phase a a b
ID
D01 London 12.3 10.3 -2.0 0.0 -2.0
D02 Bristol 7.3 13.2 5.9 5.9 0.0
D03 Bristol 17.3 19.2 1.9 1.9 0.0
I'd like to flatten the column multindex to the columns are
ID site a b delta over under
D01 London 12.3 10.3 -2.0 0.0 -2.0
D02 Bristol 7.3 13.2 5.9 5.9 0.0
D03 Bristol 17.3 19.2 1.9 1.9 0.0
I'm struggling with the online documentation and tutorials to work out how to do this.
I'd welcome advice, ideally to do this in a robust way which doesn't hardcode column positions.
UPDATE: the to_dict
is
{'index': ['D01', 'D02', 'D03'],
'columns': [('site', 'a'),
('pay', 'a'),
('pay', 'b'),
('delta', ''),
('over', ''),
('under', '')],
'data': [['London', 12.3, 10.3, -2.0, 0.0, -2.0],
['Bristol', 7.3, 13.2, 5.8999999999999995, 5.8999999999999995, 0.0],
['Bristol', 17.3, 19.2, 1.8999999999999986, 1.8999999999999986, 0.0]],
'index_names': ['ID'],
'column_names': [None, 'phase']}
As the above answer mentioned the requirement is not clear, I have tried out just the flattening of the data, let us know
import pandas as pd
data = {
'index': ['D01', 'D02', 'D03'],
'columns': [('site', 'a'),
('pay', 'a'),
('pay', 'b'),
('delta', ''),
('over', ''),
('under', '')],
'data': [['London', 12.3, 10.3, -2.0, 0.0, -2.0],
['Bristol', 7.3, 13.2, 5.9, 5.9, 0.0],
['Bristol', 17.3, 19.2, 1.9, 1.9, 0.0]],
'index_names': ['ID'],
'column_names': [None, 'phase']
}
# Construct the MultiIndex columns and DataFrame
multiindex_columns = pd.MultiIndex.from_tuples(data['columns'], names=data['column_names'])
df = pd.DataFrame(data['data'], index=data['index'], columns=multiindex_columns)
# Flattening the MultiIndex columns
df.columns = [' '.join(filter(None, col)) for col in df.columns]
print(df)
Flattened DataFrame:
site a pay a pay b delta over under
D01 London 12.30 10.30 -2.00 0.00 -2.00
D02 Bristol 7.30 13.20 5.90 5.90 0.00
D03 Bristol 17.30 19.20 1.90 1.90 0.00