pythonpandasdataframe

flattening pandas columns in a non-trivial way


I have a pandas dataframe which looks like the following:

    site    pay delta   over    under
phase   a   a   b           
ID                      
D01 London  12.3    10.3    -2.0    0.0 -2.0
D02 Bristol 7.3 13.2    5.9 5.9 0.0
D03 Bristol 17.3    19.2    1.9 1.9 0.0

I'd like to flatten the column multindex to the columns are

ID      site    a   b   delta   over    under                   
D01 London  12.3    10.3    -2.0    0.0 -2.0
D02 Bristol 7.3 13.2    5.9 5.9 0.0
D03 Bristol 17.3    19.2    1.9 1.9 0.0

I'm struggling with the online documentation and tutorials to work out how to do this.

I'd welcome advice, ideally to do this in a robust way which doesn't hardcode column positions.


UPDATE: the to_dict is

{'index': ['D01', 'D02', 'D03'],
 'columns': [('site', 'a'),
  ('pay', 'a'),
  ('pay', 'b'),
  ('delta', ''),
  ('over', ''),
  ('under', '')],
 'data': [['London', 12.3, 10.3, -2.0, 0.0, -2.0],
  ['Bristol', 7.3, 13.2, 5.8999999999999995, 5.8999999999999995, 0.0],
  ['Bristol', 17.3, 19.2, 1.8999999999999986, 1.8999999999999986, 0.0]],
 'index_names': ['ID'],
 'column_names': [None, 'phase']}

Solution

  • As the above answer mentioned the requirement is not clear, I have tried out just the flattening of the data, let us know

    import pandas as pd
    
    data = {
        'index': ['D01', 'D02', 'D03'],
        'columns': [('site', 'a'),
                    ('pay', 'a'),
                    ('pay', 'b'),
                    ('delta', ''),
                    ('over', ''),
                    ('under', '')],
        'data': [['London', 12.3, 10.3, -2.0, 0.0, -2.0],
                 ['Bristol', 7.3, 13.2, 5.9, 5.9, 0.0],
                 ['Bristol', 17.3, 19.2, 1.9, 1.9, 0.0]],
        'index_names': ['ID'],
        'column_names': [None, 'phase']
    }
    
    # Construct the MultiIndex columns and DataFrame
    multiindex_columns = pd.MultiIndex.from_tuples(data['columns'], names=data['column_names'])
    df = pd.DataFrame(data['data'], index=data['index'], columns=multiindex_columns)
    
    
    # Flattening the MultiIndex columns
    df.columns = [' '.join(filter(None, col)) for col in df.columns]
    
    print(df)
    

    Flattened DataFrame:

          site a  pay a  pay b  delta  over  under
    D01   London  12.30  10.30  -2.00  0.00  -2.00
    D02  Bristol   7.30  13.20   5.90  5.90   0.00
    D03  Bristol  17.30  19.20   1.90  1.90   0.00