pandaspivotpandas-melt

Is it even possible to reshape this using Pandas?


I have this dataframe (with all 50 states and more categories, but if we can get this to work, I can apply it to the whole dataset):

                   US     US_bp   US_bp%       AL   AL_bp  AL_bp%
total_pop   324173084  41393176     12.8  4920613  794326    16.1
white       198511109  19510415      9.8  3218517  378269    11.8
black        38526055   8402643     21.8  1260356  327284    26.0

And need this:

    place  total_pop        bp    bp%  white_pop  white_bp  white_bp%  black_pop  black_bp  black_bp%
0      US  324173084  41393176   12.8  198511109  19510415        9.8    8402643    840263       21.4
1      AL    4920613    794326   16.1    3218517    378269       11.8    1260356    327284       26.0

How can I do this with Pandas? I tried melt and pivot, but can't wrap my head around what might work.


Solution

  • Here's a partial solution: Convert the columns into a MultiIndex, then stack and unstack.

    I'm using a regex for the first step: Extract two uppercase characters optionally followed by an underscore plus the other part.

    col_pairs = df.columns.str.extract(r'^([A-Z]{2})(?:_(.*))?$').fillna('pop')
    df.columns = pd.MultiIndex.from_arrays(col_pairs.T.values)
    
    (df
        .stack(level=0)
        .unstack(level=0)
        .swaplevel(axis=1)
        # Fix order
        .reindex(df.index, axis=1, level=0)
        .reindex(df.columns.get_level_values(1).unique(), axis=1, level=1)
        .reindex(df.columns.get_level_values(0).unique())
        )
    
       total_pop                      white                     black               
             pop        bp   bp%        pop        bp   bp%       pop       bp   bp%
    US 324173084  41393176  12.8  198511109  19510415   9.8  38526055  8402643  21.8
    AL   4920613    794326  16.1    3218517    378269  11.8   1260356   327284  26.0
    

    I'd prefer if stacking/unstacking didn't sort, but at least you can fix that by reindexing. I'm also not bothering to get the labels exactly as you want them, but most of that's easy.

    Due credit to sammywemmy for the stack/unstack technique.