I have this dataframe (with all 50 states and more categories, but if we can get this to work, I can apply it to the whole dataset):
US US_bp US_bp% AL AL_bp AL_bp%
total_pop 324173084 41393176 12.8 4920613 794326 16.1
white 198511109 19510415 9.8 3218517 378269 11.8
black 38526055 8402643 21.8 1260356 327284 26.0
And need this:
place total_pop bp bp% white_pop white_bp white_bp% black_pop black_bp black_bp%
0 US 324173084 41393176 12.8 198511109 19510415 9.8 8402643 840263 21.4
1 AL 4920613 794326 16.1 3218517 378269 11.8 1260356 327284 26.0
How can I do this with Pandas? I tried melt and pivot, but can't wrap my head around what might work.
Here's a partial solution: Convert the columns into a MultiIndex, then stack and unstack.
I'm using a regex for the first step: Extract two uppercase characters optionally followed by an underscore plus the other part.
col_pairs = df.columns.str.extract(r'^([A-Z]{2})(?:_(.*))?$').fillna('pop')
df.columns = pd.MultiIndex.from_arrays(col_pairs.T.values)
(df
.stack(level=0)
.unstack(level=0)
.swaplevel(axis=1)
# Fix order
.reindex(df.index, axis=1, level=0)
.reindex(df.columns.get_level_values(1).unique(), axis=1, level=1)
.reindex(df.columns.get_level_values(0).unique())
)
total_pop white black
pop bp bp% pop bp bp% pop bp bp%
US 324173084 41393176 12.8 198511109 19510415 9.8 38526055 8402643 21.8
AL 4920613 794326 16.1 3218517 378269 11.8 1260356 327284 26.0
I'd prefer if stacking/unstacking didn't sort, but at least you can fix that by reindexing. I'm also not bothering to get the labels exactly as you want them, but most of that's easy.
Due credit to sammywemmy for the stack/unstack technique.