pythonpandasdataframelookupmelt

How to reshape a pandas dataframe based on column groups


I need to use some names of the columns as part of the df. While keeping the first 3 columns identical, I need to create some other columns based on the content of the row.

Here I have some transactions from some customers:

cust_id    cust_first   cust_last   au_zo   au_zo_pay   fi_gu   fi_gu_pay   wa     wa_pay
0   1000    Andrew       Jones      50.85   debit        NaN      NaN      69.12    debit
1   1001    Fatima       Lee        NaN      NaN        18.16     debit    NaN       NaN
2   1002    Sophia       Lewis      NaN      NaN         NaN      NaN     159.54.  credit
3   1003    Edward      Bush       45.29    credit     59.63    credit     NaN       NaN
4   1004    Mark        Nunez      20.87    credit     20.87    credit    86.18     debit

First, I need to add a new column, 'city'. Since it is not on the database. It is defaulted to be 'New York'. (that's easy!)

But here is where I am getting stuck: Add a new column 'store' holds values according to where a transaction took place. au_zo --> autozone, fi_gu --> five guys, wa --> walmart

Add new column 'classification' according to the store previously added: auto zone --> auto-repair, five guys --> food, walmart --> groceries

Column 'amount' holds the value of the customer and store.

Column 'transaction_type' is the value of au_zo_pay, fi_gu_pay, wa_pay respectively.

So at the end it looks like this:

  cust_id  city cust_first  cust_last   store   classification  amount  trans_type
0   1000  New York  Andrew  Jones     auto zone auto-repair      50.85  debit
1   1000  New York  Andrew  Jones     walmart   groceries        69.12  debit
2   1001  New York  Fatima  Lee       five guys food             18.16  debit
3   1002  New York  Sophia  Solis     walmart   groceries        159.54 credit
4   1003  New York  Edward  Bush      auto zone auto-repair      45.29  credit
5   1003  New York  Edward  Bush      five guys food             59.63  credit
6   1004  New York  Mark    Nunez     auto zone auto-repair      20.87  credit
7   1004  New York  Mark    Nunez     five guys food             20.87  credit
8   1004  New York  Mark    Nunez     walmart   groceries        86.18  debit

I have tried using df.melt() but I don't get the results.


Solution

  • We can get the desired output in 3 steps:

    1. Make the metadata columns the index and split the remaining columns into amount and pay columns (set_index and MultiIndex assignment)
    2. Reshape the dataframe to make it long (stack + reset_index)
    3. Add classification, store and city columns and reorder columns (assign + reindex)
    df1 = df.set_index(['cust_id', 'cust_first', 'cust_last'])
    df1.columns = pd.MultiIndex.from_tuples([c.rsplit('_', 1) if c.endswith('pay') else [c, 'amount'] for c in df1.columns], names=['store', ''])
    df1 = df1.stack(level='store').reset_index()
    df1 = (
        df1
        .assign(
            classification=df1['store'].map({'au_zo': 'auto-repair', 'fi_gu': 'food', 'wa': 'groceries'}),
            store=df1['store'].map({'au_zo': 'autozone', 'fi_gu': 'five guys', 'wa': 'walmart'}),
            city='New York',
        )
        .reindex(columns=['cust_id', 'city', 'cust_first', 'cust_last', 'store', 'classification', 'amount', 'pay'])
    )
    

    enter image description here