I need to use some names of the columns as part of the df. While keeping the first 3 columns identical, I need to create some other columns based on the content of the row.
Here I have some transactions from some customers:
cust_id cust_first cust_last au_zo au_zo_pay fi_gu fi_gu_pay wa wa_pay
0 1000 Andrew Jones 50.85 debit NaN NaN 69.12 debit
1 1001 Fatima Lee NaN NaN 18.16 debit NaN NaN
2 1002 Sophia Lewis NaN NaN NaN NaN 159.54. credit
3 1003 Edward Bush 45.29 credit 59.63 credit NaN NaN
4 1004 Mark Nunez 20.87 credit 20.87 credit 86.18 debit
First, I need to add a new column, 'city'. Since it is not on the database. It is defaulted to be 'New York'. (that's easy!)
But here is where I am getting stuck:
Add a new column 'store' holds values according to where a transaction took place. au_zo --> autozone, fi_gu --> five guys, wa --> walmart
Add new column 'classification' according to the store previously added: auto zone --> auto-repair, five guys --> food, walmart --> groceries
Column 'amount' holds the value of the customer and store.
Column 'transaction_type' is the value of au_zo_pay, fi_gu_pay, wa_pay respectively.
So at the end it looks like this:
cust_id city cust_first cust_last store classification amount trans_type
0 1000 New York Andrew Jones auto zone auto-repair 50.85 debit
1 1000 New York Andrew Jones walmart groceries 69.12 debit
2 1001 New York Fatima Lee five guys food 18.16 debit
3 1002 New York Sophia Solis walmart groceries 159.54 credit
4 1003 New York Edward Bush auto zone auto-repair 45.29 credit
5 1003 New York Edward Bush five guys food 59.63 credit
6 1004 New York Mark Nunez auto zone auto-repair 20.87 credit
7 1004 New York Mark Nunez five guys food 20.87 credit
8 1004 New York Mark Nunez walmart groceries 86.18 debit
I have tried using df.melt()
but I don't get the results.
We can get the desired output in 3 steps:
amount
and pay
columns (set_index
and MultiIndex
assignment)stack
+ reset_index
)classification
, store
and city
columns and reorder columns (assign
+ reindex
)df1 = df.set_index(['cust_id', 'cust_first', 'cust_last'])
df1.columns = pd.MultiIndex.from_tuples([c.rsplit('_', 1) if c.endswith('pay') else [c, 'amount'] for c in df1.columns], names=['store', ''])
df1 = df1.stack(level='store').reset_index()
df1 = (
df1
.assign(
classification=df1['store'].map({'au_zo': 'auto-repair', 'fi_gu': 'food', 'wa': 'groceries'}),
store=df1['store'].map({'au_zo': 'autozone', 'fi_gu': 'five guys', 'wa': 'walmart'}),
city='New York',
)
.reindex(columns=['cust_id', 'city', 'cust_first', 'cust_last', 'store', 'classification', 'amount', 'pay'])
)