Suppose I have the df
below. I would like to combine the price columns and value columns so that all prices are in one column and all volumes are in another column. I would also like a third column that identified the price level. For example, unit1
, unit2
and unit3
.
import numpy as np
import pandas as pd
df = pd.DataFrame(
{
'uid': ['U100', 'U200', 'E100', 'E200', 'E300', 'A100', 'A200', 'A300', 'A400', 'A500'],
'location': ['US', 'US', 'EU', 'EU', 'EU', 'Asia', 'Asia', 'Asia', 'Asia', 'Asia'],
'unit1_price': [10, 20, 15, 10, 10, 10, 20, 20, 25, 25],
'unit1_vol': [100, 150, 100, 200, 150, 150, 100, 200, 200, 200],
'unit2_price': [10, 25, 30, 20, 10, 10, 10, 10, 20, 20],
'unit2_vol': [200, 200, 150, 300, 300, 200, 150, 225, 225, 250],
'unit3_price': [0, 0, 0, 20, 20, 20, 20, 20, 20, 20],
'unit3_vol': [0, 0, 0, 500, 500, 500, 500, 500, 500, 500]
}
)
df
This is what the final df
should look like:
I tried using melt
and I think almost have the right answer.
pd.melt(df, id_vars=['uid', 'location'], value_vars=['unit1_price', 'unit1_vol', 'unit2_price', 'unit2_vol', 'unit3_price', 'unit3_vol'])
This is what the partial df looks like with melt
:
The problem with the above is that volume and price are in the same column but I want them to be in 2 separate columns.
Did I use the right function?
You can form two dataframe using pd.melt
first and combine it back to become one dataframe.
df1 = df.melt(id_vars=['uid', 'location'], value_vars=['unit1_price', 'unit2_price', 'unit3_price'],var_name='unit',value_name='price')
df2 = df.melt(id_vars=['uid', 'location'], value_vars=['unit1_vol', 'unit2_vol', 'unit3_vol'],var_name='unit', value_name="volume")
ddf = pd.concat([df1,df2['volume']],axis=1).sort_values(by=['uid','unit'],ignore_index=True)
ddf['unit']=ddf['unit'].str.split('_',expand=True)[0]