Edit: 2022NOV21
How do we chain df.col.str.split()
since this returns the split columns if expand = True
I am trying to split a column after performing .melt()
. If I use assign I end up using the original column and the melted column actually does not even exist.
df = pd.DataFrame().from_dict({
'id' : [1,2,3,4],
'2022_amt' : [10.1,20.2,30.3, 40.4],
'2022_qty' : [10,20,30,40]
})
df = (
df
.melt(
id_vars=['id'],
value_vars=['2022_amt', '2022_qty'],
var_name='fy',
value_name='num'
)
# can i chain any pd.Series.str.[METHOD] here
# .assign(
# year=df.fy.str.split('_', expand=True)[0],
# t=df.fy.str.split('_', expand=True)[1]
# )
)
# i can add the two columns in this way but can we use chain to expand dataframe df
df[['year', 't']] = df.fy.str.split('_', expand=True)
df = df.drop(columns = ['fy'])
Using expand
converts it into a DataFrame, which you do not really want here; secondly with chaining, use an anonymous function to refer to the previous dataframe:
(df
.melt(id_vars='id',var_name='fy',value_name='num')
assign(year = lambda df: df.fy.str.split('_').str[0],
t = lambda df: df.fy.str.split('_').str[1])
)
id fy num year t
0 1 2022_amt 10.1 2022 amt
1 2 2022_amt 20.2 2022 amt
2 3 2022_amt 30.3 2022 amt
3 4 2022_amt 40.4 2022 amt
4 1 2022_qty 10.0 2022 qty
5 2 2022_qty 20.0 2022 qty
6 3 2022_qty 30.0 2022 qty
7 4 2022_qty 40.0 2022 qty
For your use case, there are simpler, more efficient ways to do this:
pd.stack
:df = df.set_index('id')
df.columns = df.columns.str.split('_', expand = True)
df.columns.names = ['year', 't']
df.stack(['year', 't']).reset_index(name='num')
id year t num
0 1 2022 amt 10.1
1 1 2022 qty 10.0
2 2 2022 amt 20.2
3 2 2022 qty 20.0
4 3 2022 amt 30.3
5 3 2022 qty 30.0
6 4 2022 amt 40.4
7 4 2022 qty 40.0
pivot_longer
from pyjanitor
:# pip install pyjanitor
import pandas as pd
import janitor as jn
df.pivot_longer(index = 'id', names_to = ('year','t'), names_sep = '_')
id year t value
0 1 2022 amt 10.1
1 2 2022 amt 20.2
2 3 2022 amt 30.3
3 4 2022 amt 40.4
4 1 2022 qty 10.0
5 2 2022 qty 20.0
6 3 2022 qty 30.0
7 4 2022 qty 40.0