Restricting myself to pandas method chaining, how can I apply the merge method to the latest DataFrame state with a lambda function, without using pipe?
The code below works, but it depends on the pipe method.
import pandas as pd

(pd.DataFrame(
    [{'YEAR': 2013, 'FK': 1, 'v': 1},
     {'YEAR': 2013, 'FK': 2, 'v': 2},
     {'YEAR': 2014, 'FK': 1, 'v': 3},
     {'YEAR': 2014, 'FK': 2, 'v': 4}
    ])
 # pipe passes the current DataFrame state to the lambda as w
 .pipe(lambda w: w.merge(w.query('YEAR==2013')[['FK', 'v']],
                         on='FK',
                         how='left'
                         ))
)
The code below doesn't work.
(pd.DataFrame(
    [{'YEAR': 2013, 'FK': 1, 'v': 1},
     {'YEAR': 2013, 'FK': 2, 'v': 2},
     {'YEAR': 2014, 'FK': 1, 'v': 3},
     {'YEAR': 2014, 'FK': 2, 'v': 4}
    ])
 .merge(lambda w: w.query('YEAR==2013'),
        on='FK',
        how='left'
        )
)
It raises:
TypeError: Can only merge Series or DataFrame objects, a <class 'function'> was passed
You can't; this is precisely why the pipe method exists.
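If the inline lambda feels heavy, pipe also accepts a named function plus extra arguments, which keeps the chain readable. A minimal sketch, assuming a hypothetical helper called merge_year_baseline (the name, its year parameter, and the suffixes choice are illustrative and not part of the question):

```python
import pandas as pd


def merge_year_baseline(frame: pd.DataFrame, year: int) -> pd.DataFrame:
    """Hypothetical helper: left-merge the 'v' values of `year` onto `frame` by FK."""
    baseline = frame.loc[frame["YEAR"] == year, ["FK", "v"]]
    # suffixes keeps the original column as 'v' and names the merged one 'v_<year>'
    return frame.merge(baseline, on="FK", how="left", suffixes=("", f"_{year}"))


out = (
    pd.DataFrame([
        {"YEAR": 2013, "FK": 1, "v": 1},
        {"YEAR": 2013, "FK": 2, "v": 2},
        {"YEAR": 2014, "FK": 1, "v": 3},
        {"YEAR": 2014, "FK": 2, "v": 4},
    ])
    # pipe calls merge_year_baseline(current_frame, year=2013)
    .pipe(merge_year_baseline, year=2013)
)
print(out)
```

The helper receives whatever the chain has produced so far, so it behaves like the lambda version while staying reusable.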
For completeness, DataFrame methods/accessors that accept a callable (as primary parameter and as of pandas 2.0.3) are:
For other cases, you need to use pipe.
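To illustrate the distinction, here is a small sketch that mixes two callable-accepting operations I am confident about (assign and .loc) with pipe for the merge step. The extra column v2 and the FK filter are made up for the example:

```python
import pandas as pd

result = (
    pd.DataFrame([
        {"YEAR": 2013, "FK": 1, "v": 1},
        {"YEAR": 2013, "FK": 2, "v": 2},
        {"YEAR": 2014, "FK": 1, "v": 3},
        {"YEAR": 2014, "FK": 2, "v": 4},
    ])
    # assign accepts a callable that receives the current frame
    .assign(v2=lambda d: d["v"] * 2)
    # .loc accepts a callable that returns a boolean mask
    .loc[lambda d: d["FK"] == 1]
    # merge does not accept a callable, so it has to go through pipe
    .pipe(lambda d: d.merge(
        d.query("YEAR == 2013")[["FK", "v"]],
        on="FK",
        how="left",
        suffixes=("", "_2013"),
    ))
)
print(result)
```

Note that the lambda inside pipe sees the frame after assign and .loc have been applied, which is exactly the "last dataframe state" the question asks for.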