python pandas dataframe lambda chaining

How to merge a pandas DataFrame by passing a lambda as the first parameter?


Restricting myself to pandas method chaining, how can I apply the merge method to the latest DataFrame state using a lambda function, without using pipe?

The code below works, but it depends on the pipe method.

import pandas as pd

(pd.DataFrame(
    [{'YEAR':2013,'FK':1, 'v':1},
     {'YEAR':2013,'FK':2, 'v':2},
     {'YEAR':2014,'FK':1, 'v':3},
     {'YEAR':2014,'FK':2, 'v':4}
    ])
  # w is the DataFrame at this point in the chain; merge it with its own 2013 rows
  .pipe(lambda w: w.merge(w.query('YEAR==2013')[['FK','v']],
        on='FK',
        how='left'
       ))
)

The code below doesn't work.

(pd.DataFrame(
    [{'YEAR':2013,'FK':1, 'v':1},
     {'YEAR':2013,'FK':2, 'v':2},
     {'YEAR':2014,'FK':1, 'v':3},
     {'YEAR':2014,'FK':2, 'v':4}
    ])
 .merge(lambda w: w.query('YEAR==2013'),
        on='FK',
        how='left'
       )
)

It raises: TypeError: Can only merge Series or DataFrame objects, a <class 'function'> was passed
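The error can be reproduced in a smaller form. The sketch below assumes a hypothetical helper `select_2013` and a `calls` list (neither is in the original code) that records whether pandas ever invokes the callable. It suggests that merge checks the type of its first argument and rejects the function without calling it.

```python
import pandas as pd

df = pd.DataFrame({"YEAR": [2013, 2014], "FK": [1, 1], "v": [1, 3]})

calls = []  # hypothetical: filled only if pandas actually calls the function


def select_2013(w):
    # Hypothetical helper standing in for the lambda from the failing example.
    calls.append(w)
    return w.query("YEAR == 2013")


try:
    # merge expects a DataFrame or Series as its first argument, not a callable.
    df.merge(select_2013, on="FK", how="left")
    error = None
except TypeError as exc:
    error = exc

print(type(error).__name__, "raised;", "callable invoked:", bool(calls))
```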


Solution

  • You can't; this is precisely why the pipe method exists.

    For completeness, DataFrame methods/accessors that accept a callable (as primary parameter and as of pandas 2.0.3) are:

    For other cases, you need to use pipe.
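As an illustration of the distinction above, here is a minimal sketch that mixes callable-accepting steps with pipe in one chain. It uses `assign` and `.loc` as two examples that are known to accept a callable (they are not meant as the complete list), and wraps merge in pipe. The column names `v_double` and `v_2013` and the `suffixes` argument are choices made for this example, not part of the original question.

```python
import pandas as pd

result = (
    pd.DataFrame(
        [{"YEAR": 2013, "FK": 1, "v": 1},
         {"YEAR": 2013, "FK": 2, "v": 2},
         {"YEAR": 2014, "FK": 1, "v": 3},
         {"YEAR": 2014, "FK": 2, "v": 4}]
    )
    # assign accepts a callable that receives the current DataFrame.
    .assign(v_double=lambda d: d["v"] * 2)
    # merge does not accept a callable, so pipe hands it the current DataFrame.
    .pipe(lambda d: d.merge(
        d.query("YEAR == 2013")[["FK", "v"]],
        on="FK",
        how="left",
        suffixes=("", "_2013"),  # keep the left 'v', rename the right one
    ))
    # .loc accepts a callable that returns a boolean mask.
    .loc[lambda d: d["YEAR"] == 2014]
)

print(result)
```

Each lambda receives the DataFrame as it exists at that step of the chain, so the merge sees the `v_double` column added just before it.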