I want to apply a custom function that takes two columns and outputs a value based on them (row-wise).
In Pandas there is a syntax to apply a function based on values in multiple columns:
df['col_3'] = df.apply(lambda x: func(x.col_1, x.col_2), axis=1)
What is the syntax for this in Polars?
In Polars, you don't add a column by assigning just the values of the new column. You always assign the whole df (in other words, there's never ['col_3'] on the left side of the =).
To that end, if you want your original df with a new column, you use the with_columns method.
You would do:
df = (
    df
    .with_columns(
        pl.struct('col_1', 'col_2')
        .map_elements(lambda x: func(x['col_1'], x['col_2']))
        .alias('col_3')
    )
)
A struct is like a dataframe inside a column of a dataframe: each row holds several named fields. This is helpful because map_elements (and indeed all expressions) can only be invoked from a single column. map_elements turns the struct, in each row, into a dict, and that becomes the input to your function. map_elements is for functions which take a single input and output a single value. (If you're using a vectorized function that expects a whole Series and returns another Series, then you should use map_batches instead.) Finally, you call alias on the result to give the new column the name you want it to have.