python, python-polars

Apply a function to 2 columns in Polars


I want to apply a custom function that takes two columns and returns a value computed from them, row by row.

In pandas, you can apply a function to values from multiple columns like this:

df['col_3'] = df.apply(lambda x: func(x.col_1, x.col_2), axis=1)

What is the syntax for this in Polars?


Solution

  • In Polars, you don't add a column by assigning only the new column's values. You always reassign the whole DataFrame (in other words, ['col_3'] never appears on the left side of the =).

    So, to get your original df with a new column, use the with_columns method:

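    import polars as pl

    df = (
        df
        .with_columns(
            # Pack both columns into a single struct column
            pl.struct('col_1', 'col_2')
            # Each row reaches the lambda as a dict keyed by column name
            .map_elements(lambda x: func(x['col_1'], x['col_2']))
            .alias('col_3')
        )
    )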
    

    A struct is like a DataFrame nested inside a single column: each row holds one value per named field. This is helpful because map_elements (and indeed every expression method) is invoked on a single column, so packing col_1 and col_2 into one struct is how you hand both values to your function at once. map_elements converts the struct in each row into a dict, and that dict becomes the input to your function. map_elements is for functions that take a single input and return a single value. (If you have a vectorized function that expects a whole column, as a Series, and returns another Series, use map_batches instead.) Finally, alias gives the result the column name you want.