python pandas

Simplify/modularize the arguments passed to a function inside a lambda


I have a DataFrame df. I would like to add two new columns, new_foo and new_bar, computed from two existing columns, original_foo and original_bar.

In the original_foo, original_bar, new_foo and new_bar columns, each entry is a list of numbers.

I would like to create the new columns new_foo and new_bar as follows:

df[["new_foo", "new_bar"]] = df.apply(lambda row: a_func(
        row["original_foo"][row.foo - bar : row.foo - bar + baz], 
        row["original_bar"][row.foo - bar : row.foo - bar + baz]), axis=1, result_type="expand")

a_func is a function that takes two lists of numbers and returns two lists of numbers.

I want to avoid repeating the same index expression inside [ ]. In other words, I want to simplify the snippet above by introducing a variable start = row.foo - bar, so that the code becomes:

df[["new_foo", "new_bar"]] = df.apply(lambda row: a_func(
        row["original_foo"][start : start + baz], 
        row["original_bar"][start : start + baz]), axis=1, result_type="expand")

What prevents me from doing this is that start depends on foo, which differs for each row (i.e., row.foo), so it cannot be computed once outside the lambda.


Solution

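  • In Python 3.8+, you can use an assignment expression (the := operator) to bind start inside the lambda. Function arguments are evaluated left to right, so start is already bound by the time the second slice uses it: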

    df[["new_foo", "new_bar"]] = df.apply(
        lambda row: a_func(
            row["original_foo"][(start := row.foo - bar) : start + baz],
            row["original_bar"][start : start + baz],
        ),
        axis=1,
        result_type="expand",
    )
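
If you are on an older Python, or just find the := form hard to read, another option is to replace the lambda with a small named function, where start is an ordinary local variable. Below is a minimal sketch of that idea. The sample data, the values of bar and baz, the body of a_func, and the helper name slice_and_apply are all made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical stand-in for a_func: takes two lists, returns two lists.
def a_func(foo_values, bar_values):
    return [x * 2 for x in foo_values], [y + 1 for y in bar_values]

# Assumed to be plain integers defined outside the DataFrame.
bar, baz = 1, 2

# Made-up sample data using the column names from the question.
df = pd.DataFrame({
    "foo": [1, 2],
    "original_foo": [[10, 11, 12, 13], [20, 21, 22, 23]],
    "original_bar": [[1, 2, 3, 4], [5, 6, 7, 8]],
})

def slice_and_apply(row):
    # start is computed once per row and reused for both columns.
    start = row["foo"] - bar
    window = slice(start, start + baz)
    return a_func(row["original_foo"][window], row["original_bar"][window])

df[["new_foo", "new_bar"]] = df.apply(
    slice_and_apply, axis=1, result_type="expand"
)
```

Building a slice object once also means the start : start + baz expression is written in a single place, so the two columns cannot drift apart if the window changes later.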