I have a DataFrame df. I would like to expand df with two new columns, new_fooand new_bar, using two existing columns, original_foo and original_bar, as input. In the original_foo, original_bar, new_foo, and new_bar columns, each entry is a list of numbers.

I would like to create the new columns new_foo and new_bar as follows:
df[["new_foo", "new_bar"]] = df.apply(lambda row: a_func(
row["original_foo"][row.foo - bar : row.foo - bar + baz],
row["original_bar"][row.foo - bar : row.foo - bar + baz]), axis=1, result_type="expand")
a_func is a function that takes two lists of numbers and returns two lists of numbers.
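For concreteness, here is a toy setup under which the snippet above runs; the body of a_func, the sample data, and the values of bar and baz below are made up for illustration (any function with this signature works):

import pandas as pd

# Illustrative implementation only: returns the running sums of the two input lists.
def a_func(xs, ys):
    return ([sum(xs[: i + 1]) for i in range(len(xs))],
            [sum(ys[: i + 1]) for i in range(len(ys))])

# Toy data: each entry in original_foo / original_bar is a list of numbers,
# and foo holds the per-row offset used in the slices.
df = pd.DataFrame({
    "original_foo": [[1, 2, 3, 4, 5], [10, 20, 30, 40, 50]],
    "original_bar": [[5, 4, 3, 2, 1], [50, 40, 30, 20, 10]],
    "foo": [2, 3],
})
bar, baz = 1, 2  # toy slice parameters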
I want to avoid repeatedly writing the same slicing expression inside the brackets.
In other words, I want to simplify the above code snippet by creating a variable start = row.foo - bar, so that the code becomes:
df[["new_foo", "new_bar"]] = df.apply(lambda row: a_func(
row["original_foo"][start : start + baz],
row["original_bar"][start : start + baz]), axis=1, result_type="expand")
The part that prevents me from doing this is that start depends on foo, which is different for each row (i.e., row.foo).
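To make the obstacle concrete: there is no row object outside the lambda, and computing the offset from the whole column gives a Series rather than a per-row integer, so it cannot be used as a plain slice bound:

# Does not work: start is now a whole Series (one value per row),
# and a Python list cannot be sliced with Series bounds.
start = df.foo - bar
# row["original_foo"][start : start + baz]  # TypeError if used inside apply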
In Python 3.8+, you should be able to use an assignment expression (the := "walrus" operator):
df[["new_foo", "new_bar"]] = df.apply(
lambda row: a_func(
row["original_foo"][(start := row.foo - bar) : start + baz],
row["original_bar"][start : start + baz],
),
axis=1,
result_type="expand",
)
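This works because the lower bound of the first slice is evaluated before start is used again in the same call. If you would rather avoid the assignment expression (for example, on an older Python), an equivalent option is to move the lambda body into a small named function, where start is an ordinary local variable; compute_new_cols is just an illustrative name:

def compute_new_cols(row):
    start = row.foo - bar  # computed once per row
    return a_func(
        row["original_foo"][start : start + baz],
        row["original_bar"][start : start + baz],
    )

df[["new_foo", "new_bar"]] = df.apply(compute_new_cols, axis=1, result_type="expand")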