I tried the following to create a new column in which each row is a list containing the elements of column b multiplied by the value in column a:
import pandas as pd
data = {'a': [3, 2], 'b': [[4], [7, 2]]}
df = pd.DataFrame(data)
df['c'] = df.apply(lambda row: [row['a'] * x for x in row['b']])
The final result should look like this:
a | b | c |
---|---|---|
3 | [4] | [12] |
2 | [7, 2] | [14, 4] |
Your approach would have been correct with axis=1 (row-wise; by default, apply works column-wise):
df['c'] = df.apply(lambda row: [row['a'] * x for x in row['b']], axis=1)
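To see why the axis matters, here is a small sketch (not part of the original attempt) that only inspects what apply hands to the function in each mode. The variable names per_column and per_row are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [3, 2], 'b': [[4], [7, 2]]})

# Default (axis=0): the function is called once per column,
# so each argument is a Series named after that column.
per_column = df.apply(lambda s: s.name)

# axis=1: the function is called once per row,
# so each argument is a Series named after that row's index label.
per_row = df.apply(lambda s: s.name, axis=1)

print(per_column.tolist())  # ['a', 'b']
print(per_row.tolist())     # [0, 1]
```

With the default axis, the lambda receives a whole column indexed by 0 and 1, so looking up row['a'] inside it cannot find the label 'a'.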
Using apply is, however, quite slow, since pandas creates an intermediate Series for each row. Pure Python is more efficient here, and a list comprehension is well suited:
df['c'] = [[a * x for x in b] for a, b in zip(df['a'], df['b'])]
Output:
a b c
0 3 [4] [12]
1 2 [7, 2] [14, 4]
Timings (on 200k rows):
# list comprehension
# df['c'] = [[a * x for x in b] for a, b in zip(df['a'], df['b'])]
98.3 ms ± 3.47 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# conversion to numpy arrays
# df['c'] = df['a'] * df['b'].map(np.array)
371 ms ± 75.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# apply with axis=1
# df['c'] = df.apply(lambda row: [row['a'] * x for x in row['b']], axis=1)
1.65 s ± 65.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
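For completeness, here is a minimal sketch of the NumPy variant from the timings. It produces NumPy arrays in column c rather than Python lists, so converting back with tolist() is an extra step assumed here in case plain lists are wanted. The intermediate name arrays is only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [3, 2], 'b': [[4], [7, 2]]})

# Turn each list in 'b' into a NumPy array, then multiply element-wise
# by the matching scalar in 'a'. The result is a Series of arrays.
arrays = df['a'] * df['b'].map(np.array)

# Optional: convert each array back to a plain Python list.
df['c'] = arrays.map(lambda arr: arr.tolist())

print(df)
```

This variant may pay off if later steps work on arrays anyway. For plain lists, the list comprehension stays the simpler and, per the timings above, faster choice.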