
How do I multiply the values of a list stored in one DataFrame column by the value in another column?


I tried the following to create a new column c, where each row is a list containing the values of column b multiplied by the value of column a:

import pandas as pd

data = {'a': [3, 2], 'b': [[4], [7, 2]]}
df = pd.DataFrame(data)
# Fails: without axis=1, apply passes each column (not each row) to the lambda
df['c'] = df.apply(lambda row: [row['a'] * x for x in row['b']])

The final result should look like this:

a  b       c
3  [4]     [12]
2  [7, 2]  [14, 4]

Solution

  • Your approach works once you pass axis=1, which applies the function to each row; by default (axis=0), apply passes each column instead:

    df['c'] = df.apply(lambda row: [row['a'] * x for x in row['b']], axis=1)
    

    However, apply with axis=1 is slow, because pandas creates an intermediate Series for every row. Plain Python is more efficient here, and a list comprehension is well suited:

    df['c'] = [[a * x for x in b] for a, b in zip(df['a'], df['b'])]
    

    Output:

       a       b        c
    0  3     [4]     [12]
    1  2  [7, 2]  [14, 4]
    

    Comparison of timings (on 200k rows):

    # list comprehension
    # df['c'] = [[a * x for x in b] for a, b in zip(df['a'], df['b'])]
    98.3 ms ± 3.47 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    
    # conversion to numpy arrays (column c then holds numpy arrays, not lists)
    # df['c'] = df['a'] * df['b'].map(np.array)
    371 ms ± 75.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    
    # apply with axis=1
    # df['c'] = df.apply(lambda row: [row['a'] * x for x in row['b']], axis=1)
    1.65 s ± 65.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
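To see why axis=1 is needed, here is a small check on the question's data. The helper `labels_seen` is a hypothetical name for this sketch; it returns the index labels of whatever Series apply hands to it, joined into one string so that each call yields a single scalar.

```python
import pandas as pd

df = pd.DataFrame({'a': [3, 2], 'b': [[4], [7, 2]]})


def labels_seen(s):
    # Hypothetical helper: join the Series' index labels into one string
    return ','.join(map(str, s.index))


# Default axis=0: one call per column; each Series is indexed by row labels
by_column = df.apply(labels_seen)
# axis=1: one call per row; each Series is indexed by column names,
# which is why row['a'] and row['b'] work
by_row = df.apply(labels_seen, axis=1)

print(by_column.to_dict())  # {'a': '0,1', 'b': '0,1'}
print(by_row.to_dict())     # {0: 'a,b', 1: 'a,b'}
```

With the default axis=0, the Series passed to the lambda has only the labels 0 and 1, so `row['a']` has nothing to look up.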
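If the same operation is needed for several column pairs, the list comprehension can be wrapped in a small function. `scale_lists` is a hypothetical name chosen for this sketch; it builds new lists, so the original lists in column b are left unchanged.

```python
import pandas as pd


def scale_lists(factors, lists):
    """Hypothetical helper: multiply every element of each list by the factor
    at the same position. Returns new lists; the inputs are not modified."""
    return [[f * x for x in values] for f, values in zip(factors, lists)]


df = pd.DataFrame({'a': [3, 2], 'b': [[4], [7, 2]]})
df['c'] = scale_lists(df['a'], df['b'])
print(df)
```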
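To rerun the comparison on your own machine, a minimal harness using the standard `timeit` module might look like this. The synthetic data (factors from 1 to 9 and lists of 1 to 3 small ints) and the function names are assumptions for this sketch, not the benchmark behind the numbers above; absolute timings will depend on hardware and pandas version.

```python
import timeit

import numpy as np
import pandas as pd


def make_frame(n_rows, seed=0):
    """Hypothetical synthetic data: int factors in 'a', short int lists in 'b'."""
    rng = np.random.default_rng(seed)
    a = rng.integers(1, 10, size=n_rows)
    lengths = rng.integers(1, 4, size=n_rows)  # each list holds 1 to 3 items
    b = [rng.integers(1, 10, size=k).tolist() for k in lengths]
    return pd.DataFrame({'a': a, 'b': b})


def with_listcomp(df):
    # Plain Python: zip the two columns and scale each list
    return [[a * x for x in b] for a, b in zip(df['a'], df['b'])]


def with_numpy(df):
    # Convert each list to an ndarray; the result holds arrays, not lists
    return df['a'] * df['b'].map(np.array)


def with_apply(df):
    # Row-wise apply: builds one intermediate Series per row
    return df.apply(lambda row: [row['a'] * x for x in row['b']], axis=1)


VARIANTS = {'listcomp': with_listcomp, 'numpy': with_numpy, 'apply': with_apply}


def time_variants(n_rows=200_000, number=1, repeat=3):
    """Return the best-of-`repeat` seconds per call for each variant."""
    df = make_frame(n_rows)
    timings = {}
    for name, func in VARIANTS.items():
        runs = timeit.repeat(lambda: func(df), number=number, repeat=repeat)
        timings[name] = min(runs) / number
    return timings


if __name__ == '__main__':
    for name, seconds in time_variants().items():
        print(f'{name:>8}: {seconds * 1000:.1f} ms')
```

Taking the minimum of the repeats is the usual `timeit` convention, since slower runs mostly reflect interference from other processes.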