pythonpandas

Multiply the value determined for each section


I am in a situation where I need to multiply several columns by the value corresponding to the interval.

I'm using Pandas, but I'm using a for loop to get the value as follows.

I think there is a better way than this, can someone give me some advice?

sample:

import pandas as pd

d = {
    5: 5,
    4: 7,
    3: 2,
    2: -4,
    1: -5,
}
def mul(value):
    if value < 1: return value * 0.01
    elif 5 < value: return value * 20
    for k, v in d.items():
        if value <= k:
            val = value * v
            break
    return val

a = [
    {'a': 5, 'b': 10, 'c': 2},
    {'a': 4, 'b': 0.5, 'c': 4},
    {'a': 3.5, 'b': 1.5, 'c': 5},
    {'a': 2.1, 'b': 5, 'c': 6},
    {'a': 0.1, 'b': 1, 'c': 7},
]

df = pd.DataFrame(a)
a = []
b = []
for i in df.values:
    a.append(mul(i[0]))
    b.append(mul(i[1]))
df['a1'] = a
df['b1'] = b
print(df.head())

"""
     a     b     c      a1       b1
0  5.0  10.0   2.0  25.000  200.000
1  4.0   0.5   4.0  20.000    0.005
2  3.5   1.5   5.0  17.500    7.500
3  2.1   5.0   6.0  10.500   25.000
4  0.1   1.0   7.0   0.001    5.000
"""

Solution

  • Updated answer to the edited question

    Since the question was edited, the logic changed a lot and you can't easily vectorize your operation. Mapping of the values strongly depends on the internal logic of the mul function and here is dependent on the order of the keys in the dictionary.

    Without vectorizing, you could simplify your code to:

    cols = ['a', 'b']
    out = df.join(df[cols].map(mul).add_suffix('1'))
    

    Output:

         a     b  c      a1       b1
    0  5.0  10.0  2  25.000  200.000
    1  4.0   0.5  4  20.000    0.005
    2  3.5   1.5  5  17.500    7.500
    3  2.1   5.0  6  10.500   25.000
    4  0.1   1.0  7   0.001    5.000
    

    Original answer to the initial question

    This is exactly what numpy.searchsorted is doing, just combine it with integer array indexing:

    import numpy as np
    
    k = np.array(list(d))          # array([1, 2, 3, 4, 5])
    v = np.array(list(d.values())) # array([-5, -4,  2,  7,  5])
    
    out = df.mul(v[np.searchsorted(k, df)])
    

    NB. the keys in d must be sorted. Also, this uses the default side='left' parameter of searchsorted.

    Output:

          a     b
    0  25.0  -8.0
    1  28.0  -2.5
    2  24.5  -6.0
    3   4.2  25.0
    4  -5.0  -5.0
    

    To combine to the original DataFrame, join:

    out = df.join(df.mul(v[np.searchsorted(k, df)]).add_suffix('1'))
    

    Output:

         a    b    a1    b1
    0  5.0  2.0  25.0  -8.0
    1  4.0  0.5  28.0  -2.5
    2  3.5  1.5  24.5  -6.0
    3  2.1  5.0   4.2  25.0
    4  1.0  1.0  -5.0  -5.0
    

    Intermediate (indices of matching values in k):

    # np.searchsorted(k, df)
    array([[4, 1],
           [3, 0],
           [3, 1],
           [2, 4],
           [0, 0]])