I am in a situation where I need to multiply several columns by the value corresponding to the interval.
I'm using Pandas, but I'm using a for loop to get the value as follows.
I think there is a better way than this, can someone give me some advice?
sample:
import pandas as pd
d = {
5: 5,
4: 7,
3: 2,
2: -4,
1: -5,
}
def mul(value):
if value < 1: return value * 0.01
elif 5 < value: return value * 20
for k, v in d.items():
if value <= k:
val = value * v
break
return val
a = [
{'a': 5, 'b': 10, 'c': 2},
{'a': 4, 'b': 0.5, 'c': 4},
{'a': 3.5, 'b': 1.5, 'c': 5},
{'a': 2.1, 'b': 5, 'c': 6},
{'a': 0.1, 'b': 1, 'c': 7},
]
df = pd.DataFrame(a)
a = []
b = []
for i in df.values:
a.append(mul(i[0]))
b.append(mul(i[1]))
df['a1'] = a
df['b1'] = b
print(df.head())
"""
a b c a1 b1
0 5.0 10.0 2.0 25.000 200.000
1 4.0 0.5 4.0 20.000 0.005
2 3.5 1.5 5.0 17.500 7.500
3 2.1 5.0 6.0 10.500 25.000
4 0.1 1.0 7.0 0.001 5.000
"""
Since the question was edited, the logic changed a lot and you can't easily vectorize your operation. Mapping of the values strongly depends on the internal logic of the mul
function and here is dependent on the order of the keys in the dictionary.
Without vectorizing, you could simplify your code to:
cols = ['a', 'b']
out = df.join(df[cols].map(mul).add_suffix('1'))
Output:
a b c a1 b1
0 5.0 10.0 2 25.000 200.000
1 4.0 0.5 4 20.000 0.005
2 3.5 1.5 5 17.500 7.500
3 2.1 5.0 6 10.500 25.000
4 0.1 1.0 7 0.001 5.000
This is exactly what numpy.searchsorted
is doing, just combine it with integer array indexing:
import numpy as np
k = np.array(list(d)) # array([1, 2, 3, 4, 5])
v = np.array(list(d.values())) # array([-5, -4, 2, 7, 5])
out = df.mul(v[np.searchsorted(k, df)])
NB. the keys in d
must be sorted. Also, this uses the default side='left'
parameter of searchsorted
.
Output:
a b
0 25.0 -8.0
1 28.0 -2.5
2 24.5 -6.0
3 4.2 25.0
4 -5.0 -5.0
To combine to the original DataFrame, join
:
out = df.join(df.mul(v[np.searchsorted(k, df)]).add_suffix('1'))
Output:
a b a1 b1
0 5.0 2.0 25.0 -8.0
1 4.0 0.5 28.0 -2.5
2 3.5 1.5 24.5 -6.0
3 2.1 5.0 4.2 25.0
4 1.0 1.0 -5.0 -5.0
Intermediate (indices of matching values in k
):
# np.searchsorted(k, df)
array([[4, 1],
[3, 0],
[3, 1],
[2, 4],
[0, 0]])