Calculating based on rows conditions (in pandas)

I encountered the following problem: I have a pandas dataframe that looks like this.

id_tranc	sum	bid
1	4000	2.3%
1	20000	3.5%
2	100000	if >=100 000 - 1.6%, if < 100 000 - 100$
3	30000	if >=100 000 - 1.6%, if < 100 000 - 100$
1	60000	500$

code_to_create_dataset:

dataframe = pd.DataFrame({
    'id_tranc': [1, 1, 2, 3, 1],
    'sum': [4000, 20000, 100000, 30000, 60000],
    'bid': ['2.3%', '3.5%', 'if >=100 000 - 1.6%, if < 100 000 - 100$', 
            'if >=100 000 - 1.6%, if < 100 000 - 100$', '500$']})

Necessary to calculated 'commission', depending columns 'sum' and 'bid'. Final dataframe should be look like:

id_tranc	sum	bid	comission
1	4000	2.3%	92
1	20000	3.5%	700
2	100000	if >=100 000 - 1.6%, if < 100 000 - 100$	1600
3	30000	if >=100 000 - 1.6%, if < 100 000 - 100$	100
1	60000	500$	500

If calculated with df['commission'] = df['sum'] * df['bid'] - getting result only for first 2 record. Please tell me how to do this correctly.

Solution

I would write a small parser based on a regex and operator:

from operator import ge, lt, gt, le
import re

def logic(value, bid):
    # define operators, add other ones if needed
    ops = {'>=': ge, '>': gt, '<': lt, '<=': le}
    # remove spaces, split conditions on comma
    conditions = bid.replace(' ', '').split(',')
    # then loop over them, the first match will be used
    for cond in conditions:
        # extract operator, threshold, commission, unit
        m = re.search('(?:if(\W+)(\d+)-)?(\d+\.?\d*)([%$])', cond)
        if not m:     # if no match, ignore
            continue
        op, thresh, com, unit = m.groups()
        # if no condition or condition is valid
        if (not op) or (op and ops[op](value, float(thresh))):
            if unit == '%':                     # handle % case
                return value * float(com)/100
            elif unit == '$':                   # handle fixed com
                return float(com)

df['comission'] = [logic(val, bid) for val, bid in zip(df['sum'], df['bid'])]

# or with apply, which is less efficient
# df['comission'] = df.apply(lambda row: logic(row['sum'], row['bid']), axis=1)

Output:

   id_tranc     sum                                       bid  comission
0         1    4000                                      2.3%       92.0
1         1   20000                                      3.5%      700.0
2         2  100000  if >=100 000 - 1.6%, if < 100 000 - 100$     1600.0
3         3   30000  if >=100 000 - 1.6%, if < 100 000 - 100$      100.0
4         1   60000                                      500$      500.0

Regex:

regex demo

(?:if(\W+)(\d+)-)?  # optionally match a condition (operator and threshold)
(\d+\.?\d*)         # match the value of the commission
([%$])              # match type of commission (% or $)

Reproducible input:

df = pd.DataFrame({'id_tranc': [1, 1, 2, 3, 1], 'sum': [4000, 20000, 100000, 30000, 60000],
                   'bid': ['2.3%', '3.5%', 'if >=100 000 - 1.6%, if < 100 000 - 100$',
                           'if >=100 000 - 1.6%, if < 100 000 - 100$', '500$']})