pythonpandasvariable-substitution

Substituting variable in a dataframe row based on other row's value


I have a dataframe that contains ID, Formula, and a dependent ID column that I extracted the ID from the Formula column. Now I have to substitute all the dependent ID into the formulas based on the dataframe.

My approach is to run a nested loop for each row to substitute a dependent ID in the formula using the replace function. The loop would stop until there's no more possible substitution. However I don't know where to begin and not sure if this is the correct approach.

I am wondering if there's any function that can make the process easier?

Here is the code to create the current dataframe:

data = pd.DataFrame({'ID':['A1','A3','B2','C2','D3','E3'],
    'Formula':['C2/500','If B2 >10 then  (B2*D3) + 100 else D3+10','E3/2 +20','E3/2 +20','var_i','var_x'],
    'Dependent ID':['C2','B2, D3','E3','D3, E3', '','']})

Here are the examples of my current dataframe and my desire end result. Current dataframe: Current DF

Desire end result: Desired DF


Solution

  • Recursively replace dependent ID inside formula with formula:

    df = pd.DataFrame({'ID':['A1','A3','B2','C2','D3','E3'],
        'Formula':['C2/500','If B2 >10 then  (B2*D3) + 100 else D3+10','E3/2 +20','D3+E3','var_i','var_x'],
        'Dependent ID':['C2','B2,D3','E3','D3,E3', '','']})
    
    def find_formula(formula:str, ids:str):
        #replace all the ids inside formula with the correct formula
        if ids == '':
            return formula
        ids = ids.split(',')
        for x in ids:
            sub_formula = df.loc[df['ID']==x, 'Formula'].values[0]
            sub_id = df.loc[df['ID']==x, 'Dependent ID'].values[0]
            formula = formula.replace(x, find_formula(sub_formula, sub_id))
        return formula
    
    df['new_formula']=df.apply(lambda x: find_formula(x['Formula'], x['Dependent ID']), axis=1)
    

    output:

    ID  Formula Dependent ID    new_formula
    0   A1  C2/500      C2      var_i+var_x/500
    1   A3  If B2 >10 then (B2*D3) + ...    If var_x/2 +20 >10 then (var_x/2 +20*var_i) + ...
    2   B2  E3/2 +20    E3      var_x/2 +20
    3   C2  D3+E3       D3,E3   var_i+var_x
    4   D3  var_i               var_i
    5   E3  var_x               var_x