Could anyone help with advice on how to refactor this code to make it faster? We are ingesting JSON and transforming in preparation for a later step but this particular section of the Notebook is taking a lot longer than expected (and we're new to python). Code is below, thanks!
import pandas as pd
df_main = pd.DataFrame(table_data)
df_parameters = pd.DataFrame(parameter_data)
# Convert df_parameters to dictionary for faster lookups
parameter_dict = dict(zip(df_parameters['abc'], df_parameters['def']))
def replace_parameters(row):
if isinstance(row['expression'], str) and row['xyz'] == row['xyz']:
for parameter_name, value in parameter_dict.items():
if row['expression']:
row['expression'] = row['expression'].replace(f'|{parameter_name}|', str(value))
if row['sp']:
row['sp'] = row['sp'].replace(f'|{parameter_name}|', str(value))
return row
# Apply the replacement function only when xyz is equal to xyz
df_main = df_main.apply(replace_parameters, axis=1)
Instead of using a for loop
, use pattern matching. I believe this kind of expression you are using; if not, adjust the pattern accordingly.
Expressions I used:
|param1| + |param2|
|param2| - |param3|
|param1| * |param3|
|param2| / |param3|
You can use the code below.
Data I used:
import pandas as pd
import re
table_data = {
'expression': ['|param1| + |param2|', '|param2| - |param3|'],
'sp': ['|param1| * |param3|', '|param2| / |param3|'],
'xyz': [1, 2] # Sample values for 'xyz'
}
parameter_data = {
'abc': ['param1', 'param2', 'param3'],
'def': [10, 20, 30] # Sample values for parameters
}
df_main = pd.DataFrame(table_data)
df_parameters = pd.DataFrame(parameter_data)
Output:
Modify your function like below:
import re
parameter_dict = dict(zip(parameter_data['abc'], parameter_data['def']))
tmp = [ "\|" +i+ "\|" for i in parameter_data['abc']]
param_pattern = '|'.join(tmp)
# Define a function to perform replacements
def replace_parameters(row):
if isinstance(row['expression'], str) and row['xyz'] == row['xyz']:
row['expression'] = re.sub(param_pattern,lambda match: str(parameter_dict[match.group(0).strip('|')]), row['expression'])
row['sp'] = re.sub(param_pattern, lambda match: str(parameter_dict[match.group(0).strip('|')]), row['sp'])
return row
df_main = df_main.apply(replace_parameters, axis=1)
df_main
Output:
You can see the parameters are replaced. Once again, I am saying, according to your expressions, you create the pattern. For the above kind of expressions, below is the pattern I created: '\\|param1\\||\\|param2\\||\\|param3\\|'
.