python pandas azure-synapse azure-notebooks

How to replace parameters in one table with values from another table?

I am currently working on creating a JSON file from data in a processed CSV file through an SQL query, and I am performing this task in a notebook in Synapse. Currently, I have the JSON structure functioning, but I encounter a situation where two columns in the CSV contain parameters. I need to retrieve values based on these parameters from another table.

Table1:

Id	data1	Parameters1	Parameters2
1	extradata	Example.ValidateData(input1, {MinimumNumber}, {Time}, null) == true	"Example":"(new int[] {Hours.First()/24})"
2	extradata	Example.ValidateData(input1, {MinimumNumber}, {Time}, null) == true	"Example":"(new int[] {Hours})"

Parameters Table:

ParameterName	Value
MinimumNumber	30
Time	5
Hours	24

I would like to replace the values before creating the final JSON file and get an array with the entered values based on the indicated parameter value

New Table:

Id	data1	Parameters1	Parameters2
1	extradata	Example.ValidateData(input1, {30}, {5}, null) == true	"Example":"(new int[] {Hours.First()/24})"
2	extradata	Example.ValidateData(input1, {30}, {5}, null) == true	"Example":"(new int[] {24})"

Solution

You can use the replacement function below to get a new table in the required format in a Synapse notebook using Python:

def replace_parameters(row, parameter_df):
    for parameter_name, value in parameter_df.values:
        row = re.sub(rf'{{\s*{re.escape(parameter_name)}\s*}}', f'{{{value}}}', row)
    return row

Apply the replacement function to the Parameters1 and Parameters2 columns and print it using the code below:

table1_df['Parameters1'] = table1_df['Parameters1'].apply(replace_parameters, parameter_df=parameters_df)
table1_df['Parameters2'] = table1_df['Parameters2'].apply(replace_parameters, parameter_df=parameters_df)
print(table1_df)

You will get the output as below:

enter image description here

+---+----------+----------------------------------------------------+-----------------------------------------+
| Id | data     | parameters1                                       | parameters2                             |
+---+----------+----------------------------------------------------+-----------------------------------------+
| 1  | extradata| Example.ValidateData(input1, {30}, {5}, null) == true| “Example”:"(new int[] {Hours.First()/24})"|
+---+----------+----------------------------------------------------+-----------------------------------------+
| 2  | extradata| Example.ValidateData(input1, {30}, {5}, null) == true| “Example”:"(new int[] {24})"             |
+---+----------+----------------------------------------------------+-----------------------------------------+

Here is my complete code for reference:

import pandas as pd
import re
table1_data = {
    'Id': [1, 2],
    'data1': ['extradata', 'extradata'],
    'Parameters1': ['Example.ValidateData(input1, {MinimumNumber}, {Time}, null) == true', 'Example.ValidateData(input1, {MinimumNumber}, {Time}, null) == true'],
    'Parameters2': ['"Example":"(new int[] {Hours.First()/24})"', '"Example":"(new int[] {Hours})"']
}
parameters_data = {
    'ParameterName': ['MinimumNumber', 'Time', 'Hours'],
    'Value': [30, 5, 24]
}
table1_df = pd.DataFrame(table1_data)
parameters_df = pd.DataFrame(parameters_data)

def replace_parameters(row, parameter_df):
    for parameter_name, value in parameter_df.values:
        row = re.sub(rf'{{\s*{re.escape(parameter_name)}\s*}}', f'{{{value}}}', row)
    return row

table1_df['Parameters1'] = table1_df['Parameters1'].apply(replace_parameters, parameter_df=parameters_df)
table1_df['Parameters2'] = table1_df['Parameters2'].apply(replace_parameters, parameter_df=parameters_df)
print(table1_df)