I am currently working on creating a JSON file from data in a processed CSV file through an SQL query, and I am performing this task in a notebook in Synapse. Currently, I have the JSON structure functioning, but I encounter a situation where two columns in the CSV contain parameters. I need to retrieve values based on these parameters from another table.
Table1:
Id | data1 | Parameters1 | Parameters2 |
---|---|---|---|
1 | extradata | Example.ValidateData(input1, {MinimumNumber}, {Time}, null) == true | "Example":"(new int[] {Hours.First()/24})" |
2 | extradata | Example.ValidateData(input1, {MinimumNumber}, {Time}, null) == true | "Example":"(new int[] {Hours})" |
Parameters Table:
ParameterName | Value |
---|---|
MinimumNumber | 30 |
Time | 5 |
Hours | 24 |
I would like to replace the values before creating the final JSON file and get an array with the entered values based on the indicated parameter value
New Table:
Id | data1 | Parameters1 | Parameters2 |
---|---|---|---|
1 | extradata | Example.ValidateData(input1, {30}, {5}, null) == true | "Example":"(new int[] {Hours.First()/24})" |
2 | extradata | Example.ValidateData(input1, {30}, {5}, null) == true | "Example":"(new int[] {24})" |
You can use the replacement function below to get a new table in the required format in a Synapse notebook using Python:
def replace_parameters(row, parameter_df):
for parameter_name, value in parameter_df.values:
row = re.sub(rf'{{\s*{re.escape(parameter_name)}\s*}}', f'{{{value}}}', row)
return row
Apply the replacement function to the Parameters1 and Parameters2 columns and print it using the code below:
table1_df['Parameters1'] = table1_df['Parameters1'].apply(replace_parameters, parameter_df=parameters_df)
table1_df['Parameters2'] = table1_df['Parameters2'].apply(replace_parameters, parameter_df=parameters_df)
print(table1_df)
You will get the output as below:
enter image description here
+---+----------+----------------------------------------------------+-----------------------------------------+
| Id | data | parameters1 | parameters2 |
+---+----------+----------------------------------------------------+-----------------------------------------+
| 1 | extradata| Example.ValidateData(input1, {30}, {5}, null) == true| “Example”:"(new int[] {Hours.First()/24})"|
+---+----------+----------------------------------------------------+-----------------------------------------+
| 2 | extradata| Example.ValidateData(input1, {30}, {5}, null) == true| “Example”:"(new int[] {24})" |
+---+----------+----------------------------------------------------+-----------------------------------------+
Here is my complete code for reference:
import pandas as pd
import re
table1_data = {
'Id': [1, 2],
'data1': ['extradata', 'extradata'],
'Parameters1': ['Example.ValidateData(input1, {MinimumNumber}, {Time}, null) == true', 'Example.ValidateData(input1, {MinimumNumber}, {Time}, null) == true'],
'Parameters2': ['"Example":"(new int[] {Hours.First()/24})"', '"Example":"(new int[] {Hours})"']
}
parameters_data = {
'ParameterName': ['MinimumNumber', 'Time', 'Hours'],
'Value': [30, 5, 24]
}
table1_df = pd.DataFrame(table1_data)
parameters_df = pd.DataFrame(parameters_data)
def replace_parameters(row, parameter_df):
for parameter_name, value in parameter_df.values:
row = re.sub(rf'{{\s*{re.escape(parameter_name)}\s*}}', f'{{{value}}}', row)
return row
table1_df['Parameters1'] = table1_df['Parameters1'].apply(replace_parameters, parameter_df=parameters_df)
table1_df['Parameters2'] = table1_df['Parameters2'].apply(replace_parameters, parameter_df=parameters_df)
print(table1_df)