pythonpandasazure-synapseazure-notebooks

How to replace parameters in one table with values from another table?


I am currently working on creating a JSON file from data in a processed CSV file through an SQL query, and I am performing this task in a notebook in Synapse. Currently, I have the JSON structure functioning, but I encounter a situation where two columns in the CSV contain parameters. I need to retrieve values based on these parameters from another table.

Table1:

Id data1 Parameters1 Parameters2
1 extradata Example.ValidateData(input1, {MinimumNumber}, {Time}, null) == true "Example":"(new int[] {Hours.First()/24})"
2 extradata Example.ValidateData(input1, {MinimumNumber}, {Time}, null) == true "Example":"(new int[] {Hours})"

Parameters Table:

ParameterName Value
MinimumNumber 30
Time 5
Hours 24

I would like to replace the values before creating the final JSON file and get an array with the entered values based on the indicated parameter value

New Table:

Id data1 Parameters1 Parameters2
1 extradata Example.ValidateData(input1, {30}, {5}, null) == true "Example":"(new int[] {Hours.First()/24})"
2 extradata Example.ValidateData(input1, {30}, {5}, null) == true "Example":"(new int[] {24})"

Solution

  • You can use the replacement function below to get a new table in the required format in a Synapse notebook using Python:

    def replace_parameters(row, parameter_df):
        for parameter_name, value in parameter_df.values:
            row = re.sub(rf'{{\s*{re.escape(parameter_name)}\s*}}', f'{{{value}}}', row)
        return row
    

    Apply the replacement function to the Parameters1 and Parameters2 columns and print it using the code below:

    table1_df['Parameters1'] = table1_df['Parameters1'].apply(replace_parameters, parameter_df=parameters_df)
    table1_df['Parameters2'] = table1_df['Parameters2'].apply(replace_parameters, parameter_df=parameters_df)
    print(table1_df)
    

    You will get the output as below:

    enter image description here
    
    +---+----------+----------------------------------------------------+-----------------------------------------+
    | Id | data     | parameters1                                       | parameters2                             |
    +---+----------+----------------------------------------------------+-----------------------------------------+
    | 1  | extradata| Example.ValidateData(input1, {30}, {5}, null) == true| “Example”:"(new int[] {Hours.First()/24})"|
    +---+----------+----------------------------------------------------+-----------------------------------------+
    | 2  | extradata| Example.ValidateData(input1, {30}, {5}, null) == true| “Example”:"(new int[] {24})"             |
    +---+----------+----------------------------------------------------+-----------------------------------------+
    

    Here is my complete code for reference:

    import pandas as pd
    import re
    table1_data = {
        'Id': [1, 2],
        'data1': ['extradata', 'extradata'],
        'Parameters1': ['Example.ValidateData(input1, {MinimumNumber}, {Time}, null) == true', 'Example.ValidateData(input1, {MinimumNumber}, {Time}, null) == true'],
        'Parameters2': ['"Example":"(new int[] {Hours.First()/24})"', '"Example":"(new int[] {Hours})"']
    }
    parameters_data = {
        'ParameterName': ['MinimumNumber', 'Time', 'Hours'],
        'Value': [30, 5, 24]
    }
    table1_df = pd.DataFrame(table1_data)
    parameters_df = pd.DataFrame(parameters_data)
    
    def replace_parameters(row, parameter_df):
        for parameter_name, value in parameter_df.values:
            row = re.sub(rf'{{\s*{re.escape(parameter_name)}\s*}}', f'{{{value}}}', row)
        return row
    
    table1_df['Parameters1'] = table1_df['Parameters1'].apply(replace_parameters, parameter_df=parameters_df)
    table1_df['Parameters2'] = table1_df['Parameters2'].apply(replace_parameters, parameter_df=parameters_df)
    print(table1_df)