pythonazure-data-factoryazure-batch

Azure Data Factory: Use Lookup result in ForEach loop with Python


My pipeline looks like the following:

enter image description here

Edit: I think I know why those [ ] in there - from the Python script the filenames are returned as a list of filenames. And the Lookup activity results then in the following output:

 {
                   "count": 1,
                   "value": [
                                     {
                                                        "Prop_0": "['BAT.WISCDD1C.SC.DK002001.D2023191.CSV'",
                                                        "Prop_1": " 'BAT.WISCDD1C.SC.DK002001.TOT.D2023191.CSV'",
                                                        "Prop_2": " 'BAT.WISCDD4C.SC.DK002004.D2023191.CSV'",
                                                        "Prop_3": " 'BAT.WISCDE1C.SC.DK003001.D2023191.CSV'",
                                                        "Prop_4": " 'BAT.WISCDE3C.SC.DK003003.D2023191.CSV'",
                                                        "Prop_5": " 'BAT.WISCDE4C.SC.DK003004.D2023191.CSV'",
                                                        "Prop_6": " 'BAT.WISCDE5C.SC.DK003005.D2023191.CSV'",
                                                        "Prop_7": " 'BAT.WISCDP1C.SC.DK001001.D2023191.CSV'",
                                                        "Prop_8": " 'BAT.WISCDU1C.SC.DK004011.D2023191.CSV'",
                                                        "Prop_9": " 'BAT.WISCDU4C.SC.DK004014.D2023191.CSV'",
                                                        "Prop_10": " 'BAT.WISCDU7C.SC.DK004017.D2023191.CSV']"
                                     }
                   ],
                   "effectiveIntegrationRuntime": "integrationRuntime1 (West Europe)",
                   "billingReference": {
                                     "activityType": "PipelineActivity",
                                     "billableDuration": [
                                                        {
                                                                           "meterType": "ManagedVNetIR",
                                                                           "duration": 0.016666666666666666,
                                                                           "unit": "Hours"
                                                        }
                                     ]
                   },
                   "durationInQueue": {
                                     "integrationRuntimeQueue": 0
                   }
}

The ForEach loop should walk through the Prop_ elements and pass them one by one to my Custom Batch Python script.

enter image description here

In the Batch Custom activity I refer to the items as @{item()} as I guess I should. However, this is how the result looks like when the Python script receives the parameter!

    {Prop_0:[
'BAT.WISCDD1C.SC.DK002001.D2023191.CSV',
    Prop_1: 'BAT.WISCDD1C.SC.DK002001.TOT.D2023191.CSV',
    Prop_2: 'BAT.WISCDD4C.SC.DK002004.D2023191.CSV',
    Prop_3: 'BAT.WISCDE1C.SC.DK003001.D2023191.CSV',
    Prop_4: 'BAT.WISCDE3C.SC.DK003003.D2023191.CSV',
    Prop_5: 'BAT.WISCDE4C.SC.DK003004.D2023191.CSV',
    Prop_6: 'BAT.WISCDE5C.SC.DK003005.D2023191.CSV',
    Prop_7: 'BAT.WISCDP1C.SC.DK001001.D2023191.CSV',
    Prop_8: 'BAT.WISCDU1C.SC.DK004011.D2023191.CSV',
    Prop_9: 'BAT.WISCDU4C.SC.DK004014.D2023191.CSV',
    Prop_10: 'BAT.WISCDU7C.SC.DK004017.D2023191.CSV']}

I don't understand how to use the Prop_ in the value! I would like to get only one CSV filename per ForEach iteration!

What am I doing wrong? Thank you for your support!


Solution

  • As your requirement is to get only one CSV filename per ForEach iteration, you can follow the below approach.

    After lookup, get the number of Prop keys in the object and storing in a string variable by using below expression.

    @string(sub(length(split(string(activity('Lookup1').output.value),':')), 1))
    

    enter image description here

    Now, using this number, create an array of numbers like [0,1,2,..,10] and give that to ForEach activity.

    @range(0,int(variables('num_props')))
    

    enter image description here

    Inside ForEach, store the file name from the lookup activity output using the below expression.

    @activity('Lookup1').output.value[0][concat('Prop_',string(item()))]
    

    enter image description here

    This will give the file name in the current iteration, but it will have the special characters like this 'BAT.WISCDU7C.SC.DK004017.D2023191.CSV'].

    If you want to remove those, use the below expression which replaces them with empty string.

    @replace(replace(replace(replace(variables('filename1'), '[', ''),']',''),' ',''),'''','')
    

    enter image description here

    Now, give this variable filename to your Custom Batch Python script inside ForEach activity. This will pass the filename every iteration to your script.

    My Filename in an iteration:

    enter image description here

    My Pipeline JSON for your reference:

    {
        "name": "pipeline1",
        "properties": {
            "activities": [
                {
                    "name": "Lookup1",
                    "type": "Lookup",
                    "dependsOn": [],
                    "policy": {
                        "timeout": "0.12:00:00",
                        "retry": 0,
                        "retryIntervalInSeconds": 30,
                        "secureOutput": false,
                        "secureInput": false
                    },
                    "userProperties": [],
                    "typeProperties": {
                        "source": {
                            "type": "DelimitedTextSource",
                            "storeSettings": {
                                "type": "AzureBlobFSReadSettings",
                                "recursive": true,
                                "enablePartitionDiscovery": false
                            },
                            "formatSettings": {
                                "type": "DelimitedTextReadSettings"
                            }
                        },
                        "dataset": {
                            "referenceName": "sourcecsv_ADLS",
                            "type": "DatasetReference"
                        },
                        "firstRowOnly": false
                    }
                },
                {
                    "name": "Get the number of Props",
                    "type": "SetVariable",
                    "dependsOn": [
                        {
                            "activity": "Lookup1",
                            "dependencyConditions": [
                                "Succeeded"
                            ]
                        }
                    ],
                    "policy": {
                        "secureOutput": false,
                        "secureInput": false
                    },
                    "userProperties": [],
                    "typeProperties": {
                        "variableName": "num_props",
                        "value": {
                            "value": "@string(sub(length(split(string(activity('Lookup1').output.value),':')), 1))",
                            "type": "Expression"
                        }
                    }
                },
                {
                    "name": "ForEach1",
                    "type": "ForEach",
                    "dependsOn": [
                        {
                            "activity": "Get the number of Props",
                            "dependencyConditions": [
                                "Succeeded"
                            ]
                        }
                    ],
                    "userProperties": [],
                    "typeProperties": {
                        "items": {
                            "value": "@range(0,int(variables('num_props')))",
                            "type": "Expression"
                        },
                        "isSequential": true,
                        "activities": [
                            {
                                "name": "get current File name using Prop_n",
                                "type": "SetVariable",
                                "dependsOn": [],
                                "policy": {
                                    "secureOutput": false,
                                    "secureInput": false
                                },
                                "userProperties": [],
                                "typeProperties": {
                                    "variableName": "filename1",
                                    "value": {
                                        "value": "@activity('Lookup1').output.value[0][concat('Prop_',string(item()))]",
                                        "type": "Expression"
                                    }
                                }
                            },
                            {
                                "name": "Replace extra spaces and char",
                                "type": "SetVariable",
                                "dependsOn": [
                                    {
                                        "activity": "get current File name using Prop_n",
                                        "dependencyConditions": [
                                            "Succeeded"
                                        ]
                                    }
                                ],
                                "policy": {
                                    "secureOutput": false,
                                    "secureInput": false
                                },
                                "userProperties": [],
                                "typeProperties": {
                                    "variableName": "filename",
                                    "value": {
                                        "value": "@replace(replace(replace(replace(variables('filename1'), '[', ''),']',''),' ',''),'''','')",
                                        "type": "Expression"
                                    }
                                }
                            }
                        ]
                    }
                }
            ],
            "variables": {
                "num_props": {
                    "type": "String"
                },
                "filename1": {
                    "type": "String"
                },
                "filename": {
                    "type": "String"
                }
            },
            "annotations": []
        }
    }