azure-data-factory

ADF dynamic row starting position


We have an excel file which we need to convert to CSV using ADF.

The excel file has data table starting from row 5.

enter image description here

However, there are instances where "Time" (on row 3) is missing and the data starts from row 4.

Is there anyway we can check row index of "ProductID" to determine starting position of the data table?

enter image description here


Solution

  • I tried to get cell A4 content and compare with word "ProductID" to check if it starts from row 4 otherwise row 5. Is there any better way to do this?

    If you know that the correct header is either A4 or A5, you can directly opt for this approach. As it will successfully deal with two cases of either A4 or A5.

    But if you don't have the control over the first row, you can try the below approach involving loop to find out the required row position.

    First create the excel dataset with one dataset parameter range of type string for the Range and another parameter header of Boolean type for First Row as header.

    enter image description here

    Now, in the pipeline create the below variables. In this approach, a counter variable will be incremented by 1 from value 1 in an Until loop. The counter variable will be checked with the Range of the Excel dataset in each dataset. After each iteration, it checks whether the first-row value is ProductID or not. If not, the counter will be checked in next iteration otherwise the current counter value is the correct header row, and it will be the end of the loop.

    enter image description here

    Take an Until activity with below expression as condition.

    @equals(variables('col'),'ProductID')
    

    enter image description here

    Inside Until activity, take a set variable activity and assign the counter variable to the temp value.

    enter image description here

    Now, take lookup activity with the above Excel dataset. Provide the counter variable for the range and false to the header.

    range - @concat('A',string(variables('counter')))
    header - false
    

    enter image description here

    Take a set variable and store the first-row value in first column in the col variable.

    @activity('Lookup1').output.value[0].Prop_0
    

    In another set variable, use the expression @add(variables('temp'),1) to the counter variable which will increment the counter variable.

    enter image description here

    After the end of the Until loop, the correct header row will be stored in the counter variable. You can use the same Excel dataset in your sub-sequent activities with the counter variable in the range parameter and now change the header parameter value to true.

    enter image description here