azureazure-data-factoryexpressionbuilder

Can we select rows with duplicate entries in Alter Row transformation in Azure Data Factory?


There are two columns:

col_A col_B
111 2.0
222 1.0
222 2.0
333 1.0

Using alter row transformation i would like to select rows that have repeated entries in col_A, in this example 222 and select the corresponding highest value in col_B i.e. 2.0

The output should look as follows:

col_A col_B
111 2.0
222 2.0
333 1.0

Solution

  • You can use Aggregate transformation in Data flow.

    This is my source data:

    enter image description here

    In the Aggregate transformation, give col_A for group by to select the distict rows and col_B for aggregate. Use the max(col_B) in this to get the max values.

    Group By:

    enter image description here

    Aggregate:

    enter image description here

    Result:

    enter image description here

    Update about Extra Columns:

    To Carry the Remaining columns after aggregation transformation, use Join transformation with Join type as inner join(Join of above with source) and give the above two columns as Keys.

    enter image description here

    After Join, you can remove the extra columns with select transformation.

    enter image description here

    Result with Extra rows:

    enter image description here