azureazure-data-factoryexpressionbuilder

Count the categories from particular one column in using ADF(azure data factory)


I have data in which it has one column named category. Categories are chair and table.

I have tried to do filtering, derived column but derived column give me additional column but i want it at row level.


Solution

  • As the remaining values doesn't exists in the column, Dataflow can't give the count for those.

    So, you can follow below approach to achieve your requirement. In this approach, you need an array containing values of all categories like ['chair','table','bowl','plat'].

    This is starting data that I am taking in the dataflow source:

    category
    chair
    table
    chair
    table
    table
    chair
    chair
    table
    

    First use an aggregate transformation to get the category and count columns.

    Give category column in the Group by section and create a new column count with count(category) expression in the aggregate section like below.

    enter image description here

    It will give the result with existing values.

    enter image description here

    Now, take another aggregate transformation after this. In this, leave group by section empty and in the aggregate section, create a column for an array of existing categories using collect(category) expression.

    enter image description here

    It will produce below results.

    enter image description here

    Next, take a derived column transformation and create two columns using the above original array column and the array of all categories.

    rem_items - toString(unfold(except(['chair','table','bowl','plat'],original)))
    
    count2 - 0
    

    enter image description here

    This will first generate an array of non-existing categories and unfolds that array as rows. Those will be converted to string type column like below. Coming to count2 columns, whatever the values that are coming from this transformation, the count for those will be 0 only.

    enter image description here

    Now, use a select transformation and remove the extra original column.

    enter image description here

    As a result, select transformation will only contains two columns.

    Now, create a New branch after the aggregate1 transformation and add a Union transformation to it. Use the above select1 transformation as another input and do the union by position.

    enter image description here

    This will give the desired result.

    enter image description here

    Add your sink to the union transformation and you can run the dataflow.