azureazure-data-factorypipelineazure-synapse

comparing row count from file to CSV control file


Inside a Azure synapse pipeline, I need to find and compare the row count of the CSV file with summary file stored in Azure Blob storage.

If the comparison fails then pipeline will stop, if the counts match then can proceed ahead in the pipeline.

pipeline, which components


Solution

  • If your rows are less than 5000, you can use lookup activities for this requirement which will give the required row count of the files.

    Create datasets for your files and give those to two lookup activities like below. Make sure you uncheck the First row only option in both activities.

    enter image description here

    Now, take an if activity and use the below expression to compare both row counts.

    @equals(activity('count of control').output.count,activity('count of csv').output.count)
    

    enter image description here

    If the condition satisfies, you can add the further activities in the True activities of if activity. Otherwise, you can stop the pipeline by not giving any activity in the False activities of if activity.

    If the row count in your control file is like below sample.

    filename,count
    one.csv,5
    

    Then, compare the count property from the control file lookup activity output array with the CSV file row count.

    And, if the CSV file count is more than 5000, you can use ADF dataflow for this.

    Get the row count of CSV file using below transformations in the Dataflow.

    After dataflow activity, compare the row count of the CSV file with your control file count using if activity.