azure-data-factory

Azure Data Factory (ADF) Is it possible to compare the schema of a .csv file against the schema of a database before writing to it


I have a requirement to get the schema of a database and compare it against a provided .csv (coming from azure blob storage) which headers should match exactly the database schema before start writing the data of the csv to the db.

I was planining to use 2 get metadata activities to get the abovementioned schemas and then compare it againt each other.

I wanted to use the forEach activity but Im kind stuck with it.

The schema of the database will change "often" (also quite extensive) so a hardcoded solution would not be ideal

Any recommendation on how I could use a pipeline or a dataflow for this?


Solution

  • To achieve your requirement, after both Get meta data activities, you can use Filter activity and contains() functions inside it.

    First, add column count Field list in both Get meta data activities.

    enter image description here

    enter image description here

    Then add Filter activity and use below dynamic expressions in it.

    Items: - @activity('Get table structure').output.structure
    
    Condition - @contains(activity('Get csv structure').output.structure,json(concat('{"name":"',item().name,'","type":"String"}')))
    

    enter image description here

    Filter activity is used to filter out items from a given a list based on given condition.

    Both structure arrays are not identical JSON arrays to compare. That's why give the Table structure JSON array as items and use contains() in the condition. contains() function will give true/false if a given item is in the given list or not. Give the CSV structure array to this function and build a JSON as per this JSON item structure using the current name item().name from the Table structure array.

    By this way, it will check whether all the column names from table structure exists or not in the CSV structure and it satisfies, it will give the same count as below.

    enter image description here

    Later, use If activity. In this expression, check whether columns count from both Get meta data activities are same or not and the counts from Filter activity are same or not.

    @and(equals(activity('Get csv structure').output.columnCount,activity('Get table structure').output.columnCount),equals(activity('Filter1').output.ItemsCount,activity('Filter1').output.FilteredItemsCount))
    

    enter image description here

    Inside True activities of If, you can go ahead with your activities.

    As nested if activities are not supported in ADF, you need to do this column headers check before the if activity and check it in the if activity expression. You can check multiple expressions in the same expression using nested and() dynamic expression.

    @and(and(<expression>,<expression>)>,and(<expression>,<expression>))