azure-synapseazure-synapse-pipeline

Azure Synapse Pipeline Get Metadata Filter by last modified to use date in filename instead of last modified date of file


Within the pipeline's Get Metadata1 activity, I would like to filter the files to be copied by the date inside the filename instead of the date of the file:

example: 20230301_FC_ServiceUsage_Common_Electricity.txt.gz Date Modified on File: 3-02-3023

I want to use the 20230301 date within the filename instead of the date modified on the file itself to filter the files selected for copy.

Get Metadata1 passes Childitems to a ForEach loop.

Get Metadata: Filter by last modified

The date range is entered in a parameter: Get Metadata: BeginRange and EndRange Parameters

Is it possible to do this within the "Filter by last modified" Get Metadata1 or do I need to do this by some other mechanism?

I've tried a few things that have not worked.


Solution

  • You can achieve your requirement using Filter activity as well.

    Use the filter activity to filter the file names array from Get Meta data activity.

    Here, these are my file names array from Get Meta data activity.

    enter image description here

    I took the below start and end date.

    enter image description here

    Then I have converted the start and end dates to integer and saved to integer variables using the below expressions.

    start date integer variable:

    @int(concat(split(pipeline().parameters.start_date,'-')[2],split(pipeline().parameters.start_date,'-')[1],split(pipeline().parameters.start_date,'-')[0]))
    

    End date integer variable:

    @int(concat(split(pipeline().parameters.end_date,'-')[2],split(pipeline().parameters.end_date,'-')[1],split(pipeline().parameters.end_date,'-')[0]))
    

    Now, take a filter activity and give Get meta data child items array in the items of filter activity.

    In the condition give the below expression.

    @and(greaterOrEquals(variables('end_date_int'), int(split(item().name,'_')[0])),greaterOrEquals(int(split(item().name,'_')[0]), variables('start_date_int')))
    

    enter image description here

    Filter activity output:

    enter image description here

    NOTE: For the comparison in the filter activity expression, the dates start date, end date and date from file name should be in yyyyMMdd format and should be converted to integer.

    So, if your file name is not in yyyyMMdd format, use substring() and concat after splitting to get in yyyyMMdd format.