azure-data-factory azure-blob-storage last-modified

Get last modified of nested file


I am trying to get the last modified date of files in nested folders and only copy the files that have been modified in the last 10 days.

Folder structure:

container: home

product_data/

|---------20221020/

|---------|---------20221020/

|---------|---------|---------product_data_20221020_20221020.parquet

|---------20221021/

|---------|---------20221021/

|---------|---------|---------product_data_20221021_20221021.parquet

|---------20231102/

|---------|---------20231102/

|---------|---------|---------product_data_20231102_20231102.parquet

The 20231102 parquet file is the only one that should be copied, because it was last modified on Nov 7 (the last modified date does not match the date in the file name).
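To make the selection rule concrete, here is a minimal Python sketch of the check I want, outside of Data Factory. The `modified_within` helper, the value of `now`, and the 2022 timestamps are assumptions for illustration; only the Nov 7 modification date comes from my data (year assumed to be 2023).

```python
from datetime import datetime, timedelta, timezone


def modified_within(last_modified, now, days=10):
    """Return True if last_modified falls within the last `days` days before `now`."""
    return now - timedelta(days=days) <= last_modified <= now


# Hypothetical "current" time and timestamps, chosen only for illustration.
now = datetime(2023, 11, 10, tzinfo=timezone.utc)
files = {
    "product_data_20221020_20221020.parquet": datetime(2022, 10, 20, tzinfo=timezone.utc),
    "product_data_20221021_20221021.parquet": datetime(2022, 10, 21, tzinfo=timezone.utc),
    # The date in the name is Nov 2, but the blob was last modified on Nov 7.
    "product_data_20231102_20231102.parquet": datetime(2023, 11, 7, tzinfo=timezone.utc),
}

# Keep files by their last modified timestamp, not by the date in the name.
recent = [name for name, ts in files.items() if modified_within(ts, now)]
print(recent)
```

The point is that the filter has to use the blob's last modified timestamp; parsing the date out of the folder or file name would give the wrong answer for files rewritten later.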

I've messed with a similar issue before: Get Last Modified Date on Partitioned Data Using Azure Data Factory

My current issue is that I can't filter the files at all.

Image 1: Pipeline Overview

Image 2: Get Metadata Config

Image 3: Parent Dataset (root folder)

Image 4: For Loop

Image 5: Get Metadata inside the for loop

Image 6: Dataset for Get Metadata inside the for loop

Image 7: Parameters for Dataset for Get Metadata inside the for loop

Image 8: Get Metadata output inside the for loop

Because the "Filter by last modified" setting on the Get Metadata activity inside the for loop doesn't seem to work, I also tried adding a Filter activity and setting a variable (to debug), but both fail.

Filter Config

items: @activity('Get Files Metadata').output.itemName

Condition: @greater(activity('Get Files Metadata').output.lastModified, addMinutes(utcNow(), -30))

Image 9: Filter Output (Ignore the itemscount)

Image 10: Filter Error

Image 11: Set Variable Config

Image 12: Set Variable Error


Solution

  • To get the last modified date of files in nested folders, use a Get Metadata activity together with a ForEach loop, configured with the appropriate parameters.

    This gives you an array of the files in the folder that were modified in the last 10 days.
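    The idea behind that approach can be sketched in plain Python as a simulation, not as Data Factory code. Everything here is hypothetical: the in-memory `tree`, the `list_children` and `get_last_modified` helpers (stand-ins for Get Metadata calls that return child items and a last modified value), the cutoff date, and the timestamps other than Nov 7. Recursion stands in for however the pipeline iterates each folder level.

```python
from datetime import datetime, timezone


def _node(tree, path):
    """Walk the nested dict down to the entry at `path`."""
    node = tree
    for part in path:
        node = node[part]
    return node


def list_children(tree, path):
    """Stand-in for a metadata call that lists a folder's child items."""
    return [
        {"name": name, "type": "Folder" if isinstance(value, dict) else "File"}
        for name, value in _node(tree, path).items()
    ]


def get_last_modified(tree, path):
    """Stand-in for a metadata call that returns a file's last modified time."""
    return _node(tree, path)


def recent_files(tree, path, cutoff):
    """Collect paths of files under `path` modified at or after `cutoff`."""
    found = []
    for item in list_children(tree, path):
        child = path + [item["name"]]
        if item["type"] == "Folder":
            found.extend(recent_files(tree, child, cutoff))  # descend one level
        elif get_last_modified(tree, child) >= cutoff:
            found.append("/".join(child))
    return found


def utc(y, m, d):
    return datetime(y, m, d, tzinfo=timezone.utc)


# Hypothetical mirror of the container layout; leaf values are last modified times.
tree = {"product_data": {
    "20221020": {"20221020": {"product_data_20221020_20221020.parquet": utc(2022, 10, 20)}},
    "20221021": {"20221021": {"product_data_20221021_20221021.parquet": utc(2022, 10, 21)}},
    "20231102": {"20231102": {"product_data_20231102_20231102.parquet": utc(2023, 11, 7)}},
}}

cutoff = utc(2023, 10, 31)  # assumed: 10 days before a hypothetical run on Nov 10
print(recent_files(tree, ["product_data"], cutoff))
```

    Each folder level is listed first, and the last modified check is applied only once a file is reached, which is why a single filter on the root folder's metadata is not enough.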