I am trying to get the last modified date of files in nested folders and only copy the files that have been modified in the last 10 days.
Folder structure:
container: home
product_data/
|---------20221020/
|---------|---------20221020/
|---------|---------|---------product_data_20221020_20221020.parquet
|---------20221021/
|---------|---------20221021/
|---------|---------|---------product_data_20221021_20221021.parquet
|---------20231102/
|---------|---------20231102/
|---------|---------|---------product_data_20231102_20231102.parquet
The 20231102 parquet file is the only one that should be copied, because it was last modified on Nov 7 (the last modified date does not match the date in the file name).
I've messed with a similar issue before: Get Last Modified Date on Partitioned Data Using Azure Data Factory
My current issue is that I can't filter the files at all.
Image 3: Parent Dataset (root folder)
Image 5: Get Metadata inside the for loop
Image 6: Dataset for Get Metadata inside the for loop
Image 7: Parameters for Dataset for Get Metadata inside the for loop
Image 8: Get Metadata output inside the for loop
Because the "Filter by last modified" setting on the Get Metadata activity inside the for loop doesn't seem to work, I also tried adding a Filter activity and tried setting a variable (to debug), but both fail.
Filter Config
items: @activity('Get Files Metadata').output.itemName
Condition: @greater(activity('Get Files Metadata').output.lastModified, addMinutes(utcNow(), -30))
Image 9: Filter Output (Ignore the itemscount)
Image 10: Filter Error
Image 12: Set Variable Error
To get the last modified date of files in nested folders, you need to use a Get Metadata activity and a ForEach loop with the appropriate parameters.
Set the Filter by last modified parameter, with Start time set to @getPastTime(10,'Day') and End time set to @utcNow().
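To illustrate what this time window selects, here is a minimal Python sketch (plain Python, not ADF code). It assumes the filter keeps files whose last modified time is at or after the start time and before the end time. The function name in_window, the fixed "now" value, and the year of the Nov 7 timestamp are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

def in_window(last_modified, now, days=10):
    """Return True if last_modified falls in [now - days, now)."""
    start = now - timedelta(days=days)  # counterpart of @getPastTime(10,'Day')
    end = now                           # counterpart of @utcNow()
    return start <= last_modified < end

# Hypothetical "current" time, fixed so the example is reproducible.
now = datetime(2023, 11, 10, tzinfo=timezone.utc)

# File modified on Nov 7 (3 days before "now"): inside the window.
print(in_window(datetime(2023, 11, 7, tzinfo=timezone.utc), now))   # True

# File modified more than a year earlier: outside the window.
print(in_window(datetime(2022, 10, 20, tzinfo=timezone.utc), now))  # False
```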
Dataset for it:
This will give you an array of the files in the folder that were modified in the last 10 days.
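The overall flow (loop over the nested date folders, then keep only the recently modified files) can be sketched outside ADF as follows. This is only an illustration of the logic in Python, not the pipeline itself: the TREE dictionary and the function recently_modified are hypothetical, and only the Nov 7 timestamp comes from the question (its year is assumed); the other two timestamps are made up.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical listing: date folder -> nested date folder -> {file name: last modified (UTC)}.
TREE = {
    "20221020": {"20221020": {
        "product_data_20221020_20221020.parquet": datetime(2022, 10, 20, tzinfo=timezone.utc)}},
    "20221021": {"20221021": {
        "product_data_20221021_20221021.parquet": datetime(2022, 10, 21, tzinfo=timezone.utc)}},
    "20231102": {"20231102": {
        "product_data_20231102_20231102.parquet": datetime(2023, 11, 7, tzinfo=timezone.utc)}},
}

def recently_modified(tree, now, days=10):
    """Collect paths of files modified within the last `days` days before `now`."""
    start = now - timedelta(days=days)
    selected = []
    for outer, inner_folders in tree.items():        # one iteration per top-level date folder
        for inner, files in inner_folders.items():   # nested date folder
            for name, modified in files.items():
                if start <= modified < now:          # same window as the filter setting
                    selected.append(f"product_data/{outer}/{inner}/{name}")
    return selected

# Hypothetical "current" time, fixed so the example is reproducible.
now = datetime(2023, 11, 10, tzinfo=timezone.utc)
print(recently_modified(TREE, now))
# ['product_data/20231102/20231102/product_data_20231102_20231102.parquet']
```

With this sample data, only the 20231102 file is selected, which matches the expected result described in the question.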