azure, etl, azure-data-factory

How do I pull the last modified file with a data flow in Azure Data Factory?


I have files that are uploaded to an on-prem folder daily. From there, one pipeline pulls them into a blob storage container (input), and a second pipeline moves them from blob (input) to blob (output); this is where the data flow sits, between those two blobs. Finally, the output is linked to SQL. However, I want the blob-to-blob pipeline to pull only the file that was uploaded that day and run it through the data flow. The way I have it set up, every time the pipeline runs, it duplicates my files. I've attached images below.

[![Blob to Blob Pipeline][1]][1]

Please let me know if there is anything else that would make this clearer.

  [1]: https://i.sstatic.net/24Uky.png


Solution

  • I want the blob-to-blob pipeline to pull only the file that was uploaded that day and run it through the data flow.

    To achieve the above scenario, you can use Filter by last modified in the Get Metadata activity's settings, passing dynamic content as below.

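    A minimal sketch of that dynamic content, assuming "uploaded that day" means modified since midnight UTC of the current day:

        Start time (UTC): @startOfDay(utcNow())
        End time (UTC):   @utcNow()

    Set the Field list to Child items so the activity returns the files that match the filter.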

    Input and output of the Get Metadata activity (it filters to files from that day only):

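    The relevant part of the activity output would look roughly like this (the file name is purely illustrative):

        {
            "childItems": [
                {
                    "name": "data_2022-10-17.csv",
                    "type": "File"
                }
            ]
        }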

    If there are multiple files for a particular day, then you have to use a ForEach activity and pass the output of the Get Metadata activity to the ForEach activity as

    @activity('Get Metadata1').output.childItems
    

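    In the pipeline JSON, the ForEach items setting would look roughly like this (the activity names Get Metadata1 and ForEach1 are assumed to match the screenshots; the Data Flow activity described next goes inside activities):

        "name": "ForEach1",
        "type": "ForEach",
        "typeProperties": {
            "items": {
                "value": "@activity('Get Metadata1').output.childItems",
                "type": "Expression"
            },
            "activities": [ ]
        }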

    Then add a Data Flow activity inside the ForEach and create the source dataset with a filename parameter.


    In the dataset, use that filename parameter as a dynamic value for the file name, i.e. @dataset().filename.
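    A sketch of such a parameterized source dataset; the dataset name, linked service, container, and delimited-text format here are all assumptions for illustration:

        {
            "name": "SourceBlobDataset",
            "properties": {
                "type": "DelimitedText",
                "linkedServiceName": {
                    "referenceName": "AzureBlobStorage1",
                    "type": "LinkedServiceReference"
                },
                "parameters": {
                    "filename": {
                        "type": "string"
                    }
                },
                "typeProperties": {
                    "location": {
                        "type": "AzureBlobStorageLocation",
                        "container": "input",
                        "fileName": {
                            "value": "@dataset().filename",
                            "type": "Expression"
                        }
                    }
                }
            }
        }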

    And then, in the Data Flow activity inside the ForEach, pass the source parameter filename as @item().name.
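    In JSON, the Data Flow activity would pass that parameter roughly as below; the data flow name dataflow1 and source name source1 are assumptions:

        "name": "Data flow1",
        "type": "ExecuteDataFlow",
        "typeProperties": {
            "dataflow": {
                "referenceName": "dataflow1",
                "type": "DataFlowReference",
                "datasetParameters": {
                    "source1": {
                        "filename": "@item().name"
                    }
                }
            }
        }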

    This will run the data flow once for each file that the Get Metadata activity returns.