I have files that are uploaded into an on-prem folder daily. From there, one pipeline pulls them into a blob storage container (input), and another pipeline moves them from blob (input) to blob (output); this is where the data flow sits, between those two blobs. Finally, the output container is linked to SQL. However, I want the blob-to-blob pipeline to pull only the file that was uploaded that day and run it through the data flow. The way I have it set up, every time the pipeline runs it doubles my files. I've attached an image below.
[![Blob to Blob Pipeline][1]][1]
Please let me know if there is anything else that would make this more clear.

  [1]: https://i.sstatic.net/24Uky.png
> I want the blob to blob pipeline to pull only the file that was uploaded that day and run through the dataflow.
To achieve the above scenario, you can use the **Filter by last modified** option on the Get Metadata activity and pass the following dynamic content:

- `@startOfDay(utcnow())` as the start time: it returns the start of the current day.
- `@utcnow()` as the end time: it returns the current timestamp.

Input and output of the Get Metadata activity (it is filtering files for that day only):
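If it helps, the relevant part of the Get Metadata activity JSON looks roughly like the sketch below. The activity and dataset names (`Get Metadata1`, `InputBlobFolder`) and the blob read settings type are assumptions based on a typical setup; where exactly the modified-date filter lands depends on your dataset/connector type.

```json
{
    "name": "Get Metadata1",
    "type": "GetMetadata",
    "typeProperties": {
        "dataset": {
            "referenceName": "InputBlobFolder",
            "type": "DatasetReference"
        },
        "fieldList": [ "childItems" ],
        "storeSettings": {
            "type": "AzureBlobStorageReadSettings",
            "modifiedDatetimeStart": "@startOfDay(utcnow())",
            "modifiedDatetimeEnd": "@utcnow()"
        }
    }
}
```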
If there are multiple files for a particular day, then you have to use a ForEach activity and pass the output of the Get Metadata activity to it as:

`@activity('Get Metadata1').output.childItems`
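In JSON, the ForEach items setting ends up looking something like this (again, the activity names are placeholders):

```json
{
    "name": "ForEach1",
    "type": "ForEach",
    "dependsOn": [
        {
            "activity": "Get Metadata1",
            "dependencyConditions": [ "Succeeded" ]
        }
    ],
    "typeProperties": {
        "items": {
            "value": "@activity('Get Metadata1').output.childItems",
            "type": "Expression"
        },
        "activities": [ ]
    }
}
```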
Then add a Data Flow activity inside the ForEach and create the source dataset with a `filename` parameter.
Use that `filename` parameter as a dynamic value in the dataset's file name.
Then, in the Data Flow activity, pass `@item().name` to the source's `filename` parameter.
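As a rough sketch of those two steps (the dataset, linked service, data flow, and source stream names below are placeholders, and the exact shape of the parameter mapping can vary with your ADF version), the parameterized source dataset looks like:

```json
{
    "name": "SourceBlobFile",
    "properties": {
        "linkedServiceName": {
            "referenceName": "AzureBlobStorage1",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "filename": { "type": "string" }
        },
        "type": "DelimitedText",
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "input",
                "fileName": {
                    "value": "@dataset().filename",
                    "type": "Expression"
                }
            }
        }
    }
}
```

and inside the ForEach, the Data Flow activity passes the current file name into that parameter:

```json
{
    "name": "Data flow1",
    "type": "ExecuteDataFlow",
    "typeProperties": {
        "dataFlow": {
            "referenceName": "BlobToBlobDataflow",
            "type": "DataFlowReference",
            "datasetParameters": {
                "source1": {
                    "filename": "@item().name"
                }
            }
        }
    }
}
```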
This will run the data flow once for each file the Get Metadata activity returns, so only that day's files are processed.