I have a seemingly simple task to integrate multiple json files that are residing in a data lake gen2 The problem is files that need to be integrated are located in multiple folders, for example this is a typical structure that I am dealing with: Folder1\Folder2\Folder3\Folder4\Folder5\2022\Month\Day\Hour\Minute\ <---1 file in Minute Folder Than same structure for 20223 year, so in order for me to collect all the files I have to go to bottom of the structure which is Minute folder, if I use wildcard path it looks like this: Wildcard paths 'source from dataset"/ *.json, it copies everything including all folders, and I just want files, I tried to narrow it down and copies only first for 2022 but whatever I do is not working in terms of wildcard paths, help is much appreciated
trying different wildcard combinations did not help, obviously I am doing something wrong
There is no option to copy files from multiple sub- folders to single destination folder. Flatten hierarchy as a copy behavior also will have autogenerated file names in target.
image reference MS document on copy behaviour
Instead, you can follow the below approach.
Give the Base URL in HTTP connector as,
https://<storage_account_name>.blob.core.windows.net/<container>?restype=directory&comp=list
.
[Replace <storage account name>
and <container>
with the appropriate name in the above URL]
.json
from the lookup activity output.Settings of filter activity:
items:
@activity('Lookup1').output.value[0].EnumerationResults.Blobs.Blob
condition:
@endswith(item().URL,'.json')
Output of filter activity
Take the for-each activity next to filter activity and give the item of for-each as @activity('Filter1').output.value
Inside for-each activity, take the copy activity.
Take http connector and json dataset as source, give the base url as
https://<account-name>.blob.core.windows.net/<container-name>/
Create the parameter for relative URL and value for that parameter as @item().name
@split(item().name,'/')[sub(length(split(item().name,'/')),1)]
This expression will take the filename from relative URL value.