azure, azure-data-factory, wildcard-mapping

ADF Copy Activity problem with wildcard path


I have a seemingly simple task: integrate multiple JSON files that reside in a Data Lake Gen2 account. The problem is that the files that need to be integrated are spread across multiple folders. This is a typical structure that I am dealing with:

Folder1\Folder2\Folder3\Folder4\Folder5\2022\Month\Day\Hour\Minute\ <--- 1 file in the Minute folder

The same structure then repeats for 2023, so to collect all the files I have to go down to the bottom of the structure, which is the Minute folder. If I use a wildcard path, it looks like this:

Wildcard paths: 'source from dataset'/*.json

It copies everything, including all the folders, and I just want the files. I tried to narrow it down so that it copies only 2022 first, but whatever I do with the wildcard paths does not work. Help is much appreciated.

Trying different wildcard combinations did not help; obviously I am doing something wrong.


Solution

  • There is no built-in option to copy files from multiple subfolders into a single destination folder with their original names: Flatten hierarchy as a copy behavior gives the files autogenerated names in the target.

    Image reference: MS document on copy behaviour

    Instead, you can follow the approach below.

    Set the Base URL in the HTTP connector to:

    https://<storage_account_name>.blob.core.windows.net/<container>?restype=directory&comp=list

    Replace <storage_account_name> and <container> in the URL above with your own values. A Lookup activity (Lookup1 in the expressions below) reads the listing this URL returns.


    Settings of filter activity:

    items: @activity('Lookup1').output.value[0].EnumerationResults.Blobs.Blob

    condition: @endswith(item().URL,'.json')

    [Image: output of the Filter activity]

    This expression takes the file name from the relative URL value. [Image: the expression]
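As a rough illustration of the filter-and-extract logic described above, here is a minimal Python sketch. It is not ADF pipeline code and is not the expression from the answer. It assumes a listing shaped like the path used in the Filter activity (EnumerationResults.Blobs.Blob, with a URL property on each blob); the sample XML, the storage account name, and the helper names (list_blob_urls, keep_json, file_name) are all hypothetical.

```python
import posixpath
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical listing. The element names mirror the path used in the Filter
# activity settings (EnumerationResults.Blobs.Blob, each with a URL property);
# a real response from the storage account may be shaped differently.
SAMPLE_LISTING = """<EnumerationResults>
  <Blobs>
    <Blob><URL>https://example.blob.core.windows.net/data/Folder1/2022/01/02/03/04/a.json</URL></Blob>
    <Blob><URL>https://example.blob.core.windows.net/data/Folder1/2022/01/02/03/04</URL></Blob>
    <Blob><URL>https://example.blob.core.windows.net/data/Folder1/2023/05/06/07/08/b.json</URL></Blob>
  </Blobs>
</EnumerationResults>"""


def list_blob_urls(listing_xml):
    """Return the URL of every Blob entry in the listing (folders included)."""
    root = ET.fromstring(listing_xml)
    return [el.text.strip() for el in root.findall("./Blobs/Blob/URL") if el.text]


def keep_json(urls):
    """Mimic the Filter condition: keep only entries whose URL ends with .json."""
    return [u for u in urls if u.endswith(".json")]


def file_name(url):
    """Take the last path segment of the URL, i.e. the file name."""
    return posixpath.basename(urlparse(url).path)


json_urls = keep_json(list_blob_urls(SAMPLE_LISTING))
file_names = [file_name(u) for u in json_urls]
print(file_names)
```

The folder entry is dropped by the suffix check, so only the two JSON files remain, and each is reduced to its bare file name regardless of how deep the year/month/day/hour/minute path is.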