azure-blob-storage file-storage azure-data-factory

Data Factory | Copy recursively from multiple subfolders into one folder with the same file names


Objective: Copy all files from multiple subfolders into one folder, keeping the original file names. E.g.

Source Root Folder
    20221110/
        AppID1/
            File1.csv
            File2.csv
        AppID2/
            File3.csv
            File4.csv
    20221114/
        AppID3/
            File5.csv
            File6.csv
    and so on

Destination Root Folder
    File1.csv
    File2.csv
    File3.csv
    File4.csv
    File5.csv
    File6.csv
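
The intended result can be stated precisely with a small sketch. The Python below only illustrates the desired mapping and is not a Data Factory feature; the function name `flatten_targets` and the sample paths are hypothetical. It maps every source path to its bare file name and raises an error if two files in different subfolders share a name, since a single flat folder cannot hold both.

```python
from pathlib import PurePosixPath


def flatten_targets(source_paths):
    """Map each nested source path to a flat destination name (its file name).

    Raises ValueError if two source paths would land on the same name.
    """
    seen = {}    # destination name -> source path that claimed it
    result = {}  # source path -> destination name, in input order
    for path in source_paths:
        name = PurePosixPath(path).name
        if name in seen:
            raise ValueError(
                f"name collision: {seen[name]!r} and {path!r} both map to {name!r}"
            )
        seen[name] = path
        result[path] = name
    return result


# Sample paths taken from the folder layout in the question.
sources = [
    "20221110/AppID1/File1.csv",
    "20221110/AppID2/File3.csv",
    "20221114/AppID3/File5.csv",
]
for src, dst in flatten_targets(sources).items():
    print(f"{src} -> {dst}")
```

If the same file name can occur under more than one AppID folder, a flat destination needs an extra naming rule, which is outside what the question asks for.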

Approach 1 (Azure Data Factory V2, all datasets defined as Binary)

  1. GET METADATA - CHILDITEMS
  2. FOR EACH - Childitem
  3. COPY ACTIVITY(RECURSIVE : TRUE, COPY BEHAVIOUR: FLATTEN)

This configuration flattens the folders but renames the files with autogenerated names. If I change the copy behaviour to Preserve Hierarchy, both the file names and the folder structure remain intact.

Approach 2

  1. GET METADATA - CHILDITEMS
  2. FOR EACH - Childitems
  3. Execute PL2 (Pipeline level parameter: @item().name)
  4. GET METADATA2 (Parameterised from dataset, invoked at pipeline level)
  5. FOR EACH2 - Childitems
  6. COPY (Source: Folder name - Pipeline level, File name - FOR EACH2)

Neither approach gives the desired output. Any help or workaround would be appreciated.


Solution

  • If all of your files are at the same directory level, you can try the approach below.

    First use a Get Metadata activity to get the list of all files, then use a Copy activity inside a ForEach to copy each file to the target folder.

    My source files sit in subfolders, all at the same directory level.

    Source dataset:

    Based on your directory level, use the wildcard placeholder (*/*) in the file path of the source dataset.

    The validation message shown on the dataset is only a warning and can be ignored while debugging.

    Get Metadata activity:

    This returns the list of all files inside the subfolders.

    Pass this array to a ForEach activity, and use a Copy activity inside the ForEach.

    Copy activity source:

    Here too, the */* wildcard must be the same as the one given for the Get Metadata activity.

    For the sink dataset, create a dataset parameter and use it in the file path of the dataset.

    Copy activity sink:

    Supply the value for the sink dataset parameter here.

    After the run, the files are copied to the target folder.

    If your source files are not all at the same directory level, you can try the recursive approach described in this article by @Richard Swinbank.
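
The same-level approach above (Get Metadata, then ForEach, then Copy with a parameterised sink) could be wired up roughly as in the following sketch, which builds the pipeline definition as a Python dictionary and prints it as JSON. This is a minimal sketch under assumptions, not the exact settings used above: the activity names (`ListFiles`, `ForEachFile`, `CopyOneFile`), the dataset names (`SourceBinary`, `SinkBinary`), and the sink parameter name (`fileName`) are hypothetical. The `*/*` folder wildcard, the use of `@item().name` as the file name, and the Blob Storage store settings are assumptions too.

```python
import json

# Hypothetical names; replace with the datasets defined in your own factory.
SOURCE_DATASET = "SourceBinary"  # assumed: Binary dataset on the source root
SINK_DATASET = "SinkBinary"      # assumed: Binary dataset with a 'fileName' parameter
WILDCARD_FOLDERS = "*/*"         # assumed depth: <date>/<AppID>


def dataset_ref(name, parameters=None):
    """Build a dataset reference, optionally passing dataset parameters."""
    ref = {"referenceName": name, "type": "DatasetReference"}
    if parameters:
        ref["parameters"] = parameters
    return ref


def expression(value):
    """Wrap a string as a dynamic-content expression."""
    return {"value": value, "type": "Expression"}


# Step 1: list the files via the childItems field.
get_metadata = {
    "name": "ListFiles",
    "type": "GetMetadata",
    "typeProperties": {
        "dataset": dataset_ref(SOURCE_DATASET),
        "fieldList": ["childItems"],
    },
}

# Step 3: copy one file per iteration, reusing its name in the sink.
copy_one_file = {
    "name": "CopyOneFile",
    "type": "Copy",
    "inputs": [dataset_ref(SOURCE_DATASET)],
    "outputs": [dataset_ref(SINK_DATASET, {"fileName": expression("@item().name")})],
    "typeProperties": {
        "source": {
            "type": "BinarySource",
            "storeSettings": {
                "type": "AzureBlobStorageReadSettings",
                "recursive": True,
                "wildcardFolderPath": WILDCARD_FOLDERS,
                "wildcardFileName": expression("@item().name"),
            },
        },
        "sink": {
            "type": "BinarySink",
            "storeSettings": {"type": "AzureBlobStorageWriteSettings"},
        },
    },
}

# Step 2: iterate over the array returned by Get Metadata.
for_each = {
    "name": "ForEachFile",
    "type": "ForEach",
    "dependsOn": [{"activity": "ListFiles", "dependencyConditions": ["Succeeded"]}],
    "typeProperties": {
        "items": expression("@activity('ListFiles').output.childItems"),
        "activities": [copy_one_file],
    },
}

pipeline = {"name": "FlattenCopy", "properties": {"activities": [get_metadata, for_each]}}
pipeline_json = json.dumps(pipeline, indent=2)
print(pipeline_json)
```

Because the sink file name comes from `@item().name`, each file keeps its original name in the target folder, which is what the Flatten copy behaviour alone does not provide.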