
Azure Data Factory: how to merge all files of a folder into one file


I need to create one big file by merging multiple files scattered across several subfolders in Azure Blob Storage. A transformation is also needed: each file contains a JSON array with a single element, so the final file should contain one JSON array holding all of those elements.

The final purpose is to process that big file in a Hadoop MapReduce job.

The layout of the original files is similar to this:

folder
  - month-01
    - day-01
      - files...
  - month-02
    - day-02
      - files...
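
To make the intended transformation concrete, here is a minimal local sketch in Python of the same merge, run against an ordinary directory tree rather than Blob Storage. It is only an illustration of the expected input and output shapes, not an Azure Data Factory feature; the function names, the `data.json` file name, and the `id` field are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def merge_json_arrays(root: Path, output: Path) -> int:
    """Merge every *.json file under root (recursively) into one JSON array."""
    merged = []
    # sorted() keeps the output order deterministic across runs
    for path in sorted(root.rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        # Each source file is expected to hold a JSON array (here, of one element)
        if not isinstance(data, list):
            raise ValueError(f"{path} does not contain a JSON array")
        merged.extend(data)
    with output.open("w", encoding="utf-8") as handle:
        json.dump(merged, handle)
    return len(merged)


def demo() -> list:
    """Build a tiny month/day tree in a temp directory and merge it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "folder"
        samples = [("month-01", "day-01", 1), ("month-02", "day-02", 2)]
        for month, day, value in samples:
            day_dir = root / month / day
            day_dir.mkdir(parents=True)
            # Hypothetical payload: a JSON array with a single object
            (day_dir / "data.json").write_text(
                json.dumps([{"id": value}]), encoding="utf-8"
            )
        # The output lives outside root so it is not picked up by the scan
        output = Path(tmp) / "merged.json"
        merge_json_arrays(root, output)
        return json.loads(output.read_text(encoding="utf-8"))
```

Calling `demo()` builds two single-element files in separate subfolders and returns the merged array read back from the output file.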

Solution

  • I ran a test based on your description; please follow these steps.

    My simulated data:

    test1.json resides in the folder date/day1.

    test2.json resides in the folder date/day2.

    Source dataset: set the file format setting to Array of Objects and the file path to the root path.

    Sink dataset: set the file format setting to Array of Objects and the file path to the file where you want to store the final data.

    Create a Copy activity and set its Copy behavior to Merge Files.
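
For readers who prefer the pipeline definition to the UI, the settings above could look roughly like the following, written as a Python dictionary that serializes to the pipeline JSON. This is a sketch under assumptions: the activity, dataset, and pipeline names are hypothetical, and the property names (`recursive`, `copyBehavior`, `filePattern`, and the `Json*`/`AzureBlobStorage*` type names) reflect my understanding of the current Copy activity schema, so check them against the connector documentation for your Data Factory version.

```python
import json

# Hypothetical names: "MergeJsonFiles", "SourceJsonDataset", "SinkJsonDataset"
copy_activity = {
    "name": "MergeJsonFiles",
    "type": "Copy",
    "inputs": [{"referenceName": "SourceJsonDataset", "type": "DatasetReference"}],
    "outputs": [{"referenceName": "SinkJsonDataset", "type": "DatasetReference"}],
    "typeProperties": {
        "source": {
            "type": "JsonSource",
            "storeSettings": {
                "type": "AzureBlobStorageReadSettings",
                # Walk all subfolders under the root path of the source dataset
                "recursive": True,
                "wildcardFileName": "*.json",
            },
        },
        "sink": {
            "type": "JsonSink",
            "storeSettings": {
                "type": "AzureBlobStorageWriteSettings",
                # Corresponds to "Merge Files" in the Copy behavior dropdown
                "copyBehavior": "MergeFiles",
            },
            "formatSettings": {
                "type": "JsonWriteSettings",
                # Corresponds to "Array of Objects" in the UI
                "filePattern": "arrayOfObjects",
            },
        },
    },
}


def to_pipeline_json(activity: dict, pipeline_name: str = "MergeJsonPipeline") -> str:
    """Wrap a single activity in a pipeline definition and serialize it."""
    pipeline = {"name": pipeline_name, "properties": {"activities": [activity]}}
    return json.dumps(pipeline, indent=2)
```

`print(to_pipeline_json(copy_activity))` emits JSON that can be compared against the code view of the pipeline in the Data Factory authoring UI.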

    Execution result: the two test files are merged into the single sink file.

    The destination in my test is still Azure Blob Storage; you can refer to this link to learn how Hadoop supports Azure Blob Storage.
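
As a small illustration of that last point, Hadoop's Azure connector (the `hadoop-azure` module) addresses blobs with `wasb://` or `wasbs://` URIs. The helper below only builds such a URI for the merged file; the account, container, and path values are hypothetical, and it assumes the cluster is already configured with the connector and the storage account credentials.

```python
def wasbs_uri(account: str, container: str, blob_path: str) -> str:
    """Build a wasbs:// URI for a blob, as used for Hadoop job input paths."""
    # Drop a leading slash so the path is not doubled after the host part
    cleaned = blob_path.lstrip("/")
    return f"wasbs://{container}@{account}.blob.core.windows.net/{cleaned}"


# Hypothetical storage account, container, and merged file path
merged_input = wasbs_uri("mystorageaccount", "output", "/merged/result.json")
```

The resulting string can then be passed as the input path of the MapReduce job.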