azure-synapse azure-synapse-analytics azure-notebooks azure-synapse-pipeline

How to Copy Files into Individual Folders in Synapse Analytics


I originally had files in my silver folder. I was using notebooks to transform these files and then overwriting the original files with the transformed ones. After I spoke to a Synapse engineer, they recommended, as a best practice, that the transformations refer to folders instead of the files themselves. They suggested creating an individual folder for each file inside the silver folder.

Currently, when I run my bronze-to-silver loops, which copy data from my bronze layer (in Avro format) to my silver layer (in Parquet format), the data is written as files directly under silver rather than into folders. I want each file to be copied into its own folder, named after the file. Example:

Bronze Layer:

file1.avro

file2.avro

Desired Silver Layer Structure:

silver/
    file1/
        file1.parquet
    file2/
        file2.parquet

Current Silver Layer Structure:

silver/
    file1.parquet
    file2.parquet

My current bronze-to-silver pipeline works as follows: a Get Metadata activity obtains the file names from the bronze layer, and a ForEach loop then fetches every file from the bronze layer and writes it to the silver layer.

I've tried using notebooks to create the folders dynamically before copying the files, but I couldn't figure it out.

Is there any way I can adjust this pipeline, or is there another method I can use, so that each file from the bronze layer is copied into its own folder in the silver layer? Thanks!


Solution

  • To use the file name as the folder name in the sink, start from the same @item().name expression that you already use for the file name. Because the file name includes an extension (<filename>.parquet) and the folder name should not, use the following expression for the folder name instead:

    @split(item().name,'.')[0]

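    This expression splits the file name at each dot into an array of strings and takes the element at index 0, which is the part before the first dot. For file1.parquet that yields file1. Note that a name containing more than one dot (a hypothetical sales.2024.parquet, for instance) would yield only sales.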
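
To make the behaviour of the split expression concrete, here is a minimal Python sketch of the same logic. It is only an analogue for checking names outside the pipeline, not something the pipeline itself runs. The helper names folder_name and silver_path, and the default root "silver", are assumptions made for illustration.

```python
from pathlib import PurePosixPath


def folder_name(file_name: str) -> str:
    """Python analogue of @split(item().name,'.')[0]: the text before the first dot."""
    return file_name.split(".")[0]


def silver_path(file_name: str, root: str = "silver") -> str:
    """Hypothetical helper: build <root>/<name>/<name>.parquet for a bronze file name."""
    name = folder_name(file_name)
    return str(PurePosixPath(root) / name / f"{name}.parquet")


# File names taken from the example in the question.
for bronze_file in ["file1.avro", "file2.avro"]:
    print(silver_path(bronze_file))
# prints:
# silver/file1/file1.parquet
# silver/file2/file2.parquet
```

If your file names can contain several dots and you want to drop only the final extension, the first-dot split is not enough. In Python, file_name.rsplit(".", 1)[0] would do that, and the pipeline expression would need a comparable adjustment.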