I originally had files in my silver folder. I was using notebooks to transform these files and then overwriting the originals with the transformed versions. A Synapse engineer I spoke to recommended that, as best practice, transformations should reference folders rather than individual files, and suggested creating a separate folder for each file inside the silver folder.
Currently, my bronze-to-silver loop, which copies data from the bronze layer (Avro format) to the silver layer (Parquet format), writes the data as individual files rather than into folders. I want each file to be copied into its own folder, named after the file. For example:
Bronze layer:
file1.avro
file2.avro

Desired silver layer structure:
silver/
  file1/
    file1.parquet
  file2/
    file2.parquet

Current silver layer structure:
silver/
  file1.parquet
  file2.parquet
This is my current bronze-to-silver loop. The Get Metadata activity obtains the file names from the bronze layer, and the ForEach loop iterates over them, copying each file from the bronze layer into the silver layer.
I've tried using notebooks to create the folders dynamically before copying the files, but I couldn't get it to work.
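For reference, the folder-per-file layout can be illustrated with a minimal local-filesystem sketch. It assumes ordinary local paths rather than ADLS/Synapse storage, and the function name `move_into_own_folders` is hypothetical; in a Synapse notebook the storage calls would differ.

```python
import shutil
import tempfile
from pathlib import Path


def move_into_own_folders(root: Path) -> list[Path]:
    """Move every file directly under `root` into root/<stem>/<file name>.

    Hypothetical helper; local-filesystem sketch only, not ADLS or Synapse.
    """
    moved = []
    # Build the sorted file list first so newly created subfolders are not visited.
    for f in sorted(p for p in root.iterdir() if p.is_file()):
        target_dir = root / f.stem        # e.g. silver/file1
        target_dir.mkdir(exist_ok=True)   # create the per-file folder if missing
        target = target_dir / f.name      # e.g. silver/file1/file1.parquet
        shutil.move(str(f), str(target))
        moved.append(target)
    return moved


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        silver = Path(tmp) / "silver"
        silver.mkdir()
        for name in ("file1.parquet", "file2.parquet"):
            (silver / name).write_bytes(b"")
        for p in move_into_own_folders(silver):
            print(p.relative_to(silver).as_posix())
```

`Path.stem` drops only the last extension, so `file1.parquet` ends up in a folder named `file1`.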
Is there any way to adjust this pipeline, or another method I can use, so that each file from the bronze layer is copied into its own folder in the silver layer? Thanks!
To use the file name itself as the folder name in the sink, give the folder name the same expression, @item().name, that you used for the file name. However, your file name includes an extension (<filename>.parquet) and the folder name should not, so try the following expression for the folder name:
@split(item().name,'.')[0]
This expression splits the file name into an array of strings, using the dot as the delimiter, and takes the element at index 0, which is the file name without its extension.
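To check what the folder name will be for a given file, here is a small Python sketch that mimics `@split(item().name,'.')[0]`. It assumes the pipeline's `split` behaves like Python's `str.split` for a single-character delimiter. Both function names are hypothetical. The second function shows, in Python only, how to keep names that contain extra dots. The pipeline expression above keeps only the text before the first dot.

```python
def split_first(name: str) -> str:
    """Mimic @split(name, '.')[0]: everything before the first dot."""
    return name.split(".")[0]


def strip_last_extension(name: str) -> str:
    """Python-only alternative: drop just the final extension."""
    return name.rsplit(".", 1)[0]


if __name__ == "__main__":
    for n in ["file1.avro", "file2.parquet", "sales.2024.avro"]:
        print(n, "->", split_first(n), "|", strip_last_extension(n))
```

For `sales.2024.avro`, `split_first` returns `sales` while `strip_last_extension` returns `sales.2024`. If your file names can contain extra dots, the folder-name expression needs to account for that.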