I am using an ADF pipeline to copy Parquet files for multiple datasets, present in multiple folders, from a source blob container (the client's container) to our blob container.
However, I don't want to copy them directly as they are. Instead, I want them organized as below.
As shown above, I want the following folders, along with the files present in them, to be copied to the shown destination folders. I tried a regular copy activity, but it doesn't allow me to add multiple wildcards, and it also doesn't allow me to specify multiple destinations in the Sink section.
Could you please let me know how I can achieve the above using an ADF pipeline?
Lookup output array for reference.
As you have the source and destination folder paths in a file, you can use a copy activity with dataset parameters inside a For-Each activity.
First, store the above file in a temporary location. I used the same folder paths as yours.
SourceBlob_Container,DestinationBlob_Container
blobsource/demo/raad/indicated/inbound/IMSONE_CHANEL_M_AB_202402_20240415,blobdestination/Demo_Tables/Load_04_15_2024
blobsource/demo/raad/indicated/inbound/IMSONE_CONTROL_M_AB_202402_20240415,blobdestination/Demo_Tables/Load_04_15_2025
blobsource/demo/raad/indicated/inbound/IMSONE_COMPATIENT_M_665_202402_20240415,blobdestination/M_665/Load_04_15_2024
blobsource/demo/raad/indicated/inbound/IMSONE_DIAG_M_665_202402_20240415,blobdestination/M_665/Load_04_15_2025
blobsource/demo/raad/indicated/inbound/IMSONE_COMPATIENT_M_667_202402_20240415,blobdestination/M_667/Load_04_15_2024
blobsource/demo/raad/indicated/inbound/IMSONE_DIAG_M_667_202402_20240415,blobdestination/M_667/Load_04_15_2025
Create a delimited text dataset pointing to this file and give it to a Lookup activity with the below configurations.
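For reference, this is roughly what the delimited text dataset and Lookup activity look like in JSON (a minimal sketch, not an exact export). The names FolderPathsCsv, Lookup1, AzureBlobStorageLS, and the temp/folder_paths.csv location are placeholders for your own. Make sure First row only is unchecked (firstRowOnly: false) so the Lookup returns every row of the file.

{
    "name": "FolderPathsCsv",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": "AzureBlobStorageLS",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "temp",
                "fileName": "folder_paths.csv"
            },
            "columnDelimiter": ",",
            "firstRowAsHeader": true
        }
    }
}

{
    "name": "Lookup1",
    "type": "Lookup",
    "typeProperties": {
        "source": { "type": "DelimitedTextSource" },
        "dataset": {
            "referenceName": "FolderPathsCsv",
            "type": "DatasetReference"
        },
        "firstRowOnly": false
    }
}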
The Lookup activity will give the source and sink folder paths as a JSON array. Take a For-Each activity and give this array, @activity('Lookup1').output.value, as the For-Each items expression.
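A sketch of the For-Each activity JSON with that expression in its items property, assuming the Lookup activity is named Lookup1 as above:

{
    "name": "ForEach1",
    "type": "ForEach",
    "dependsOn": [
        {
            "activity": "Lookup1",
            "dependencyConditions": [ "Succeeded" ]
        }
    ],
    "typeProperties": {
        "items": {
            "value": "@activity('Lookup1').output.value",
            "type": "Expression"
        },
        "activities": [ ]
    }
}

The copy activity described next goes inside the activities array.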
Inside the For-Each, take the copy activity. For the source of the copy activity, take a Parquet dataset and create a dataset parameter container_name. Use it as @dataset().container_name in the dataset's container name and leave the remaining path empty.
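The source dataset JSON would then look roughly like this (ClientBlobLS is a placeholder for the linked service to the client's storage account):

{
    "name": "SourceParquet",
    "properties": {
        "type": "Parquet",
        "linkedServiceName": {
            "referenceName": "ClientBlobLS",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "container_name": { "type": "string" }
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": {
                    "value": "@dataset().container_name",
                    "type": "Expression"
                }
            }
        }
    }
}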
For the sink dataset of the copy activity, create another Parquet dataset. Here, create two dataset parameters, container_name and folder_path. Use them in the dataset the same way as in the source dataset and leave the file name empty.
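Likewise for the sink dataset, with both parameters used in the location (DestinationBlobLS is a placeholder for the linked service to your storage account):

{
    "name": "SinkParquet",
    "properties": {
        "type": "Parquet",
        "linkedServiceName": {
            "referenceName": "DestinationBlobLS",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "container_name": { "type": "string" },
            "folder_path": { "type": "string" }
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": {
                    "value": "@dataset().container_name",
                    "type": "Expression"
                },
                "folderPath": {
                    "value": "@dataset().folder_path",
                    "type": "Expression"
                }
            }
        }
    }
}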
Give these two datasets as the source and sink of the copy activity. Now, in the source, select Wildcard file path and give the below expressions (a JSON sketch of the resulting source configuration follows the list).
container_name : @first(split(item().SourceBlob_Container,'/'))
Wildcard folder path : @join(skip(split(item().SourceBlob_Container,'/'),1),'/')
Wildcard file name : *.parquet
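In the copy activity JSON, the source side with these settings would be roughly the following fragment (the dataset name matches the sketch above; the inputs array sits at the activity level and source inside typeProperties):

"inputs": [
    {
        "referenceName": "SourceParquet",
        "type": "DatasetReference",
        "parameters": {
            "container_name": "@first(split(item().SourceBlob_Container,'/'))"
        }
    }
],
"source": {
    "type": "ParquetSource",
    "storeSettings": {
        "type": "AzureBlobStorageReadSettings",
        "recursive": true,
        "wildcardFolderPath": {
            "value": "@join(skip(split(item().SourceBlob_Container,'/'),1),'/')",
            "type": "Expression"
        },
        "wildcardFileName": "*.parquet"
    }
}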
Similarly, give the below expressions for the sink dataset parameters.
container_name : @first(split(item().DestinationBlob_Container,'/'))
folder_path : @join(skip(split(item().DestinationBlob_Container,'/'),1),'/')
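And the corresponding sink-side fragment:

"outputs": [
    {
        "referenceName": "SinkParquet",
        "type": "DatasetReference",
        "parameters": {
            "container_name": "@first(split(item().DestinationBlob_Container,'/'))",
            "folder_path": "@join(skip(split(item().DestinationBlob_Container,'/'),1),'/')"
        }
    }
],
"sink": {
    "type": "ParquetSink",
    "storeSettings": { "type": "AzureBlobStorageWriteSettings" },
    "formatSettings": { "type": "ParquetWriteSettings" }
}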
Now debug the pipeline; in each iteration, all the Parquet files in the source folder path will be copied to the respective target folder.
UPDATE:
To copy the source folder itself, along with its files, to the target location, give the below expressions in the copy activity sink.
container_name : @first(split(item().DestinationBlob_Container,'/'))
folder_path : @concat(join(skip(split(item().DestinationBlob_Container,'/'),1),'/'),'/',last(split(item().SourceBlob_Container,'/')))
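For example, with the first row of the file above, these evaluate to:

item().SourceBlob_Container      = blobsource/demo/raad/indicated/inbound/IMSONE_CHANEL_M_AB_202402_20240415
item().DestinationBlob_Container = blobdestination/Demo_Tables/Load_04_15_2024

container_name = blobdestination
folder_path    = Demo_Tables/Load_04_15_2024/IMSONE_CHANEL_M_AB_202402_20240415

That is, the last segment of the source path (the source folder name) is appended under the destination folder path.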
Keep the copy activity source the same as above. This will copy all the files in each folder, along with the folder itself, to the target location.