I'm working on a pipeline in which I get the names of the files I need to process from an SFTP server using the Get Metadata activity, then filter the file names to keep just the files I need. After that, I need to load each of those files into an Azure SQL table. I'm trying to use the ForEach activity, and when I used the Copy activity to process the files it worked fine, but I also need two additional columns inserted into the Azure SQL table: the processing date/time and the file name.
First of all, when I use the Data Flow, it's not detecting any of the columns in the file (https://i.sstatic.net/6bR0T.png), and when I try to map the columns in the sink activity, I only get the derived columns I created. I also tried using the Copy activity, but I don't really know how to add derived columns there.
Does anyone know if this sink/transformation can be accomplished using the Copy activity or the Data Flow activity?
"First of all, when I use the Data Flow, it's not detecting any of the columns in the file"
Based on the structure of your pipeline, inside the ForEach activity you are passing each file name to the Data Flow using dataset parameters. Because the transformation is done dynamically for all the files, one per iteration, the source schema was set to None when the dataset was created.
In general, the schema for a particular file is imported from the connection/store when the dataset is created, and the data flow source picks up that schema as well.
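For reference, a minimal sketch of what such a parameterized source dataset might look like is below, assuming a DelimitedText dataset over an SFTP linked service (the dataset, linked service, folder, and parameter names are placeholders, not your actual resources). The empty schema array is the reason the data flow source shows no columns at design time:

```json
{
    "name": "ds_SftpFile",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": "ls_Sftp",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "fileName": { "type": "string" }
        },
        "typeProperties": {
            "location": {
                "type": "SftpLocation",
                "folderPath": "incoming",
                "fileName": {
                    "value": "@dataset().fileName",
                    "type": "Expression"
                }
            },
            "columnDelimiter": ",",
            "firstRowAsHeader": true
        },
        "schema": []
    }
}
```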
"when I try to map the columns in the sink activity, I only get the derived columns I created"
You only see this in data flow debug. When you execute the data flow from the pipeline, it produces the desired result dynamically, according to the source file's schema in each iteration.
In the demonstration below, I took one file with the import schema set to None in the dataset, and the source rows in the data flow debug are 0 for now.
You can see that I am adding one new column, and the sink mapping shows only that one column.
But when I execute the data flow from the pipeline, it takes the schema dynamically from the file, and you can see that the result includes all the existing columns from the source file.
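Since you need the file name and the processing date/time as extra columns, below is a rough sketch of how the data flow itself could look. Everything here is an assumption for illustration (the data flow, dataset, transformation, and parameter names are placeholders). The key points are the derived column that adds the two audit columns (the file name from a data flow parameter, the timestamp from currentTimestamp()) and allowSchemaDrift: true on the source and sink, so the columns that are not detected at design time still flow through to the Azure SQL sink at run time:

```json
{
    "name": "df_ProcessSftpFile",
    "properties": {
        "type": "MappingDataFlow",
        "typeProperties": {
            "sources": [
                {
                    "dataset": { "referenceName": "ds_SftpFile", "type": "DatasetReference" },
                    "name": "SftpSource"
                }
            ],
            "sinks": [
                {
                    "dataset": { "referenceName": "ds_AzureSqlTable", "type": "DatasetReference" },
                    "name": "SqlSink"
                }
            ],
            "transformations": [
                { "name": "AddAuditColumns" }
            ],
            "scriptLines": [
                "parameters{",
                "     p_fileName as string",
                "}",
                "source(allowSchemaDrift: true,",
                "     validateSchema: false) ~> SftpSource",
                "SftpSource derive(FileName = $p_fileName,",
                "     ProcessedOn = currentTimestamp()) ~> AddAuditColumns",
                "AddAuditColumns sink(allowSchemaDrift: true,",
                "     validateSchema: false) ~> SqlSink"
            ]
        }
    }
}
```

If the sink uses auto mapping, the drifted columns are matched to the SQL table by name, so the table would need columns for FileName and ProcessedOn (or whatever you call them).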
So, the behavior above won't affect your output. If you want to cross-check the original data while building the transformation, point the dataset at one of your source files and import its schema. After importing the schema, put your dataset parameter back into the file name so that each iteration still processes its own file.
The data flow will then pick up that schema, and you will be able to see the data preview while building your transformations.
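For the pipeline wiring, one possible sketch (again with placeholder names, and assuming your Filter activity is called FilterFiles) is a ForEach over the filtered file list that passes @item().name both to the dataset parameter of the source and to the data flow parameter used by the derived column:

```json
{
    "name": "ForEachFile",
    "type": "ForEach",
    "typeProperties": {
        "items": {
            "value": "@activity('FilterFiles').output.Value",
            "type": "Expression"
        },
        "activities": [
            {
                "name": "ProcessFile",
                "type": "ExecuteDataFlow",
                "typeProperties": {
                    "dataflow": {
                        "referenceName": "df_ProcessSftpFile",
                        "type": "DataFlowReference",
                        "parameters": {
                            "p_fileName": {
                                "value": "'@{item().name}'",
                                "type": "Expression"
                            }
                        },
                        "datasetParameters": {
                            "SftpSource": {
                                "fileName": {
                                    "value": "@item().name",
                                    "type": "Expression"
                                }
                            }
                        }
                    }
                }
            }
        ]
    }
}
```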