I am using Azure Data Factory's data flow for my development.
I have a source table with 'D', 'I', and 'U' rows, I want to split the 'D' rows from 'U' and 'I' and load them separately into two different folders.
I used an Azure data flow conditional split to split these and pointed the streams to different streams and sinks respectively, Delete Sink for 'D' rows and InsertUpdate Sink for 'I' and 'U' rows.
My pipeline is dynamic, I haven't imported any schema, and I don't want to, and all the columns drift through the dataflow from source to sink. The conditional split is also done using the byName() function inside the expression builder.
My problem is, for example, if there are no 'D' rows, no columns will drift to the Delete sink, this throws an error stating 'a sink cannot have 0 columns.
Kindly help me solve this issue, if possible a step-by-step explanation text/visual to resolve my problem.
My problem is, for example, if there are no 'D' rows, no columns will drift to the Delete sink, this throws an error stating 'a sink cannot have 0 columns.
The Conditional split transformation won't work in that way. It will only split the rows based on the condition and if there are no rows that satisfying the given condition, it will produce empty rows with same source schema. The same columns from the source will be passed to the target but it won't have any rows.
Follow the below step by step procedure to achieve your requirement.
Here this is my sample source data:
one,two
I,2
U,3
I,5
I,6
U,7
U,8
I have used conditional split on the source with below conditions. Here, the schema was not projected in the source as same as yours.
InsertUpdate - or(equals(toString(byName('one')),'I'),equals(toString(byName('one')),'U'))
Delete - equals(toString(byName('one')),'D')
I have added respective sinks to the 2 streams.
It will generate files like below when executed from the pipeline.
InsertUpdate.csv file:
When the source doesn't have any D
rows, it will create the delete.csv
file with source headers (If firstRow as header checked in sink dataset) and no rows like below.
If first row as header is disabled in the sink dataset, it will generate an empty delete.csv
file.
Make sure you have checked the Schema drift
in the source and as well as in both sinks.