I am trying to copy CSV files from blob storage to Snowflake. I want the first row treated as the header and the second line of the CSV skipped while using the Copy activity in Azure Data Factory.
CSV file rows:
ID,NAME,AGE,INCOME
'','','',0
123,'test',23,1045
Expected: first row as header, skip the second row, insert the remaining rows.
Error code:
DelimitedTextColumnNameNotAllowNull,Exception,Message=The name of column index 1 is empty
If you know the column names before the pipeline run, you can try the approach below using the Copy activity.
As a sample, I took the input CSV data below.
ID,NAME,AGE,INCOME
'','','',0
123,test,23,1045
124,test2,24,1046
First, uncheck the First row as header option in the source CSV dataset.
Now, use this dataset as the Copy activity source and set the Skip line count option to 2.
Set your target dataset as the Copy activity sink and import the schemas in the mapping. There, enter your required column names in order.
Now run the Copy activity, and the 2nd line will be skipped in the target. As a sample, I used a CSV as the target; in your case, it should be Snowflake.
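Conceptually, what the Skip line count + manual mapping approach does can be sketched in Python (a minimal illustration using the sample data above; the hard-coded column list stands in for the names you type into the mapping, since ADF no longer reads a header row):

```python
import csv
import io

# Sample CSV from the question, including the quoted-empty second row
raw = """ID,NAME,AGE,INCOME
'','','',0
123,test,23,1045
124,test2,24,1046
"""

# With "First row as header" unchecked and Skip line count = 2,
# every line is treated as data and the first two lines are dropped.
all_rows = list(csv.reader(io.StringIO(raw)))
data_rows = all_rows[2:]  # header line and quoted-empty line skipped

# Column names supplied manually in the Copy activity mapping
columns = ["ID", "NAME", "AGE", "INCOME"]
records = [dict(zip(columns, row)) for row in data_rows]
print(records)
```

Only the two real data rows survive, each mapped to the column names you provided.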
If you want to do this dynamically without entering column names manually, you can use a Data flow. Use your input dataset as the source in the Data flow. Make sure First row as header is checked on the dataset in this case.
Now, add a Surrogate key transformation, adding a column key that starts at 1 and increments by 1.
Then, add a Filter transformation with the condition key != 1 to skip the first data row.
Add your Snowflake dataset as the sink and execute the Data flow from the pipeline. It will give the desired results in the target.
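The surrogate key + filter logic above can be sketched in Python (a rough equivalent, assuming the same sample data; in the actual Data flow the numbering is done by the Surrogate key transformation and the filter uses the expression key != 1):

```python
import csv
import io

# Sample CSV; "First row as header" is checked, so DictReader consumes it
raw = """ID,NAME,AGE,INCOME
'','','',0
123,test,23,1045
124,test2,24,1046
"""

reader = csv.DictReader(io.StringIO(raw))

# Surrogate key: number each data row starting at 1, incrementing by 1
keyed = [(key, row) for key, row in enumerate(reader, start=1)]

# Filter: keep only rows where key != 1 (drops the quoted-empty row)
kept = [row for key, row in keyed if key != 1]
print(kept)
```

Because the header is consumed before the key is assigned, key 1 always lands on the quoted-empty row, which is exactly the row the filter removes.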