ssisazure-data-factoryscript-component

SSIS component to remove linefeeds from all (n)varchar fields in dataflow


I use a SSIS component to retrieve data from an API. The API delivers XML, This XML contains data like this:

<tag>
code
</tag>

Linefeeds before and after 'code'

That means that what is entered into my (n)varchar columns in SQL looks like this: CHAR(10)codeCHAR(10)

That messes up a lot, among other things the way things look in the reports. So the CHAR(10) needs to disappear. It needs to be filtered out between the source component and the destination component in my dataflow. I could use derived columns for this but it concerns 9 dataflows with (in total) 385 (n)varchar fields. A lot of work! Is it possible to use a script component that simply does a replace(field,linefeed,'') on each (n)varchar? So that all data passes through this component and exits stripped of linefeeds?

If so, how do I do this? I am pretty ok with SQL, can read most languages but need some help on writing this in a C# of VB.NET. I am designing this in VS2019 and deploying to ADFv2 IS. Targets are SQL Azure databases.

I cannot simply run some SQL after the SSIS job has run since the next run will do a upsert on the data. It will then conclude that CHAR(10)codeCHAR(10) is different from code and insert a new line, violating the unique constraint on code. So that's why I need to do it after getting the data and before writing the data.


Solution

  • I could not find a solution that would not require extensive coding, at the cost of performance. One solution I tried was writing the XML as a CSV, do a search and replace on the CSV en then import that into a table. Though it worked, it made the solution unnessecary complex and it was detrimental in terms of performance. It was fighting symptoms, not curing the issue. So I went and talked to the guys delivering the API and they removed these linefeeds. That turned out to be the only good solution.