sql-serverssissql-server-2012sql-server-2012-datatools

How to add an extra column to handle error message in a SSIS data flow?


I'm working on a SSIS project in order to import calls rows (Excel file) in a SQL Server database.

Here is my data flow :

enter image description here

I added some lookups to check rows before import process. The first one checks if the row already exists (made to prevent duplicates because the user drag & drop import files in a specified folder). Then the others lookups check foreign keys constraints. Moreover, all no matching rows are redirected to another database. So I'm able to check invalid rows, then an audit package let me know if my NoMatchingRowsCall table changed during inport.

Now, I would like to add an "Error Message" to no matching rows to check "what is the problem about this row ?". I think to add a "derived column after each lookup (no matching output) to add the error message. What about that way ? How to add a text content in a "derived column" ? Should I use a package variable ?

Here is that I would like to get :

ID | C1 | C2 | C3 | ERROR_MESSAGE
1  | .. | .. | .. | Row already exists
2  | .. | .. | .. | FK error for column C1
3  | .. | .. | .. | FK error for column C2
...

I want a "soft" solution to track failing rows without stop package execution, and to be able to manually insert the failing row if needed by changing failing keys.


Solution

  • Adding a derived Error_Message column against each No Match output will give you what you are looking for. In your current design, you can just type in the Error Message against each derived column as there will be one derived column component per stream. There is no need to add variables unless you want to reuse the values else where or to have all messages in a centralised place

    A couple of warnings though...

    1. Lookups are inherently expensive as they perform row by row queries. If you are only dealing with a small number of rows / small table this may be fine, but if you are looking at millions of rows, you will soon hit a bottle neck. One way around this would be to temporarily put you data into the database and perform entire set checks (e.g. select all x that don't have a related y row using left join). This way, you are letting sql perform the leg work in a batch which will be quicker
    2. Your current design will only highlight the first issue. Once you've resolved the "no employee issue", the same row may have a date mismatch. Ideally you want to validate all rows for all issues (with the exception of duplicate rows) so that you have a complete set of issues to resolve. If you decide to switch over to using SQL to validate, maybe you could add a bit flag column per issue type or a single column wise a bitwise flag covering all issues