When I did archive job in ssis, I want to compare PK between source and destination, but there are some tables with different PK names, and some of them have composite primary keys. So I tried to use foreach function to get list of columns, but I got stuck on conditional split, how could I set Expression?
Example below: i don't know what should I put into expression of Conditional split. I only want 2 conditions, one is 'Add new' (Check if any of DST_Columns is null) and the other is 'Already Exists' (Check if all DST_Columns == SRC_Column)
<MergeJoin Name="MRGJ - <#=tbl.Name#> RowId" JoinType = 'FullOuterJoin'>
<LeftInputPath OutputPathName="OLESRC - Archive_PKList_<#=tbl.Name#>.Output">
<Columns>
<#
var indexSortKey = 1;
foreach(var column in tbl.Columns.Where(c => c.IsUsedInPrimaryKey)) {
#>
<Column SourceColumn="<#=column.Name#>" SortKeyPosition="<#=indexSortKey.ToString()#>" TargetColumn="DST_<#=column.Name#>" />
<#
indexSortKey++;
} #>
</Columns>
</LeftInputPath>
<RightInputPath OutputPathName="OLESRC - Source <#=tbl.Name#>.Output">
<Columns>
<#
indexSortKey = 1;
foreach(var column in tbl.Columns.Where(c => c.IsUsedInPrimaryKey)) {
#>
<Column SourceColumn="<#=column.Name#>" SortKeyPosition="<#=indexSortKey.ToString()#>" TargetColumn="SRC_<#=column.Name#>" />
<#
indexSortKey++;
} #>
</Columns>
</RightInputPath>
<JoinKeys>
<#foreach(var column in tbl.Columns.Where(c => c.IsUsedInPrimaryKey)) { #>
<JoinKey LeftColumn="DST_<#=column.Name#>" RightColumn="SRC_<#=column.Name#>"></JoinKey>
<# } #>
</JoinKeys>
</MergeJoin>
<ConditionalSplit Name="Conditional_Split">
<OutputPaths>
<OutputPath Name="New_Records" IsErrorOutput="false">
<Expression>
<#foreach(var column in tbl.Columns.Where(c => c.IsUsedInPrimaryKey)) { #>
ISNULL (DST_<#=column.Name#>) .... I don''t know
</Expression>
</OutputPath>
<OutputPath Name="Already_Exists" IsErrorOutput="false">
<Expression> <#foreach(var column in tbl.Columns.Where(c => c.IsUsedInPrimaryKey)) { #>
SRC_<#=column.Name#> == DST_<#=column.Name#> --- not sure either
<# } #>
</Expression>
</OutputPath>
</OutputPaths>
</ConditionalSplit>
I searched online but Biml document is limited. And I'm not very familiar with c# either, so if you know some good source please also let me know, thank you!
You don't have a ConditionalSplit problem per se, you have a logic problem. You have a merge join, in full outer join
mode and the result of this is that you'll have columns preface with SRC and DST. Based on whether the column is IsUsedInPrimaryKey
then you either use a join key or not.
Coming out of this MergeJoin, either the SRC has a match in the DST or it doesn't (as shown by SRC_ColumnName(s) being NULL). You can probably simplify this by checking just the first column that is IsUsedInPrimaryKey
unless you have a nullable key in which this goes into a consulting project ;)
If there's a match, only then do we want to compare each SRC column to the mate in the DST column. And realistically, we can exclude the IsUsedInPrimaryKey as we know they will match because that's how we got here.
As I often encourage people when working with SSIS, don't do too much in a place where it's hard to debug. In this case, I would add 2 Derived Columns between the MergeJoin and your ConditionalSplit because it's going to be much, much easier to debug the intended expressions.
We're going to add a column into the data flow called IsNew as this will identify whether the SRC row has a match in the DST.
The goal here is to build a string that is simply
ISNULL(DST_PrimaryKeyCol1)
<#
// This is all approximate code, actual syntax and values may vary
string first_keyColumn = tbl.Columns.Where(c => c.IsUsedInPrimaryKey)).FirstOrDefault();
string der_IsNew = "ISNULL(DST_{0})".format(first_keyColumn);
#>
<DerivedColumns Name="DER IsNew">
<Columns>
<Column DataType="Boolean" Name="IsNew"><#=der_IsNew#></Column>
</Columns>
</DerivedColumns>
This is going to be the harder one, simply because you may run into a length limitation of the SSIS Expression language of 8000 characters (citation needed).
The goal of this step is to build out a string that uses the following pattern
(SRC_Column0 != DST_Column0) || (SRC_ColumnN != DST_ColumnN)
Where the double pipes ||
are logical ORs because we want this new column to be True when any of the columns do not match.
<#
string testTemplate = "(SRC_{0} != DST{0})";
// This is all approximate code, actual syntax and values may vary
List<string> non_keyColumns = tbl.Columns.Where(c => !c.IsUsedInPrimaryKey)).Select(c => testTemplate.Format(c.Name)).ToList<string>();
// At this point, we have a list of just the column names in test format
// Use the .Join operator to concat/aggregate the rows into one big one
string der_IsChanged = "||".join(non_keyColumns);
#>
<DerivedColumns Name="DER IsChanged">
<Columns>
<Column DataType="Boolean" Name="IsChanged"><#=der_IsChanged#></Column>
</Columns>
</DerivedColumns>
So now, your conditional split is simplified. You'll have two custom outputs
<ConditionalSplit Name="Conditional_Split">
<OutputPaths>
<OutputPath Name="New_Records" IsErrorOutput="false">
<Expression>IsNew</Expression>
</OutputPath>
<OutputPath Name="Already_Exists" IsErrorOutput="false">
<Expression>IsChanged</Expression>
</OutputPath>
</OutputPaths>
</ConditionalSplit>
Documentation: