sql-serverssisssis-2016

SSIS Lookup Transform use Table or Query


I have a Lookup Transformation on a table with 30 columns but I only am using two columns: ID column for the join and Update column as Output.

On the connection should I enter a query Select ID, Update From T1 or Use Table in the drop down?

Using table in Drop down would this be like doing Select * From T1 or is SSIS clever enough to know I only need 2 columns.

I'm thinking I should go with the Query Select ID, Update From T1.


Solution

  • On the connection should I enter a query Select ID, Update From T1 or Use Table in the drop down?

    It is best to specify which columns you want.

    Using table in Drop down, would this be like doing Select * From T1

    Yes, it is a SELECT *.

    or is SSIS clever enough to know I only need 2 columns?

    Nope.

    Keep in mind that Lookups are good for pulling data from Dimension Tables where the row count and record set is small. If you are dealing with large amounts of unique data, then it will be better to perform a MERGE JOIN, instead. The performance difference can be substantial. For example, when using a Lookup on 20K rows of data, you could experience run times in the tens of minutes. A MERGE JOIN, however, would run within seconds.

    Lookups have the drawback of behaving like correlated sub-queries in that they fire off a query to the server for every row passing through it. You can have the Lookup cache the data, which means SSIS will store the results in memory and then check the memory before going to the server for all subsequent rows passing through the Lookup. As a result, this is only effective if there are a large number of matching records for a small cache set. In other words, Lookups are not optimal when there is large amount of Distinct ID's to lookup. To that point, caching data is almost pointless.

    This is where you would switch over to using a MERGE JOIN. Note: you will need to perform a SORT on both of the data flows before the MERGE JOIN because the MERGE JOIN component requires the incoming rows to be sorted.

    When handled incorrectly, a single poorly placed Lookup can bring an entire package to its knees - lookups can be huge performance bottlenecks. Though, handled correctly, a Lookup can simplify the design of the dataflow and speed development by removing the extra development required to MERGE JOIN data flows.

    The bottom line to all of this is that you want the Lookup performing the fewest number of queries against the server.