[SOLVED] SQL EXCEPT performance

SQL EXCEPT performance

I'm trying to use a query similar to the following query to find differences between two tables (the same table in a DEV database vs a TEST database). Each table has ~30K rows and ~5 columns.

select field1,field2,field3,field4,field5 from dev.dbo.table1 
where field1+field2 in ('string1','string2','string3',...,'string50')
except
select field1,field2,field3,field4,field5 from test.dbo.table1 
where field1+field2 in ('string1','string2','string3',...,'string50')

field1 is char(5) and field2 is char(1)

This query essentially never terminates.

When i analyze this query using SET SHOWPLAN_ALL ON, I can see there is a nested loop pretty high in the tree. When I change the above query to

select * from dev.dbo.table1 
except
select * from test.dbo.table2

The query runs quickly and there is no nested loop in the execution plan.

Can someone help explain this? I don't understand why there would be a drastic difference.

Solution

My best guess is that the optimizer is doing a poor job of estimating the cardinality (size) of the two tables. Because it underestimates the size, it is generating a poor query plan.

In SQL Server, you can use join hints on except. So, you can get the query you want with:

select field1,field2,field3,field4,field5 from dev.dbo.table1 
where field1+field2 in ('string1','string2','string3',...,'string50')
except
select field1,field2,field3,field4,field5 from test.dbo.table1 
where field1+field2 in ('string1','string2','string3',...,'string50')
option (hash join, merge join)

This eliminates the option of the nested loop join, choosing a more favorable method.