I'm trying to use a query similar to the following query to find differences between two tables (the same table in a DEV database vs a TEST database). Each table has ~30K rows and ~5 columns.
select field1,field2,field3,field4,field5 from dev.dbo.table1
where field1+field2 in ('string1','string2','string3',...,'string50')
except
select field1,field2,field3,field4,field5 from test.dbo.table1
where field1+field2 in ('string1','string2','string3',...,'string50')
field1 is char(5)
and field2 is char(1)
This query essentially never terminates.
When i analyze this query using SET SHOWPLAN_ALL ON
, I can see there is a nested loop pretty high in the tree. When I change the above query to
select * from dev.dbo.table1
except
select * from test.dbo.table2
The query runs quickly and there is no nested loop in the execution plan.
Can someone help explain this? I don't understand why there would be a drastic difference.
My best guess is that the optimizer is doing a poor job of estimating the cardinality (size) of the two tables. Because it underestimates the size, it is generating a poor query plan.
In SQL Server, you can use join
hints on except
. So, you can get the query you want with:
select field1,field2,field3,field4,field5 from dev.dbo.table1
where field1+field2 in ('string1','string2','string3',...,'string50')
except
select field1,field2,field3,field4,field5 from test.dbo.table1
where field1+field2 in ('string1','string2','string3',...,'string50')
option (hash join, merge join)
This eliminates the option of the nested loop join, choosing a more favorable method.