sqlsql-serverindexingsql-except

Why there is a sort operator in execution plan when EXCEPTed queries are clustered tables?


I'm building a Data Warehouse and I found a problem in two tables data comparison statement. I use EXCEPT operator to compare the tables which have clustered indexes (normal int field as key). My problem is that in query execution plan there are sort operators after both clustered indexes scan. Here's is a code example:

create table temp.table_a
(
    key_a int identity,
    some_field_a int,
    some_field2_a varchar(10)
);

insert into temp.table_a
(
    some_field_a,
    some_field2_a
)
select
    n,
    'abcd'
from meta.GENERATE_SEQUENCE(1,1000);

create clustered index cix_table_a_key_a on temp.table_a (key_a);


create table temp.table_b
(
    key_b int identity,
    some_field_b int,
    some_field2_b varchar(10)
);

insert into temp.table_b
(
    some_field_b,
    some_field2_b
)
select 
    n,
    'abcd'
from meta.GENERATE_SEQUENCE(1,1000);

create clustered index cix_table_b_key_b on temp.table_b (key_b);

(GENERATE_SEQUENCE is a row generator)

Now the EXCEPT query:

select 
    key_a,
    some_field_a,
    some_field2_a
from temp.table_a

except

select 
    key_b,
    some_field_b,
    some_field2_b
from temp.table_b

Here's an image of the execution plan:

[Execution plan]

I'm aware that Merge Join needs sorted input, but isn't it already sorted enough? By this I mean that the only sorted columns we need is key_a/key_b. And this is already done because of clustered indexes. Sorts of other columns are not needed because inside every value of key_a/key_b there is only one row - nothing to sort.

So, my question are:

  1. Why there are sort operators after clustered index scans in this situation?
  2. How can I avoid these sorts when I want to use EXCEPT operator?
  3. What are the better ways (if there is any) of doing table comparison?

Thanks in advance for your answers :)


Solution

  • Thank you all guys for your help. Below is full answer for my question gathered from comments and answers:

    1. Why there are sort operators after clustered index scans in this situation?

    Sort operators are there because index key columns (key_a, key_b) are not unique for optimizer.

    1. How can I avoid these sorts when I want to use EXCEPT operator?

    Be sure that your index key columns are unique - use UNIQUE CLUSTERED INDEX or set constraint on these fields.

    1. What are the better ways (if there is any) of doing table comparison?

    Alternate solutions considering adding more columns to index key or using NOT EXISTS instead of EXCEPT were given in answer of Steve Ford (@steve-ford).