I have a list of about 200,000 records with EntityID column which I load into a temp table variable.
I want to insert any records from the Temp table variable if EntityID from the Temp table does not exist in the dbo.EntityRows table. The dbo.EntityRows table contains about 800,000 records.
The process is very slow compared to when the dbo.EntityRows table had about 500,000 records.
My first guess is because of the NOT EXISTS clause, each row from the Temp variable must scan the entire 800k rows of the dbo.EntityRows table to determine if it exists or not.
QUESTION: Are there alternative ways to run this comparison check without using the NOT EXISTS, which incurs a hefty cost and will only get worse as dbo.EntityRows continues to grow?
EDIT: Appreciate the comments. Here is the query (I left out the part after the IF NOT EXISTS check. After that, if NOT EXISTS, I insert into 4 tables).
declare @EntityCount int, @Counter int, @ExistsCounter int, @AddedCounter int
declare @LogID int
declare @YdataInsertedEntityID int, @YdataSearchParametersID int
declare @CurrentEntityID int
declare @CurrentName nvarchar(80)
declare @CurrentSearchParametersID int, @CurrentSearchParametersIDAlreadyDone int
declare @Entities table
(
Id int identity,
EntityID int,
NameID nvarchar(80),
SearchParametersID int
)
insert into @Entities
select EntityID, NameID, SearchParametersID from YdataArvixe.dbo.Entity order by entityid;
set @EntityCount = (select count(*) from @Entities);
set @Counter = 1;
set @LogID = null;
set @ExistsCounter = 0;
set @AddedCounter = 0;
set @CurrentSearchParametersIDAlreadyDone = -1;
While (@EntityCount >= @Counter)
begin
set @CurrentEntityID = (select EntityID from @Entities
where id = @Counter)
set @CurrentName = (select nameid from @Entities
where id = @Counter);
set @CurrentSearchParametersID = (select SearchParametersID from @Entities
where id = @Counter)
if not exists (select 1 from ydata.dbo.entity
where NameID = @CurrentName)
begin
-- I insert into 4 tables IF NOT EXISTS = true
end
Well, the answer was pretty basic. @Felix and @TT had the right suggestion. Thanks!
I put a non-clustered index on the NameID field in ydata.dbo.entity.
if not exists (select 1 from ydata.dbo.entity
where NameID = @CurrentName)
So it can now process the NOT EXISTS part quickly using the index instead of scanning the entire dbo.entity table. It is moving fast again.