sqlsql-servertemp-tablessqlperformancetable-variable

SQL query with Table variable/Temp Table - which one good in performance


SQL Questions: Is this insert query is good in performance ?


UserIds List is variable, its length mybe 10, 20 ... till 500.

WITH Users
  AS (SELECT [Id] from [electro].[User] Where [Id] IN (4438,15473,22497,22494,4425,4426,22496))
  INSERT INTO [electro].[ElectronicCorrespondenceInbox] ([UserId], [ElectronicCorrespondenceId], [CreatedAt])
  SELECT [User].[Id], [Corrs].[Id], GETDATE()
FROM [electro].[ElectronicCorrespondence] AS [Corrs], [Users] AS [User]
WHERE [Corrs].[Published] = 0;

Or


Creating temp variable contains users ids is good idea ?

Option 2# with Table variable:

DECLARE @USERS TABLE
([UserId] INT NOT NULL);

INSERT INTO @USERS
VALUES (2350), (4438), (15473), (22497), (22494), (4425), (4426), (22496);

INSERT INTO [electro].[ElectronicCorrespondenceInbox] ([UserId], [ElectronicCorrespondenceId], [CreatedAt])
SELECT [User].[UserId], [Corrs].[Id], GETDATE()
FROM [electro].[ElectronicCorrespondence] AS [Corrs], @USERS AS [User]
WHERE [Corrs].[Published] = 0;

Solution

  • First off, stop using old-style join syntax - it's inprecise and error-prone; use modern explicite join syntax, see Bad join habbits.

    Using a temp table in this scenario is unecessary and in most cases not desirable where an unknown number of rows are likely. SQL 2019 has something called deferred table variable compilation which can improve table-variable performance for ad-hoc queries, but is still problematic for cached execution plans.

    This is because when you use a table variable, especially prior to SQL2019, SQL Server treats it as a variable in every sense, in that a variable can store a single value, and SQL Server assumes only 1 row will be present, whether it actually contains 1 row or 1 million rows.

    This can lead to poor execution plans when joining to a table variable resulting in unsuitable join operations, table scan operations and underestimated memory grants resulting in tempdb usage, because it's always assumed to be a lightweight operation.

    Your query is probably better written as follows, since you are introducing a cartesian product into your results it's always best to be explicit with the correct join syntax. This also assumes that you need to correlate the list of ID values with those that actually exist in the Users table, as your first query implies

    insert into electro.ElectronicCorrespondenceInbox (UserId, ElectronicCorrespondenceId, CreatedAt)
    select u.Id, ec.Id, GetDate()
    from electro.ElectronicCorrespondence ec
    cross join electro.Users u 
    where ec.Published = 0
    and u.id in (4438,15473,22497,22494,4425,4426,22496)
    

    If you don't actually need this, as your second query implies, then you can simply include the list of values directly in your CTE without touching your User table

     with users as (
        select id from (values(4438),(15473),(22497),(22494),(4425),(4426),(22496))v(Id)
    )
    

    Also finally, if this is intended to be part of a procedure to which you pass a variable number of ID values, eg using a table-type parameter, you will likely find the best performance will result from inserting the values from your passed-in table-type parameter into a proper #temporary table and using that in your query.