sqlazure-sql-database

Delete query not working on Azure SQL Database


We have a query that creates a temporary table to store IDs for deletion and then deletes the corresponding records from a table in an Azure SQL database using transactions. The database is part of an Azure SQL Elastic Pool. While this query worked perfectly on one Azure SQL server, it did not perform as expected on another. On the problematic server, the query keeps running for hours without any noticeable change in the file size of the table. What could be the root cause of this issue, and how can we investigate it?

DECLARE @r INT;
DECLARE @batchsize INT;
 
create table #UnusedBlobIDs (
    ID UNIQUEIDENTIFIER PRIMARY KEY (ID)
);
 
SET @r = 1;
SET @batchsize=1000;
 
WITH [ExistingBlobs] ([BlobId])
    AS
    (SELECT [Blobs].[BlobId]
    FROM [Blobs]
    JOIN [SharedFields]
    ON '{' + CONVERT(NVARCHAR(MAX), [Blobs].[BlobId]) + '}' = [SharedFields].[Value]
    UNION
    SELECT [Blobs].[BlobId]
    FROM [Blobs]
    JOIN [SharedFields]
    ON CONVERT(NVARCHAR(MAX), [Blobs].[BlobId]) = [SharedFields].[Value]
    UNION
    SELECT [Blobs].[BlobId]
    FROM [Blobs]
    JOIN [VersionedFields]
    ON '{' + CONVERT(NVARCHAR(MAX), [Blobs].[BlobId]) + '}' = [VersionedFields].[Value]
    UNION
    SELECT [Blobs].[BlobId]
    FROM [Blobs]
    JOIN [VersionedFields]
    ON CONVERT(NVARCHAR(MAX), [Blobs].[BlobId]) = [VersionedFields].[Value]
    UNION
    SELECT [Blobs].[BlobId]
    FROM [Blobs]
    JOIN [UnversionedFields]
    ON '{' + CONVERT(NVARCHAR(MAX), [Blobs].[BlobId]) + '}' = [UnversionedFields].[Value]
    UNION
    SELECT [Blobs].[BlobId]
    FROM [Blobs]
    JOIN [UnversionedFields]
    ON CONVERT(NVARCHAR(MAX), [Blobs].[BlobId]) = [UnversionedFields].[Value]
    UNION
    SELECT [Blobs].[BlobId]
    FROM [Blobs]
    JOIN [ArchivedFields]
    ON '{' + CONVERT(NVARCHAR(MAX), [Blobs].[BlobId]) + '}' = [ArchivedFields].[Value]
    UNION
    SELECT [Blobs].[BlobId]
    FROM [Blobs]
    JOIN [ArchivedFields]
    ON CONVERT(NVARCHAR(MAX), [Blobs].[BlobId]) = [ArchivedFields].[Value])
     
    INSERT INTO #UnusedBlobIDs (ID) SELECT DISTINCT  [Blobs].[BlobId]
    FROM [Blobs]
    WHERE NOT EXISTS
    (  SELECT NULL
    FROM [ExistingBlobs]
    WHERE [ExistingBlobs].[BlobId] = [Blobs].[BlobId])
     
WHILE @r > 0
   BEGIN
       BEGIN TRANSACTION;
        DELETE TOP (@batchsize) FROM [Blobs] where [Blobs].[BlobId] IN (SELECT ID from #UnusedBlobIDs);
        SET @r = @@ROWCOUNT;
        COMMIT TRANSACTION;
    END
 

DROP TABLE #UnusedBlobIDs;

On further investigation I am doubting that insert in to temporary database itself is not getting successful.

Solution that worked for me: Instead using single query with joins on the multiple table I created a temporary table by querying tables one by one.


Solution

  • The query likely runs slowly on one Azure SQL server due to inefficient join logic, missing indexes, tempdb contention, or differences in query plans. Here are the key areas to investigate:

    1. CTE complexity and join pattern:

    Your CTE ExistingBlobs uses multiple UNION and JOIN like:

    '{' + CONVERT(NVARCHAR(MAX), [BlobId]) + '}' = [Value]
    

    These string concatenations and conversions disable index usage, resulting in full table scans. On large datasets, this can drastically increase query time.

    1. Missing or Ineffective Indexes:

    Check if the [Value] column in [SharedFields], [VersionedFields], etc. is indexed.

    If it's not indexed and you're doing a scalar string conversion (CONVERT(NVARCHAR(MAX)) on one side of the join, indexes won't be used even if they exist.

    1. High tempdb usage or contention:

    Since you're inserting into a temp table, a large INSERT SELECT may hit tempdb contention. To check tempdb usage use:

    SELECT session_id, user_objects_alloc_page_count, internal_objects_alloc_page_count
    FROM sys.dm_db_session_space_usage
    WHERE session_id = @@SPID;
    
    1. Run a simplified version of the insert logic to confirm if it's the slow part:
    SELECT COUNT(*) 
    FROM [Blobs] 
    WHERE NOT EXISTS (
        SELECT 1 
        FROM [SharedFields] 
        WHERE [SharedFields].[Value] LIKE '%' + CAST([Blobs].[BlobId] AS NVARCHAR) + '%'
    );
    

    If this takes a long time, then your joins in ExistingBlobs are the root cause.

    1. Query Plan Differences:

    The slow server may be using a bad query plan. This can happen if statistics differ or Query Store forced a suboptimal plan. Use Query Store to compare plans between the fast and slow servers:

    SELECT qs.query_id, rs.avg_duration, qp.query_plan
    FROM sys.query_store_query_text qt
    JOIN sys.query_store_query qs ON qt.query_text_id = qs.query_text_id
    JOIN sys.query_store_plan qp ON qs.query_id = qp.query_id
    JOIN sys.query_store_runtime_stats rs ON qp.plan_id = rs.plan_id
    WHERE qt.query_sql_text LIKE '%UnusedBlobIDs%';
    

    If a better plan exists on another server, you can use sp_query_store_force_plan to apply it to the slow one.

    Check for reference: sp_query_store_force_plan