We have a query that creates a temporary table to store IDs for deletion and then deletes the corresponding records from a table in an Azure SQL database using transactions. The database is part of an Azure SQL Elastic Pool. While this query worked perfectly on one Azure SQL server, it did not perform as expected on another. On the problematic server, the query keeps running for hours without any noticeable change in the file size of the table. What could be the root cause of this issue, and how can we investigate it?
DECLARE @r INT;
DECLARE @batchsize INT;
create table #UnusedBlobIDs (
ID UNIQUEIDENTIFIER PRIMARY KEY (ID)
);
SET @r = 1;
SET @batchsize=1000;
WITH [ExistingBlobs] ([BlobId])
AS
(SELECT [Blobs].[BlobId]
FROM [Blobs]
JOIN [SharedFields]
ON '{' + CONVERT(NVARCHAR(MAX), [Blobs].[BlobId]) + '}' = [SharedFields].[Value]
UNION
SELECT [Blobs].[BlobId]
FROM [Blobs]
JOIN [SharedFields]
ON CONVERT(NVARCHAR(MAX), [Blobs].[BlobId]) = [SharedFields].[Value]
UNION
SELECT [Blobs].[BlobId]
FROM [Blobs]
JOIN [VersionedFields]
ON '{' + CONVERT(NVARCHAR(MAX), [Blobs].[BlobId]) + '}' = [VersionedFields].[Value]
UNION
SELECT [Blobs].[BlobId]
FROM [Blobs]
JOIN [VersionedFields]
ON CONVERT(NVARCHAR(MAX), [Blobs].[BlobId]) = [VersionedFields].[Value]
UNION
SELECT [Blobs].[BlobId]
FROM [Blobs]
JOIN [UnversionedFields]
ON '{' + CONVERT(NVARCHAR(MAX), [Blobs].[BlobId]) + '}' = [UnversionedFields].[Value]
UNION
SELECT [Blobs].[BlobId]
FROM [Blobs]
JOIN [UnversionedFields]
ON CONVERT(NVARCHAR(MAX), [Blobs].[BlobId]) = [UnversionedFields].[Value]
UNION
SELECT [Blobs].[BlobId]
FROM [Blobs]
JOIN [ArchivedFields]
ON '{' + CONVERT(NVARCHAR(MAX), [Blobs].[BlobId]) + '}' = [ArchivedFields].[Value]
UNION
SELECT [Blobs].[BlobId]
FROM [Blobs]
JOIN [ArchivedFields]
ON CONVERT(NVARCHAR(MAX), [Blobs].[BlobId]) = [ArchivedFields].[Value])
INSERT INTO #UnusedBlobIDs (ID) SELECT DISTINCT [Blobs].[BlobId]
FROM [Blobs]
WHERE NOT EXISTS
( SELECT NULL
FROM [ExistingBlobs]
WHERE [ExistingBlobs].[BlobId] = [Blobs].[BlobId])
WHILE @r > 0
BEGIN
BEGIN TRANSACTION;
DELETE TOP (@batchsize) FROM [Blobs] where [Blobs].[BlobId] IN (SELECT ID from #UnusedBlobIDs);
SET @r = @@ROWCOUNT;
COMMIT TRANSACTION;
END
DROP TABLE #UnusedBlobIDs;
On further investigation I am doubting that insert in to temporary database itself is not getting successful.
Solution that worked for me: Instead using single query with joins on the multiple table I created a temporary table by querying tables one by one.
The query likely runs slowly on one Azure SQL server due to inefficient join logic, missing indexes, tempdb contention, or differences in query plans. Here are the key areas to investigate:
Your CTE ExistingBlobs
uses multiple UNION and JOIN like:
'{' + CONVERT(NVARCHAR(MAX), [BlobId]) + '}' = [Value]
These string concatenations and conversions disable index usage, resulting in full table scans. On large datasets, this can drastically increase query time.
Check if the [Value]
column in [SharedFields]
, [VersionedFields]
, etc. is indexed.
If it's not indexed and you're doing a scalar string conversion (CONVERT(NVARCHAR(MAX)
) on one side of the join, indexes won't be used even if they exist.
Since you're inserting into a temp table, a large INSERT SELECT
may hit tempdb
contention.
To check tempdb usage use:
SELECT session_id, user_objects_alloc_page_count, internal_objects_alloc_page_count
FROM sys.dm_db_session_space_usage
WHERE session_id = @@SPID;
SELECT COUNT(*)
FROM [Blobs]
WHERE NOT EXISTS (
SELECT 1
FROM [SharedFields]
WHERE [SharedFields].[Value] LIKE '%' + CAST([Blobs].[BlobId] AS NVARCHAR) + '%'
);
If this takes a long time, then your joins in ExistingBlobs are the root cause.
The slow server may be using a bad query plan. This can happen if statistics differ or Query Store forced a suboptimal plan. Use Query Store to compare plans between the fast and slow servers:
SELECT qs.query_id, rs.avg_duration, qp.query_plan
FROM sys.query_store_query_text qt
JOIN sys.query_store_query qs ON qt.query_text_id = qs.query_text_id
JOIN sys.query_store_plan qp ON qs.query_id = qp.query_id
JOIN sys.query_store_runtime_stats rs ON qp.plan_id = rs.plan_id
WHERE qt.query_sql_text LIKE '%UnusedBlobIDs%';
If a better plan exists on another server, you can use sp_query_store_force_plan
to apply it to the slow one.
Check for reference: sp_query_store_force_plan