sqlsql-serverduplicate-data

If I stop a long running query, does it rollback?


A query that is used to loop through 17 millions records to remove duplicates has been running now for about 16 hours and I wanted to know if the query is stopped right now if it will finalize the delete statements or if it has been deleting while running this query? Indeed, if I do stop it, does it finalize the deletes or rolls back?

I have found that when I do a

 select count(*) from myTable

That the rows that it returns (while doing this query) is about 5 less than what the starting row count was. Obviously the server resources are extremely poor, so does that mean that this process has taken 16 hours to find 5 duplicates (when there are actually thousands), and this could be running for days?

This query took 6 seconds on 2000 rows of test data, and it works great on that set of data, so I figured it would take 15 hours for the complete set.

Any ideas?

Below is the query:

--Declare the looping variable
DECLARE @LoopVar char(10)


    DECLARE
     --Set private variables that will be used throughout
      @long DECIMAL,
      @lat DECIMAL,
      @phoneNumber char(10),
      @businessname varchar(64),
      @winner char(10)

    SET @LoopVar = (SELECT MIN(RecordID) FROM MyTable)

    WHILE @LoopVar is not null
    BEGIN

      --initialize the private variables (essentially this is a .ctor)
      SELECT 
        @long = null,
        @lat = null,
        @businessname = null,
        @phoneNumber = null,
        @winner = null

      -- load data from the row declared when setting @LoopVar  
      SELECT
        @long = longitude,
        @lat = latitude,
        @businessname = BusinessName,
        @phoneNumber = Phone
      FROM MyTable
      WHERE RecordID = @LoopVar

      --find the winning row with that data. The winning row means 
      SELECT top 1 @Winner = RecordID
      FROM MyTable
      WHERE @long = longitude
        AND @lat = latitude
        AND @businessname = BusinessName
        AND @phoneNumber = Phone
      ORDER BY
        CASE WHEN webAddress is not null THEN 1 ELSE 2 END,
        CASE WHEN caption1 is not null THEN 1 ELSE 2 END,
        CASE WHEN caption2 is not null THEN 1 ELSE 2 END,
        RecordID

      --delete any losers.
      DELETE FROM MyTable
      WHERE @long = longitude
        AND @lat = latitude
        AND @businessname = BusinessName
        AND @phoneNumber = Phone
        AND @winner != RecordID

      -- prep the next loop value to go ahead and perform the next duplicate query.
      SET @LoopVar = (SELECT MIN(RecordID) 
    FROM MyTable
    WHERE @LoopVar < RecordID)
    END

Solution

  • no, sql server will not roll back the deletes it has already performed if you stop query execution. oracle requires an explicit committal of action queries or the data gets rolled back, but not mssql.

    with sql server it will not roll back unless you are specifically running in the context of a transaction and you rollback that transaction, or the connection closes without the transaction having been committed. but i don't see a transaction context in your above query.

    you could also try re-structuring your query to make the deletes a little more efficient, but essentially if the specs of your box are not up to snuff then you might be stuck waiting it out.

    going forward, you should create a unique index on the table to keep yourself from having to go through this again.