Does SQL Server guarantee that NewSequentialId() will be called for each row in the order specified by the ORDER BY clause of an INSERT statement?
The goal is to take a list of objects in C#, each of which represents a row to be inserted in a table, and insert them into a table fairly quickly.
What I'm trying to do is insert the rows into a temporary table using SqlBulkCopy, then insert the rows from the temp table into a table that uses NewSequentialId(), then retrieve the new IDs in a way that they can be sorted in the same order as the list of objects in C#, so that the IDs can be attached to each corresponding object in C#.
I'm using SQL Server 2016 and this is the target table:
CREATE TABLE dbo.MyTable
(
Id UNIQUEIDENTIFIER NOT NULL PRIMARY KEY DEFAULT NEWSEQUENTIALID(),
SomeNonUniqueValue NVARCHAR(50) NOT NULL
)
First I use SqlBulkCopy to insert rows into this temp table. The RowOrder columns contains an integer generated in the application. RowOrder is the order I need the generated IDs to be returned in. In the application, RowOrder is the index of each C# object in the list.
CREATE TABLE #MyTableStaging
(
RowOrder INT NOT NULL,
SomeNonUniqueValue NVARCHAR(50) NOT NULL
)
Then I run this SQL to take the rows from #MyTableStaging, insert them into MyTable and retrieve the inserted IDs.
DECLARE @MyTableOutput TABLE
(
Id UNIQUEIDENTIFIER NOT NULL
)
INSERT INTO dbo.MyTable (SomeNonUniqueValue)
OUTPUT Inserted.Id INTO @MyTableOutput(Id)
SELECT SomeNonUniqueValue
FROM #MyTableStaging
ORDER BY RowOrder
SELECT Id FROM @MyTableOutput ORDER BY Id
In all my testing this works. However, I recently found out that the order in which rows are inserted into the table specified in the OUTPUT clause is not always the same as the order specified by the ORDER BY in the INSERT statement (I found this because the original design of this system was to use an identity in #MyTableStaging, and rather than ordering by #MyTableStaging.Id I was ordering by the identity column).
I know that SQL Server guarantees that identity values are generated in the order specified in the ORDER BY clause of an INSERT statement (from https://learn.microsoft.com/en-us/sql/t-sql/statements/insert-transact-sql?view=sql-server-2017#limitations-and-restrictions):
INSERT queries that use SELECT with ORDER BY to populate rows guarantees how identity values are computed but not the order in which the rows are inserted.
The easiest (and likely most efficient) way would be to insert into the target MyTable
table directly without the intermediate staging table. I'd use a table-valued-parameter to pass the table of values into your stored procedure.
https://learn.microsoft.com/en-us/dotnet/framework/data/adonet/sql/table-valued-parameters
If you really want to use the staging table, you can't rely on the order of rows returned by the OUTPUT
clause. You need to store the explicit mapping between the #MyTableStaging.RowOrder
and the generated MyTable.Id
. When you use OUTPUT
clause in a simple INSERT
statement you can't include columns from the source table into the output. There is a workaround. You can use MERGE
instead of INSERT
and OUTPUT
clause of the MERGE
statement allows columns from source table.
See very similar question Combine OUTPUT inserted.id with value from selected row
MERGE
can INSERT
, UPDATE
and DELETE
rows. In our case we need only to INSERT
.
1=0
is always false, so the NOT MATCHED BY TARGET
part is always executed.
In general, there could be other branches, see docs.
WHEN MATCHED
is usually used to UPDATE
;
WHEN NOT MATCHED BY SOURCE
is usually used to DELETE
, but we don't need them here.
This convoluted form of MERGE
is equivalent to simple INSERT
,
but unlike simple INSERT
its OUTPUT
clause allows to refer to the columns that we need.
It allows to retrieve columns from both source and destination tables thus saving a mapping between old and new IDs.
DECLARE @MyTableOutput TABLE
(
OldRowOrder int NOT NULL
,NewID UNIQUEIDENTIFIER NOT NULL
);
MERGE INTO dbo.MyTable
USING
(
SELECT RowOrder, SomeNonUniqueValue
FROM #MyTableStaging
) AS Src
ON 1 = 0
WHEN NOT MATCHED BY TARGET THEN
INSERT (SomeNonUniqueValue)
VALUES (Src.SomeNonUniqueValue)
OUTPUT Src.RowOrder AS OldRowOrder, inserted.ID AS NewID
INTO @MyTableOutput(OldRowOrder, NewID)
;
If your DBA is so afraid of MERGE
you don't have to use it. It will be less efficient, though.
Simply insert all rows.
INSERT INTO dbo.MyTable (SomeNonUniqueValue)
SELECT SomeNonUniqueValue
FROM #MyTableStaging
;
We don't care about the order.
If SomeNonUniqueValue
were unique, you could just join on this column to map RowOrder
to Id
. Since these values are not unique, we'll need an extra step and generate unique row numbers for joining.
WITH
CTE_Dst
AS
(
SELECT
Id
,SomeNonUniqueValue
,ROW_NUMBER() OVER (ORDER BY SomeNonUniqueValue) AS rn
FROM dbo.MyTable
)
,CTE_Src
AS
(
SELECT
RowOrder
,SomeNonUniqueValue
,ROW_NUMBER() OVER (ORDER BY SomeNonUniqueValue) AS rn
FROM #MyTableStaging
)
SELECT
CTE_Dst.Id
,CTE_Src.RowOrder
FROM
CTE_Dst
INNER JOIN CTE_Src ON CTE_Src.rn = CTE_Dst.rn
;
If you have, say, three rows with the same SomeNonUniqueValue
it doesn't really matter how you map these rows together, because SomeNonUniqueValue
is the same.
Example:
#MyTableStaging
+----------+--------------------+
| RowOrder | SomeNonUniqueValue |
+----------+--------------------+
| 1 | qwerty |
| 2 | qwerty |
| 3 | qwerty |
| 4 | asdf |
| 5 | asdf |
+----------+--------------------+
MyTable
+----+--------------------+
| ID | SomeNonUniqueValue |
+----+--------------------+
| A | qwerty |
| B | qwerty |
| C | qwerty |
| D | asdf |
| E | asdf |
+----+--------------------+
You can map them like this:
+----------+----+--------------------+
| RowOrder | ID | SomeNonUniqueValue |
+----------+----+--------------------+
| 1 | A | qwerty |
| 2 | B | qwerty |
| 3 | C | qwerty |
| 4 | D | asdf |
| 5 | E | asdf |
+----------+----+--------------------+
Or, you can map them like this:
+----------+----+--------------------+
| RowOrder | ID | SomeNonUniqueValue |
+----------+----+--------------------+
| 1 | B | qwerty |
| 2 | C | qwerty |
| 3 | A | qwerty |
| 4 | E | asdf |
| 5 | D | asdf |
+----------+----+--------------------+
It is still a valid mapping, because all three values of qwerty
are the same. Neither of these mapping is "more correct" than another.
Obviously, if your MyTable
wasn't empty before the INSERT
, you need to select new rows only.