If I have a table with an auto-incrementing ID column, I'd like to be able to insert a row into that table, and get the ID of the row I just created. I know that generally, StackOverflow questions need some sort of code that was attempted or research effort, but I'm not sure where to begin with Snowflake. I've dug through their documentation and I've found nothing for this.
The best I could do so far is try result_scan()
and last_query_id()
, but these don't give me any relevant information about the row that was inserted, just confirmation that a row was inserted.
I believe what I'm asking for is along the lines of MS SQL Server's SCOPE_IDENTITY()
function.
Is there a Snowflake equivalent function for MS SQL Server's SCOPE_IDENTITY()
?
EDIT: for the sake of having code in here:
CREATE TABLE my_db..my_table
(
ROWID INT IDENTITY(1,1),
some_number INT,
a_time TIMESTAMP_LTZ(9),
b_time TIMESTAMP_LTZ(9),
more_data VARCHAR(10)
);
INSERT INTO my_db..my_table
(
some_number,
a_time,
more_data
)
VALUES
(1, my_time_value, some_data);
I want to get to that auto-increment ROWID
for this row I just inserted.
NOTE: The answer below can be not 100% correct in some very rare cases, see the UPDATE section below
Snowflake does not provide the equivalent of SCOPE_IDENTITY
today.
However, you can exploit Snowflake's time travel to retrieve the maximum value of a column right after a given statement is executed.
Here's an example:
create or replace table x(rid int identity, num int);
insert into x(num) values(7);
insert into x(num) values(9);
-- you can insert rows in a separate transaction now to test it
select max(rid) from x AT(statement=>last_query_id());
----------+
MAX(RID) |
----------+
2 |
----------+
You can also save the last_query_id()
into a variable if you want to access it later, e.g.
insert into x(num) values(5);
set qid = last_query_id();
...
select max(rid) from x AT(statement=>$qid);
Note - it will be usually correct, but if the user e.g. inserts a large value into rid
manually, it might influence the result of this query.
Note, I realized the code above might rarely generate incorrect answer.
Since the execution order of various phases of a query in a distributed system like Snowflake
can be non-deterministic, and Snowflake allows concurrent INSERT statements, the following might happen
Q1
and Q2
, do a simple single row INSERT
, start at roughly the same timeQ1
starts, is a bit aheadQ2
startsQ1
creates a row with value 1
from the IDENTITY
columnQ2
creates a row with value 2
from the IDENTITY
columnQ2
gets ahead of Q1
- this is the key partQ2
commits, is marked as finished at time T2
Q1
commits, is marked as finished at time T1
Note that T1
is later than T2
. Now, when we try to do SELECT ... AT(statement=>Q1)
, we will see the state as-of T1
, including all changes from statements before, hence including the value 2
from Q2
. Which is not what we want.
The way around it could be to add a unique identifier
to each INSERT
(e.g. from a separate SEQUENCE object), and then use a MAX
.
Sorry. Distributed transactions are hard :)