I am using delta live tables in Databricks to do some data transformation (normalizing flat file data) and I need some of the tables to have identity columns that will later serve as keys to establish relationships. I have read the documentation to add the column, but when I actually generate the table it only has null values in that column. The other columns have the data as expected.
As you can see in my code snippet, I have tried resolving this by adding functions to generate the values for that column (the two lines that are commented out) but both identity() and monotonically_increasing_id() generate errors when trying to validate the code. Running this as-is without those functions succeeds in generating the table with the dlt_id column, but it has all null values.
CREATE OR REFRESH STREAMING LIVE TABLE my_dlt
(
dlt_id BIGINT GENERATED ALWAYS AS IDENTITY,
source_column1 STRING,
source_column2 STRING
)
TBLPROPERTIES (delta.enableChangeDataFeed = true, "quality" = "silver")
AS
WITH stream_input AS
(
SELECT DISTINCT
source_column1,
source_column2
FROM stream(source_catalog.bronze_schema.source_table)
)
SELECT
--monotonically_increasing_id() as dlt_id,
--identity() as dlt_id,
source_column1,
source_column2
FROM stream_input;
I am coming from a SQL Server background, so I expected that defining the column as an identity would have the effect of automatically generating unique increasing integers. Based on the documentation I have read, this works in delta live tables too, but I am not seeing the expected behavior. I'd love some feedback or links to more documentation that shows what I'm doing wrong here.
Edit: I have also tried adding the optional start and increment values for the identity (START WITH 1 INCREMENT BY 1)
but the column is still all null.
I have been able to reproduce this several times now, so this solution works for me. When building the delta live table in my workspace, I write the script, click Validate, then click Start. This creates a table that I can query, and I can see the null values in the identity field. Next, I go to the Delta Live Tables area under the Data Engineering header, and I do a full refresh of the table once it has been created. This adds the identity values I was expecting. So here is what my final code looks like:
CREATE OR REFRESH STREAMING LIVE TABLE my_dlt
(
dlt_id BIGINT GENERATED ALWAYS AS IDENTITY,
source_column1 STRING,
source_column2 STRING
)
TBLPROPERTIES (delta.enableChangeDataFeed = true, "quality" = "silver")
AS
SELECT DISTINCT
source_column1,
source_column2
FROM stream(source_catalog.bronze_schema.source_table);