azure-databricksdelta-live-tables

Identity column in a Databricks delta live table has all null values


I am using delta live tables in Databricks to do some data transformation (normalizing flat file data) and I need some of the tables to have identity columns that will later serve as keys to establish relationships. I have read the documentation to add the column, but when I actually generate the table it only has null values in that column. The other columns have the data as expected.

As you can see in my code snippet, I have tried resolving this by adding functions to generate the values for that column (the two lines that are commented out) but both identity() and monotonically_increasing_id() generate errors when trying to validate the code. Running this as-is without those functions succeeds in generating the table with the dlt_id column, but it has all null values.

CREATE OR REFRESH STREAMING LIVE TABLE my_dlt
(
  dlt_id BIGINT GENERATED ALWAYS AS IDENTITY,
  source_column1 STRING,
  source_column2 STRING
)
TBLPROPERTIES (delta.enableChangeDataFeed = true, "quality" = "silver")
AS 
  WITH stream_input AS 
  (
    SELECT DISTINCT
      source_column1,
      source_column2
    FROM stream(source_catalog.bronze_schema.source_table)
  )
  SELECT
    --monotonically_increasing_id() as dlt_id,
    --identity() as dlt_id,
    source_column1,
    source_column2
  FROM stream_input;

I am coming from a SQL Server background, so I expected that defining the column as an identity would have the effect of automatically generating unique increasing integers. Based on the documentation I have read, this works in delta live tables too, but I am not seeing the expected behavior. I'd love some feedback or links to more documentation that shows what I'm doing wrong here.

Edit: I have also tried adding the optional start and increment values for the identity (START WITH 1 INCREMENT BY 1) but the column is still all null.


Solution

  • I have been able to reproduce this several times now, so this solution works for me. When building the delta live table in my workspace, I write the script, click Validate, then click Start. This creates a table that I can query, and I can see the null values in the identity field. Next, I go to the Delta Live Tables area under the Data Engineering header, and I do a full refresh of the table once it has been created. This adds the identity values I was expecting. So here is what my final code looks like:

    CREATE OR REFRESH STREAMING LIVE TABLE my_dlt
    (
      dlt_id BIGINT GENERATED ALWAYS AS IDENTITY,
      source_column1 STRING,
      source_column2 STRING
    )
    TBLPROPERTIES (delta.enableChangeDataFeed = true, "quality" = "silver")
    AS
      SELECT DISTINCT
        source_column1,
        source_column2
      FROM stream(source_catalog.bronze_schema.source_table);