scala databricks delta-lake change-data-capture

Enable Change Data Feed in Databricks Delta Table


I am using Delta Lake OSS (v2.0.0). I have an existing Delta table and want to enable Change Data Feed (CDF) on it. After altering the table properties, SHOW TBLPROPERTIES reflects the new property, but the history of the Delta table doesn't show CDF being enabled.

Code:

spark.sql("DESCRIBE HISTORY '/Users/yatharthmaheshwari/data-partner-merge/src/main/resources/delta/onaudience/dpm/base'  ").show(2,false)

spark.sql("CREATE TABLE default.dpm_delta USING DELTA LOCATION  '/Users/yatharthmaheshwari/data-partner-merge/src/main/resources/delta/onaudience/dpm/base'  ")

spark.sql("SHOW TBLPROPERTIES default.dpm_delta  ").show(false)

spark.sql("ALTER TABLE default.dpm_delta SET TBLPROPERTIES (delta.enableChangeDataFeed = true)")

spark.sql("SHOW TBLPROPERTIES default.dpm_delta  ").show(false)

spark.sql("DESCRIBE HISTORY '/Users/yatharthmaheshwari/data-partner-merge/src/main/resources/delta/onaudience/dpm/base'  ").show(2,false)

Output

TABLE HISTORY PRE CHANGES
+-------+-------------------+------+--------+---------+--------------------+----+--------+---------+-----------+--------------+-------------+--------------------+------------+--------------------+
|version|          timestamp|userId|userName|operation| operationParameters| job|notebook|clusterId|readVersion|isolationLevel|isBlindAppend|    operationMetrics|userMetadata|          engineInfo|
+-------+-------------------+------+--------+---------+--------------------+----+--------+---------+-----------+--------------+-------------+--------------------+------------+--------------------+
|     46|2022-08-02 13:46:33|  null|    null|    MERGE|{predicate -> ((u...|null|    null|     null|         45|  Serializable|        false|{numTargetRowsCop...|        null|Apache-Spark/3.2....|
|     45|2022-08-02 13:12:58|  null|    null|    MERGE|{predicate -> ((u...|null|    null|     null|         44|  Serializable|        false|{numTargetRowsCop...|        null|Apache-Spark/3.2....|
+-------+-------------------+------+--------+---------+--------------------+----+--------+---------+-----------+--------------+-------------+--------------------+------------+--------------------+
only showing top 2 rows

TABLE PROPS PRE CHANGES
+--------+--------------------+
|     key|               value|
+--------+--------------------+
|provider|               DELTA|
|location|/Users/yatharthma...|
|   owner|  yatharthmaheshwari|
+--------+--------------------+

TABLE PROPS POST CHANGES
+--------------------+--------------------+
|                 key|               value|
+--------------------+--------------------+
|            provider|               DELTA|
|            location|/Users/yatharthma...|
|               owner|  yatharthmaheshwari|
|delta.enableChang...|                true|
+--------------------+--------------------+

TABLE HISTORY POST CHANGES
+-------+-------------------+------+--------+---------+--------------------+----+--------+---------+-----------+--------------+-------------+--------------------+------------+--------------------+
|version|          timestamp|userId|userName|operation| operationParameters| job|notebook|clusterId|readVersion|isolationLevel|isBlindAppend|    operationMetrics|userMetadata|          engineInfo|
+-------+-------------------+------+--------+---------+--------------------+----+--------+---------+-----------+--------------+-------------+--------------------+------------+--------------------+
|     46|2022-08-02 13:46:33|  null|    null|    MERGE|{predicate -> ((u...|null|    null|     null|         45|  Serializable|        false|{numTargetRowsCop...|        null|Apache-Spark/3.2....|
|     45|2022-08-02 13:12:58|  null|    null|    MERGE|{predicate -> ((u...|null|    null|     null|         44|  Serializable|        false|{numTargetRowsCop...|        null|Apache-Spark/3.2....|
+-------+-------------------+------+--------+---------+--------------------+----+--------+---------+-----------+--------------+-------------+--------------------+------------+--------------------+
only showing top 2 rows

If I make the same changes in a Databricks notebook, I can see the CDF change in the table history.


Solution

  • I was able to figure out the issue: when initializing the SparkSession, we need to add two configs to the builder. Once they were added, it worked as expected.

    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    

    Reference: https://docs.delta.io/latest/quick-start.html#set-up-apache-spark-with-delta-lake
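
    For completeness, here is a minimal sketch of how the two configs fit into a full session setup, followed by the same ALTER TABLE and DESCRIBE HISTORY calls from the question. It assumes delta-core 2.0.0 and a matching Spark 3.2.x are on the classpath. The object name, app name, local master, table name default.example_delta and the path /tmp/delta/example_table are placeholders I made up for illustration, not values from my project.

    ```scala
    import org.apache.spark.sql.SparkSession

    object EnableCdfSketch {
      def main(args: Array[String]): Unit = {
        // Hypothetical location of an existing Delta table; pass your own path as the first argument.
        val tablePath = args.headOption.getOrElse("/tmp/delta/example_table")

        // The two Delta configs are set on the builder, before the session is created.
        val spark = SparkSession.builder()
          .appName("enable-cdf-sketch") // placeholder name
          .master("local[*]")           // assumption: local run
          .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
          .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
          .getOrCreate()

        // Register the existing table under a placeholder name, then enable CDF on it.
        spark.sql(s"CREATE TABLE IF NOT EXISTS default.example_delta USING DELTA LOCATION '$tablePath'")
        spark.sql("ALTER TABLE default.example_delta SET TBLPROPERTIES (delta.enableChangeDataFeed = true)")

        // Inspect the two most recent history entries, as in the question.
        spark.sql(s"DESCRIBE HISTORY '$tablePath'").show(2, false)

        spark.stop()
      }
    }
    ```

    With the extension and catalog in place, the property change should be recorded as its own entry in the table history, rather than only showing up in SHOW TBLPROPERTIES.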