I am using Delta Lake OSS (v2.0.0) and want to enable change data feed (CDF) on an existing Delta table. After altering the table properties I can see that the property has been updated, but the table's history does not show CDF being enabled.
Code:
spark.sql("DESCRIBE HISTORY '/Users/yatharthmaheshwari/data-partner-merge/src/main/resources/delta/onaudience/dpm/base' ").show(2,false)
spark.sql("CREATE TABLE default.dpm_delta USING DELTA LOCATION '/Users/yatharthmaheshwari/data-partner-merge/src/main/resources/delta/onaudience/dpm/base' ")
spark.sql("SHOW TBLPROPERTIES default.dpm_delta ").show(false)
spark.sql("ALTER TABLE default.dpm_delta SET TBLPROPERTIES (delta.enableChangeDataFeed = true)")
spark.sql("SHOW TBLPROPERTIES default.dpm_delta ").show(false)
spark.sql("DESCRIBE HISTORY '/Users/yatharthmaheshwari/data-partner-merge/src/main/resources/delta/onaudience/dpm/base' ").show(2,false)
Output
TABLE HISTORY PRE CHANGES
+-------+-------------------+------+--------+---------+--------------------+----+--------+---------+-----------+--------------+-------------+--------------------+------------+--------------------+
|version| timestamp|userId|userName|operation| operationParameters| job|notebook|clusterId|readVersion|isolationLevel|isBlindAppend| operationMetrics|userMetadata| engineInfo|
+-------+-------------------+------+--------+---------+--------------------+----+--------+---------+-----------+--------------+-------------+--------------------+------------+--------------------+
| 46|2022-08-02 13:46:33| null| null| MERGE|{predicate -> ((u...|null| null| null| 45| Serializable| false|{numTargetRowsCop...| null|Apache-Spark/3.2....|
| 45|2022-08-02 13:12:58| null| null| MERGE|{predicate -> ((u...|null| null| null| 44| Serializable| false|{numTargetRowsCop...| null|Apache-Spark/3.2....|
+-------+-------------------+------+--------+---------+--------------------+----+--------+---------+-----------+--------------+-------------+--------------------+------------+--------------------+
only showing top 2 rows
TABLE PROPS PRE CHANGES
+--------+--------------------+
| key| value|
+--------+--------------------+
|provider| DELTA|
|location|/Users/yatharthma...|
| owner| yatharthmaheshwari|
+--------+--------------------+
TABLE PROPS POST CHANGES
+--------------------+--------------------+
| key| value|
+--------------------+--------------------+
| provider| DELTA|
| location|/Users/yatharthma...|
| owner| yatharthmaheshwari|
|delta.enableChang...| true|
+--------------------+--------------------+
TABLE HISTORY POST CHANGES
+-------+-------------------+------+--------+---------+--------------------+----+--------+---------+-----------+--------------+-------------+--------------------+------------+--------------------+
|version| timestamp|userId|userName|operation| operationParameters| job|notebook|clusterId|readVersion|isolationLevel|isBlindAppend| operationMetrics|userMetadata| engineInfo|
+-------+-------------------+------+--------+---------+--------------------+----+--------+---------+-----------+--------------+-------------+--------------------+------------+--------------------+
| 46|2022-08-02 13:46:33| null| null| MERGE|{predicate -> ((u...|null| null| null| 45| Serializable| false|{numTargetRowsCop...| null|Apache-Spark/3.2....|
| 45|2022-08-02 13:12:58| null| null| MERGE|{predicate -> ((u...|null| null| null| 44| Serializable| false|{numTargetRowsCop...| null|Apache-Spark/3.2....|
+-------+-------------------+------+--------+---------+--------------------+----+--------+---------+-----------+--------------+-------------+--------------------+------------+--------------------+
only showing top 2 rows
If I make the same changes in a Databricks notebook, the CDF change does show up in the table history.
I was able to figure out the issue: a couple of configs need to be set when initializing the SparkSession. Once those were added, it worked as expected.
config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
Reference: https://docs.delta.io/latest/quick-start.html#set-up-apache-spark-with-delta-lake
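For completeness, here is a minimal sketch (Scala) of building the SparkSession with those two configs, assuming delta-core 2.0.0 is on the classpath; the app name and master are placeholders for this example:
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder()
  .appName("enable-cdf-example") // placeholder name
  .master("local[*]")
  // Register the Delta SQL extension and catalog so Delta-specific commands
  // (ALTER TABLE ... SET TBLPROPERTIES on a Delta table, DESCRIBE HISTORY, etc.)
  // are handled by Delta Lake and written as commits to the transaction log.
  .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
  .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
  .getOrCreate()
// With the session configured this way, the ALTER TABLE creates a new commit
// that is visible in DESCRIBE HISTORY.
spark.sql("ALTER TABLE default.dpm_delta SET TBLPROPERTIES (delta.enableChangeDataFeed = true)")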