I am trying to understand the difference between two common strategies for incremental data loads.
What is the difference between a streaming checkpoint and a change stream in Databricks Delta Lake?
Thanks.
Checkpoints save state and progress across micro-batches in Spark Structured Streaming, whether the source is an incremental feed such as CDF or an append-only Delta table read via spark.readStream. They cover both normal processing (when no error occurs) and restarts after a failure. In other words, you do not need to track where you last processed from; that is automatic.
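To make this concrete, here is a minimal sketch of a checkpointed incremental load with Structured Streaming. The paths /delta/events, /delta/events_copy, and /checkpoints/events_copy are hypothetical; the checkpointLocation option is what lets the query resume from where it left off after a restart.

```
# Incrementally read new appends from a Delta table as a stream.
stream = (
    spark.readStream
    .format("delta")
    .load("/delta/events")  # hypothetical source table path
)

# Write to another Delta table; the checkpoint records which data
# has already been processed, so a restarted query resumes automatically.
(
    stream.writeStream
    .format("delta")
    .option("checkpointLocation", "/checkpoints/events_copy")  # hypothetical path
    .outputMode("append")
    .start("/delta/events_copy")  # hypothetical sink table path
)
```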
A change stream is the Change Data Feed (CDF), Delta Lake's CDC mechanism for Delta tables. It can be processed via spark.read (batch) or spark.readStream (streaming), and conceptually it sits in the same plane as, say, a Kafka feed.
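A minimal sketch of reading that feed, assuming CDF has been enabled on the (hypothetical) table via the table property delta.enableChangeDataFeed = true. The readChangeFeed and startingVersion options are the standard Delta options for this, and each returned row carries the _change_type, _commit_version, and _commit_timestamp metadata columns:

```
# Batch read of all changes since table version 1.
changes = (
    spark.read
    .format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 1)
    .load("/delta/events")  # hypothetical table path
)

# The same feed consumed as a stream; pair this with a checkpoint
# on the write side and the two strategies work together.
changes_stream = (
    spark.readStream
    .format("delta")
    .option("readChangeFeed", "true")
    .load("/delta/events")
)
```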
That's all.