apache-flinkflink-streamingflink-state

With a CheckPointed function in Flink, does the user call initializeState and snapshotState or is it handled behind the scenes


I am following an example here: https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/StatefulSequenceSource.java

I am trying to build a source using a jdbc connection which extends RichParallelFunction and implements CheckpointedFunction, as I would like to be able to save my watermark from my source tables in the case of restart.

When testing locally with docker, I can call my run() method just fine and read data from my source database, but I am not sure where the snapshotState and initializeState methods actually get called. I have logic in those methods that should be setting the value of my watermark based on first startup/recovery - I just never see that being accessed, and not sure if I should be calling the methods externally?

Thanks for the help in advance!


Solution

  • The methods will be called by the Flink framework when it needs to (when performing a checkpoint or a save point).

    See https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/datastream/fault-tolerance/state/#using-operator-state and https://nightlies.apache.org/flink/flink-docs-stable/api/java/org/apache/flink/streaming/api/checkpoint/CheckpointedFunction.html