javaapache-flinkflink-streamingcheckpointing

Where is the default checkpoint(s) kept in Apache Flink?


I am a newbie to Apache Flink, and I was going through the Apache Flink's examples. I found that in case of a failure Flink has the ability to restore stream processing from a checkpoint.

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(10000L);

Now, my question is where does Flink keep the checkpoint(s) by default?

Any help is appreciated!


Solution

  • Flink features the abstraction of StateBackends. A StateBackend is responsible to locally manage the state on the worker node but also to checkpoint (and restore) the state to a remote location.

    The default StateBackend is the MemoryStateBackend. It maintains the state on the workers' (TaskManagers') JVM heap and checkpoints it to the JVM heap of the master (JobManager). Hence the MemoryStateBackend does not require any additional configuration or external system and is good for local development. However, it is obviously not scalable and suited for any serious workload.

    Flink also provides a FSStateBackend, which holds holds local state also on the workers' JVM heap and checkpoints it to a remote file system (HDFS, NFS, ...). Finally, there's also the RocksDBStateBackend, which stores state in an embedded disk-based key-value store (RocksDB) and also checkpoints to a remote file system (HDFS, NFS, ...).