I’m trying to understand the trade-offs of using unaligned checkpointing in Apache Flink. While I know that unaligned checkpointing can reduce checkpoint duration under backpressure by skipping the alignment phase, the documentation mentions that it comes with additional I/O overhead.
My question is:
Any clarification or examples would be greatly appreciated!
Unaligned checkpoints skip alignment by letting the checkpoint barriers overtake records that are still buffered and not yet processed. Those in-flight records must then be written into the checkpoint along with the operator state, which is where the additional I/O and the larger checkpoint size come from.
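For reference, here is a minimal sketch of how this trade-off can be tuned in the Flink configuration file. The interval and timeout values are purely illustrative assumptions, and the key names are the ones I'd expect for Flink 1.20 (older releases use `execution.checkpointing.unaligned` instead of `execution.checkpointing.unaligned.enabled`), so check them against the configuration reference for your version.

```yaml
# Illustrative values only; tune them for your job.
execution.checkpointing.interval: 60 s

# Allow barriers to overtake buffered records (unaligned checkpoints).
execution.checkpointing.unaligned.enabled: true

# Start each checkpoint aligned and switch to unaligned only if alignment
# takes longer than this timeout. With the default of 0, checkpoints are
# unaligned from the start.
execution.checkpointing.aligned-checkpoint-timeout: 30 s
```

With a non-zero timeout, the extra I/O for in-flight records is only paid when the job is actually backpressured enough for alignment to be slow.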
If you study the table at https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/ops/state/checkpoints_vs_savepoints/#capabilities-and-limitations you'll see that unaligned checkpoints support fewer operations than aligned checkpoints. That's because the in-flight records aren't necessarily serialized into the checkpoint with the same state serializers that are used for writing operator state.
As for exactly-once guarantees, unaligned checkpoints fully support exactly-once. There is no effect on the end-to-end exactly-once semantics.