
Does unaligned checkpointing in Flink have disadvantages beyond extra I/O, and does it affect end-to-end exactly-once semantics?


I’m trying to understand the trade-offs of using unaligned checkpointing in Apache Flink. I know that unaligned checkpointing can reduce checkpoint duration under backpressure by skipping the alignment phase, but the documentation mentions that it comes with additional I/O overhead.

My questions are:

  1. Besides the extra I/O, are there any other disadvantages of unaligned checkpointing compared to aligned checkpointing?
  2. Does it impact the end-to-end exactly-once semantics, especially since it might cause duplication in operators with multiple input streams in case of failure?

Any clarification or examples would be greatly appreciated!


Solution

  • Unaligned checkpoints skip alignment by letting the checkpoint barriers overtake unprocessed records. Those skipped-over records are included in the checkpoint, which causes additional I/O and increases the size of the checkpoint.

    If you study this table -- https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/ops/state/checkpoints_vs_savepoints/#capabilities-and-limitations -- you'll see that unaligned checkpoints support fewer of the listed operations than aligned checkpoints. That's because the in-flight (on-the-wire) records aren't necessarily written into the checkpoint with the same state serializers that are used for writing operator state into checkpoints.

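    As for exactly-once guarantees, unaligned checkpoints fully support exactly-once, and there is no effect on the end-to-end exactly-once semantics. The records that a barrier overtook are stored in the checkpoint and restored into the input channels on recovery, so even in operators with multiple inputs each record is processed exactly once and nothing is duplicated.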
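To make both points concrete, here is a toy Python model, not Flink code: `run`, `recover`, `BARRIER`, and `Snapshot` are hypothetical names. It sketches a summing operator with two input channels under the stated assumptions that each channel carries exactly one barrier, that sources can replay everything after their barrier, and that only input queues matter. In aligned mode the snapshot holds only state; in unaligned mode it also holds the queued records the barrier overtook. Recovery reaches the same total either way.

```python
from collections import deque
from dataclasses import dataclass, field

# Toy stand-in for a checkpoint barrier (hypothetical, not a Flink type).
BARRIER = object()


@dataclass
class Snapshot:
    state: int                                     # operator state: a running sum
    in_flight: list = field(default_factory=list)  # records persisted with the checkpoint


def run(inputs, unaligned):
    """Feed a summing operator round-robin from several channels and take one checkpoint.

    Assumes every channel contains exactly one BARRIER. Returns (final_total, snapshot).
    """
    queues = [deque(ch) for ch in inputs]
    blocked = [False] * len(queues)  # aligned mode: channel paused until all barriers arrive
    seen = 0
    total, snapshot = 0, None
    while any(queues):
        for i, q in enumerate(queues):
            if not q or blocked[i]:
                continue
            item = q.popleft()
            if item is not BARRIER:
                total += item
            elif unaligned:
                if snapshot is None:
                    # First barrier: snapshot immediately and persist every record
                    # still queued ahead of the barrier on the other channels.
                    in_flight = []
                    for j, other in enumerate(queues):
                        if j == i:
                            continue
                        for rec in other:
                            if rec is BARRIER:
                                break
                            in_flight.append(rec)
                    snapshot = Snapshot(total, in_flight)
                # Later barriers need no action; nothing was ever blocked.
            else:
                # Aligned: stop reading this channel; records behind the barrier wait.
                blocked[i] = True
                seen += 1
                if seen == len(queues):
                    snapshot = Snapshot(total)  # state only, no in-flight data
                    blocked = [False] * len(queues)
    return total, snapshot


def recover(snapshot, inputs):
    """Restore state, replay the persisted in-flight records, then replay
    everything the (assumed replayable) sources sent after their barrier."""
    total = snapshot.state + sum(snapshot.in_flight)
    for ch in inputs:
        total += sum(ch[ch.index(BARRIER) + 1:])
    return total


a = [1, 2, BARRIER, 3]
b = [10, 20, 30, BARRIER, 40]
for mode in (False, True):
    total, snap = run([a, b], unaligned=mode)
    print("unaligned" if mode else "aligned", total, snap.state, snap.in_flight, recover(snap, [a, b]))
```

This prints `aligned 106 63 [] 106` and `unaligned 106 33 [30] 106`. In aligned mode channel `a` sits idle while waiting for `b`'s barrier (the cost under backpressure); in unaligned mode nothing waits, but record `30` is written into the checkpoint (the extra I/O). Both recoveries give 106, so no record is lost or counted twice.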