[SOLVED] Does unaligned checkpointing in Flink have disadvantages beyond extra I/O, and does it affect end-to-end exactly-once semantics?

I’m trying to understand the trade-offs of using unaligned checkpointing in Apache Flink. While I know that unaligned checkpointing can reduce checkpoint duration under backpressure by skipping the alignment phase, the documentation mentions that it comes with additional I/O overhead.

My question is:

1. Besides the extra I/O, are there any other disadvantages of unaligned checkpointing compared to aligned checkpointing?
2. Does it impact the end-to-end exactly-once semantics, especially since it might cause duplication in operators with multiple input streams in case of failure?

Any clarification or examples would be greatly appreciated!

Solution

Unaligned checkpoints skip alignment by letting the checkpoint barriers overtake records that are buffered but not yet processed. Those overtaken, in-flight records are written into the checkpoint, which adds I/O and increases the size of the checkpoint.
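To make the size difference concrete, here is a minimal toy model of an operator that sums its input. It is not Flink code: the class `BarrierOvertakeSketch`, its `Snapshot` type, and both methods are made up for illustration, and it ignores output buffers entirely.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of an operator that sums its records. Hypothetical names, not Flink API. */
final class BarrierOvertakeSketch {

    /** What ends up in the checkpoint for this operator. */
    static final class Snapshot {
        final long sum;               // operator state
        final List<Integer> inFlight; // records persisted alongside the state

        Snapshot(long sum, List<Integer> inFlight) {
            this.sum = sum;
            this.inFlight = inFlight;
        }
    }

    /** Aligned: process the records queued ahead of the barrier, then snapshot state only. */
    static Snapshot aligned(long sum, List<Integer> queuedAheadOfBarrier) {
        for (int r : queuedAheadOfBarrier) {
            sum += r;
        }
        return new Snapshot(sum, List.of());
    }

    /** Unaligned: snapshot right away and persist the overtaken records in the checkpoint. */
    static Snapshot unaligned(long sum, List<Integer> queuedAheadOfBarrier) {
        return new Snapshot(sum, new ArrayList<>(queuedAheadOfBarrier));
    }

    public static void main(String[] args) {
        List<Integer> queued = List.of(7, 8, 9);
        Snapshot a = aligned(10, queued);
        Snapshot u = unaligned(10, queued);
        System.out.println(a.sum + " " + a.inFlight); // 34 []
        System.out.println(u.sum + " " + u.inFlight); // 10 [7, 8, 9]
    }
}
```

The aligned snapshot has to wait until the queue is drained but stores only the state. The unaligned snapshot can be taken immediately, at the price of storing the three queued records as well.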

If you study this table -- https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/ops/state/checkpoints_vs_savepoints/#capabilities-and-limitations -- you'll see that unaligned checkpoints support fewer operations than aligned checkpoints. That's because the in-flight records aren't necessarily serialized into checkpoints with the state serializers that are used for writing state into checkpoints.

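As for exactly-once guarantees, unaligned checkpoints fully support exactly-once. On recovery, the persisted in-flight records are restored together with the operator state and processed from there, so an operator with multiple inputs does not see them twice. There is no effect on the end-to-end exactly-once semantics.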
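A rough way to convince yourself is the toy recovery model below. Again this is only a sketch under simplifying assumptions (one source, one summing operator, a replayable input), and `UnalignedRecoverySketch`, `Checkpoint`, `takeUnaligned`, and `recover` are hypothetical names rather than anything from Flink.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of recovery from an unaligned checkpoint. Hypothetical names, not Flink API. */
final class UnalignedRecoverySketch {

    /** What the checkpoint holds: source position, operator state, overtaken records. */
    static final class Checkpoint {
        final int sourceOffset;       // first record the source emits after the barrier
        final long sum;               // operator state at snapshot time
        final List<Integer> inFlight; // emitted before the barrier, not yet processed

        Checkpoint(int sourceOffset, long sum, List<Integer> inFlight) {
            this.sourceOffset = sourceOffset;
            this.sum = sum;
            this.inFlight = inFlight;
        }
    }

    /** Barrier is injected at offset barrierAt; the operator has processed `processed` records. */
    static Checkpoint takeUnaligned(List<Integer> input, int barrierAt, int processed) {
        long sum = 0;
        for (int i = 0; i < processed; i++) {
            sum += input.get(i);
        }
        List<Integer> inFlight = new ArrayList<>(input.subList(processed, barrierAt));
        return new Checkpoint(barrierAt, sum, inFlight);
    }

    /** Restore the state, replay the persisted in-flight records, then resume the source. */
    static long recover(List<Integer> input, Checkpoint cp) {
        long sum = cp.sum;
        for (int r : cp.inFlight) {
            sum += r;
        }
        for (int i = cp.sourceOffset; i < input.size(); i++) {
            sum += input.get(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> input = List.of(1, 2, 3, 4, 5, 6);
        Checkpoint cp = takeUnaligned(input, 4, 2); // records 3 and 4 are in flight
        System.out.println(cp.inFlight);            // [3, 4]
        System.out.println(recover(input, cp));     // 21, same as a failure-free run
    }
}
```

Every record lands in exactly one of three places: already reflected in the state, stored as in-flight data, or still to be emitted by the source after the restored offset. That is why the recovered sum matches the failure-free sum, with nothing counted twice or dropped.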