apache-spark, fault-tolerance

Spark checkpointing behaviour


Does Spark use checkpoints when we start a new job? Let's say we used a checkpoint to write some RDD to disk. Will that RDD be recalculated, or loaded from disk, during a new job?
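
For concreteness, a minimal sketch of the setup being asked about; the checkpoint directory and the transformation are hypothetical placeholders:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CheckpointWrite {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("checkpoint-write").setMaster("local[*]")
    val sc = new SparkContext(conf)
    sc.setCheckpointDir("/tmp/spark-checkpoints") // hypothetical path

    val rdd = sc.parallelize(1 to 1000).map(_ * 2) // stand-in for an expensive lineage
    rdd.checkpoint() // mark the RDD for checkpointing
    rdd.count()      // an action materializes the RDD and writes it to the checkpoint dir
    sc.stop()
  }
}
```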


Solution

  • At the start of the job, if an RDD is present in your checkpoint location, it will be loaded from disk rather than recalculated.

    That also means that if you change your code, you should be careful about checkpointing: an RDD written by the old code will be loaded by the new code, and the mismatch can cause conflicts (for example, deserialization failures if class definitions have changed). See the sketch after this list.
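
For Spark Streaming, the documented way to get this load-if-present behaviour across application restarts is StreamingContext.getOrCreate, which restores the context (including its checkpointed state) from the checkpoint directory when one exists and otherwise builds a fresh context. A minimal sketch, assuming a hypothetical checkpoint path:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CheckpointRecovery {
  val checkpointDir = "/tmp/spark-streaming-checkpoint" // hypothetical path

  // Only invoked when no checkpoint data exists yet.
  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("checkpoint-recovery")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint(checkpointDir)
    // ... define the DStream lineage here ...
    ssc
  }

  def main(args: Array[String]): Unit = {
    // Restores from checkpointDir if data is present; otherwise runs createContext().
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}
```

This recovery path is also exactly where the code-change caveat above bites: if the checkpoint was written by an older build of the application, getOrCreate will try to deserialize the old classes into the new one.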