We can set the checkpoint directory path in PySpark using the code below:
spark.sparkContext.setCheckpointDir('/checkpoints')
As SparkContext.getCheckpointDir()
is only introduced in PySpark version 3.1.0, how to get the checkpoint directory path using an older version PySpark like v2.4.3 ?
SparkContext.getCheckpointDir()
is only implemented in PySpark version 3.1.0, but luckily it was already implemented in the underlying Scala codebase in v2.4.3
. You can see that here.
You can access the underlying sparksession (JavaSparkContext) with the _jsc
attribute. The following works in a pyspark REPL on version 2.4.5
:
>>> spark.sparkContext.setCheckpointDir('/checkpoints')
>>> sc._jsc.sc().getCheckpointDir().get()
'file:/checkpoints/1829fbb4-0b7b-44c5-b275-50276d063565'