apache-spark · hadoop-yarn · handler · kill · resource-cleanup

Apache Spark/Scala: Handle YARN kill to perform actions before quitting (clean up resources, save state)


I have multiple unrelated jobs running on Spark/Hadoop grid executors, all started by a single spark-submit from the driver node. When I need to stop the jobs, I'd like to save their IDs before the driver program quits (or, more generally, to save state and/or clean up resources; the exact action doesn't matter).

Is there a way to handle a YARN termination event and perform such operations, by analogy with POSIX signals, where you handle SIGTERM (and friends) to do some last-chance cleanup?

I haven't found a way to handle yarn kill. Is there one? Is there an alternative to yarn kill that would satisfy this need?

Thanks. Regards.


Solution

  • Use a Spark listener with onApplicationEnd to handle the YARN kill event:

    import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd}

    sparkContext.addSparkListener(new SparkListener {
        override def onApplicationEnd(applicationEnd: SparkListenerApplicationEnd): Unit = {
            // The application is ending (e.g. it was killed through YARN):
            // save state and/or clean up resources here.
        }
    })

    Apache Ignite (in the version I use) does exactly this for that purpose.
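
    For the specific use case in the question (saving the IDs of the jobs still running when the driver is killed), here is a minimal sketch of what such a listener could look like. The SaveStateListener name and the persistJobIds helper are hypothetical, introduced only for illustration; sc.statusTracker.getActiveJobIds() is the standard SparkContext API for listing active jobs:

    import org.apache.spark.SparkContext
    import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd}

    // Hypothetical listener: records the IDs of still-active jobs when the
    // application ends, e.g. after the application is killed through YARN.
    class SaveStateListener(sc: SparkContext) extends SparkListener {
        override def onApplicationEnd(applicationEnd: SparkListenerApplicationEnd): Unit = {
            val activeJobIds = sc.statusTracker.getActiveJobIds()
            persistJobIds(activeJobIds)
        }

        // Hypothetical helper: in a real job this might write to HDFS, a
        // database, or any other durable store instead of stdout.
        private def persistJobIds(ids: Array[Int]): Unit =
            println(s"Active job IDs at shutdown: ${ids.mkString(", ")}")
    }

    sc.addSparkListener(new SaveStateListener(sc))

    Keep the handler short: once the application is being torn down, there is no guarantee of how much time the driver has before the JVM exits.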