apache-spark cloudera spark-ui

What is 'Active Jobs' in the Jobs section of the Spark History Server UI?


I'm trying to understand the components of the Spark History Server. I know that the History Server shows completed Spark applications.

Nonetheless, I see 'Active Jobs' set to 1 for a completed Spark application. What does 'Active Jobs' mean in the Jobs section? Also, the application completed within 30 minutes, but when I opened the History Server 8 hours later, 'Duration' showed 8.0h. Please see the screenshot.

[Screenshot: Jobs page in the Spark History Server]

Could you please help me understand the 'Active Jobs', 'Duration' and 'Stages: Succeeded/Total' items in the image above?


Solution

  • After some research, I found the answer to my question.

    A Spark application consists of a driver and one or more executors. The driver program instantiates the SparkContext, which coordinates the executors to run the Spark application. The jobs that the driver has submitted and that have not yet finished are displayed in the 'Active Jobs' section of the Spark History Server web UI.

    The executors run tasks assigned by the driver.

    When a Spark application runs on YARN, Spark provides its own implementation of the YARN client and the YARN application master. A YARN application consists of a YARN client, a YARN application master and a list of containers running on the node managers.

    In my case Spark is running on YARN in cluster mode, so the driver program runs as a thread of the YARN application master. The YARN client pulls the status from the application master, and the application master coordinates the containers that run the tasks.

    While the job is running, it can be monitored on the YARN applications page in the Cloudera Manager Admin Console.

    If the application succeeds, the History Server shows the list of 'Completed Jobs', and the 'Active Jobs' section is removed.

    If the application fails at the container level and YARN communicates this to the driver, the History Server shows the list of 'Failed Jobs', and the 'Active Jobs' section is removed.

    However, if the application fails at the container level and YARN cannot communicate that to the driver, the job started by the driver is left in limbo. The driver thinks the job is still running and keeps waiting to hear its status from the YARN application master. Hence the History Server still shows it under 'Active Jobs' as running.

    So my takeaway is: to check the status of a running job, go to the YARN applications page in the Cloudera Manager Admin Console or use the YARN CLI. After the job completes or fails, open the Spark History Server for more details on resource usage, the DAG and the execution timeline.
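
The YARN CLI check can also be scripted. Below is a minimal sketch, assuming the `yarn` binary is on the PATH of the machine where the script runs. The helper names `yarn_status_command` and `yarn_status` are made up for illustration; the command they wrap is the standard `yarn application -status <Application ID>`.

```python
import subprocess


def yarn_status_command(app_id):
    """Build the argument list for `yarn application -status <app_id>`."""
    # Guard against passing a Spark job ID or similar by mistake.
    if not app_id.startswith("application_"):
        raise ValueError("expected a YARN application ID starting with 'application_'")
    return ["yarn", "application", "-status", app_id]


def yarn_status(app_id):
    """Run the YARN CLI and return its raw report text.

    Assumption: `yarn` is installed and on PATH; raises CalledProcessError
    if the command exits with a non-zero status.
    """
    result = subprocess.run(
        yarn_status_command(app_id),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `yarn_status(...)` with the application ID shown on the YARN applications page should print the same state and final status that YARN itself reports, independent of what the History Server displays.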
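
To see the 'stuck in Active Jobs' case for yourself, you can look at the application's event log, which is what the History Server replays to build the UI. The sketch below assumes an uncompressed event log with one JSON event per line and the `Event`, `Job ID` and `Submission Time` fields that Spark's JSON event logging writes; the function names and the sample lines are made up. A job with a `SparkListenerJobStart` event but no matching `SparkListenerJobEnd` has no recorded completion. As far as I understand, the UI then measures 'Duration' up to the current time, which would be consistent with the 8.0h value in the question.

```python
import json


def unfinished_jobs(event_log_lines):
    """Return {job_id: submission_time_ms} for jobs that started but never ended."""
    started = {}
    for line in event_log_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        kind = event.get("Event")
        if kind == "SparkListenerJobStart":
            started[event["Job ID"]] = event.get("Submission Time")
        elif kind == "SparkListenerJobEnd":
            # The job has a recorded end, so it is no longer "active".
            started.pop(event["Job ID"], None)
    return started


def displayed_duration_ms(submission_ms, completion_ms, now_ms):
    """Duration measured up to `now_ms` when no completion time was recorded.

    Assumption: this mirrors how the Jobs page treats a job without an end time.
    """
    end_ms = completion_ms if completion_ms is not None else now_ms
    return end_ms - submission_ms


# Made-up sample: job 0 finishes, job 1 never gets an end event.
sample_log = [
    '{"Event": "SparkListenerJobStart", "Job ID": 0, "Submission Time": 1000}',
    '{"Event": "SparkListenerJobEnd", "Job ID": 0, "Completion Time": 5000, '
    '"Job Result": {"Result": "JobSucceeded"}}',
    '{"Event": "SparkListenerJobStart", "Job ID": 1, "Submission Time": 6000}',
]

stuck = unfinished_jobs(sample_log)
```

With the sample above, only job 1 is left in `stuck`, and its displayed duration keeps growing with `now_ms` because nothing ever supplies a completion time.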