apache-spark log4j apache-spark-standalone spark2.4.4

Output Spark application name in driver log


I need to output the Spark application name (spark.app.name) in each line of the driver log, along with other attributes such as the message and date. So far I have failed to find a log4j configuration that does this, or any other hint. How can it be done?

I would appreciate any help.

Using Spark standalone mode.


Solution

  • One way that seems to work involves the following two steps:

    1. Create a custom log4j.properties file and change the layout:

      ...
      # this is just an example layout config
      # remember the rest of the configuration
      # MM = month; lowercase mm would print the minutes in the month position
      log4j.appender.stdout.layout.ConversionPattern=${appName}--%d{yyyy-MM-dd HH:mm:ss,SSS} [%-5p] [%c] - %m%n
      

      Either place this file at the root of your classpath (src/main/resources for most build tools), or edit <spark-home>/conf/log4j.properties on the servers in your cluster.

    2. Then set a system property with the referenced key (appName here) before bootstrapping your Spark context:

      System.setProperty("appName", "application-name");
      SparkSession spark = SparkSession.builder().appName("application-name")
      ...
      

    In my quick test (local mode), every line came out like this. The sample was captured with yyyy-mm-dd in the date pattern, which is why the month field shows the minutes (53):

    application-name--2020-53-06 16:53:35,741 [INFO ] [org.apache.spark.SparkContext] - Running Spark version 2.4.4
    application-name--2020-53-06 16:53:36,032 [WARN ] [org.apache.hadoop.util.NativeCodeLoader] - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    application-name--2020-53-06 16:53:36,316 [INFO ] [org.apache.spark.SparkContext] - Submitted application: JavaWordCount
    application-name--2020-53-06 16:53:36,413 [INFO ] [org.apache.spark.SecurityManager] - Changing view acls to: ernest
    application-name--2020-53-06 16:53:36,414 [INFO ] [org.apache.spark.SecurityManager] - Changing modify acls to: ernest
    application-name--2020-53-06 16:53:36,415 [INFO ] [org.apache.spark.SecurityManager] - Changing view acls groups to: 
    application-name--2020-53-06 16:53:36,415 [INFO ] [org.apache.spark.SecurityManager] - Changing modify acls groups to: 
    application-name--2020-53-06 16:53:36,416 [INFO ] [org.apache.spark.SecurityManager] - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(ernest); groups with view permissions: Set(); users  with modify permissions: Set(ernest); groups with modify permissions: Set()
    application-name--2020-53-06 16:53:36,904 [INFO ] [org.apache.spark.util.Utils] - Successfully started service 'sparkDriver' on port 33343.
    application-name--2020-53-06 16:53:36,934 [INFO ] [org.apache.spark.SparkEnv] - Registering MapOutputTracker
    ...
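
The "53" in the month position of the sample output comes from the date pattern: log4j 1.x's %d{...} accepts SimpleDateFormat-style patterns, where lowercase mm means minutes and uppercase MM means month. A minimal JDK-only sketch of the difference follows. The class name DatePatternDemo is hypothetical, and the month (March) is an assumption, since the real month cannot be recovered from the log.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

final class DatePatternDemo {
    // Formats the given instant with a SimpleDateFormat pattern,
    // the same pattern syntax that log4j 1.x's %d{...} accepts.
    static String format(String pattern, Date when) {
        return new SimpleDateFormat(pattern, Locale.ROOT).format(when);
    }

    public static void main(String[] args) {
        // Assumed timestamp: 6 March 2020, 16:53:35 in the default time zone.
        Date when = new GregorianCalendar(2020, Calendar.MARCH, 6, 16, 53, 35).getTime();

        // Lowercase mm: minutes end up where the month should be.
        System.out.println(format("yyyy-mm-dd HH:mm:ss", when)); // 2020-53-06 16:53:35

        // Uppercase MM: the month is printed as intended.
        System.out.println(format("yyyy-MM-dd HH:mm:ss", when)); // 2020-03-06 16:53:35
    }
}
```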
    

    Instead of setting the property by hand in code, you may prefer to call spark-submit with something like:

    --conf 'spark.driver.extraJavaOptions=-DappName=application-name'
    

    For a more permanent change, you may want to edit <spark-home>/conf/log4j.properties (copy the template if the file doesn't exist) with the layout change, and call spark-submit/spark-shell, etc. with the system property.
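
If you want both options to coexist, one possibility is to let a -DappName=... value passed on the command line take precedence and only fall back to a name set in code. The following is a minimal JDK-only sketch under that assumption; the class AppNameProperty and its resolve method are hypothetical helpers, not part of Spark or log4j.

```java
final class AppNameProperty {
    // Key referenced as ${appName} in log4j.properties.
    static final String KEY = "appName";

    // Returns the value supplied via -DappName=... if present;
    // otherwise sets the system property to the fallback and returns it.
    static String resolve(String fallback) {
        String existing = System.getProperty(KEY);
        if (existing != null && !existing.isEmpty()) {
            return existing;
        }
        System.setProperty(KEY, fallback);
        return fallback;
    }

    public static void main(String[] args) {
        // Call this before the Spark context is bootstrapped, so the property
        // is already set when log4j reads its configuration.
        String appName = resolve("application-name");
        System.out.println(appName);
    }
}
```

The returned value can then be passed to SparkSession.builder().appName(...), so the name in the log prefix and the name Spark reports stay the same.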