Tags: apache-spark, classpath, hadoop-yarn, typesafe-config

How to add configuration file to classpath of all Spark executors in Spark 1.2.0?


I'm using Typesafe Config, https://github.com/typesafehub/config, to parameterize a Spark job running in yarn-cluster mode with a configuration file. The default behavior of Typesafe Config is to search the classpath for resources with well-known names and to load them into your configuration object automatically with ConfigFactory.load() (for our purposes, assume the file it looks for is called application.conf).
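For context, here is a minimal sketch of how the job reads its settings; the key my.app.batchSize is a hypothetical name used only for illustration:

    import com.typesafe.config.{Config, ConfigFactory}

    object ConfigDemo {
      def main(args: Array[String]): Unit = {
        // ConfigFactory.load() searches the classpath for application.conf
        // (also application.json/.properties), layered over reference.conf.
        val config: Config = ConfigFactory.load()

        // Hypothetical key; this throws ConfigException.Missing on any JVM
        // whose classpath does not contain application.conf.
        println(config.getInt("my.app.batchSize"))
      }
    }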

I am able to load the configuration file into the driver using --driver-class-path <directory containing configuration file>, but using --conf spark.executor.extraClassPath=<directory containing configuration file> does not put the resource on the classpath of all executors like it should. The executors report that they cannot find a setting for a key that does exist in the configuration file I'm attempting to add to their classpaths.
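Roughly, the failing submission looks like this (the paths, class name, and jar name are placeholders):

    # /path/to/conf-dir exists on the machine running spark-submit, but
    # spark.executor.extraClassPath does not make it visible to the executors.
    spark-submit \
      --master yarn-cluster \
      --class com.example.MyJob \
      --driver-class-path /path/to/conf-dir \
      --conf spark.executor.extraClassPath=/path/to/conf-dir \
      my-job.jar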

What is the correct way to add a file to the classpaths of all executor JVMs using Spark?


Solution

  • It looks like the value of the spark.executor.extraClassPath property is resolved relative to the working directory of the application ON THE EXECUTOR.

    So, to use this property correctly, first use --files <configuration file> to direct Spark to copy the file into the working directory of every executor, then set spark.executor.extraClassPath=./ to add the executor's working directory to its classpath. This combination lets the executors read values from the configuration file; a sketch of the full invocation follows.
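    A minimal sketch of the working submission, reusing the placeholder names from the question:

        # --files ships application.conf into each executor's working directory;
        # extraClassPath=./ then puts that directory on the executor classpath.
        spark-submit \
          --master yarn-cluster \
          --class com.example.MyJob \
          --driver-class-path /path/to/conf-dir \
          --files /path/to/conf-dir/application.conf \
          --conf spark.executor.extraClassPath=./ \
          my-job.jar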