apache-spark, guava, nosuchmethoderror

Spark job fails with Guava error - java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V


We have set up an open source Apache Hadoop cluster with the following components.


hadoop - 3.1.4
spark - 3.3.1
hive - 3.1.3

We are trying to run the Spark example job with the command below, but it fails with the following exception.

/opt/spark-3.3.1/bin/spark-submit --class org.apache.spark.examples.SparkPi --deploy-mode cluster --master yarn --num-executors 1 --driver-memory 1G --executor-memory 1G --executor-cores 1  /opt/spark-3.3.1/examples/jars/spark-examples_2.12-3.3.1.jar

Error:

[2022-12-09 00:05:02.747]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/hdfsdata2/yarn/local/usercache/spark/filecache/70/__spark_libs__3692263374412677830.zip/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-3.1.4/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2022-12-09 00:05:02,137 INFO util.SignalUtils: Registering signal handler for TERM
2022-12-09 00:05:02,139 INFO util.SignalUtils: Registering signal handler for HUP
2022-12-09 00:05:02,139 INFO util.SignalUtils: Registering signal handler for INT
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
    at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
    at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
    at org.apache.spark.deploy.SparkHadoopUtil$.$anonfun$appendHiveConfigs$1(SparkHadoopUtil.scala:477)
    at org.apache.spark.deploy.SparkHadoopUtil$.$anonfun$appendHiveConfigs$1$adapted(SparkHadoopUtil.scala:476)
    at scala.collection.immutable.Stream.foreach(Stream.scala:533)
    at org.apache.spark.deploy.SparkHadoopUtil$.appendHiveConfigs(SparkHadoopUtil.scala:476)
    at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
    at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:430)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:894)
    at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)

After debugging, this error seems to be related to Guava and its dependent jars.

Hadoop ships guava-27.0-jre.jar while Spark ships guava-14.0-jre.jar.
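To confirm the mismatch, you can list the Guava jars bundled on each side. A quick check, assuming the install paths shown elsewhere in this post:

# Guava shipped with Hadoop
ls /opt/hadoop-3.1.4/share/hadoop/common/lib/ | grep -i guava

# Guava shipped with Spark
ls /opt/spark-3.3.1/jars/ | grep -i guava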

I removed Spark's Guava jar and copied Guava and its dependent jars from the Hadoop lib location to the Spark jars folder. Below is the full list of Guava and its dependent jars now in the Spark jars folder (the copy step is sketched after the list).

/opt/spark-3.3.1/jars/animal-sniffer-annotations-1.17.jar
/opt/spark-3.3.1/jars/failureaccess-1.0.jar
/opt/spark-3.3.1/jars/error_prone_annotations-2.2.0.jar
/opt/spark-3.3.1/jars/checker-qual-2.5.2.jar
/opt/spark-3.3.1/jars/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar
/opt/spark-3.3.1/jars/jsr305-3.0.2.jar
/opt/spark-3.3.1/jars/j2objc-annotations-1.1.jar
/opt/spark-3.3.1/jars/guava-27.0-jre.jar
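
For reference, the copy step looked roughly like this; a sketch assuming the install locations above, with jar filenames matching the list (they may differ slightly on other Hadoop builds):

HADOOP_LIB=/opt/hadoop-3.1.4/share/hadoop/common/lib
SPARK_JARS=/opt/spark-3.3.1/jars

# remove Spark's bundled (older) Guava so it cannot shadow the newer one
rm $SPARK_JARS/guava-14.0*.jar

# copy Guava 27 and its companion jars from Hadoop's common lib into Spark's jars folder
cp $HADOOP_LIB/guava-27.0-jre.jar \
   $HADOOP_LIB/failureaccess-1.0.jar \
   $HADOOP_LIB/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar \
   $HADOOP_LIB/jsr305-3.0.2.jar \
   $HADOOP_LIB/checker-qual-2.5.2.jar \
   $HADOOP_LIB/error_prone_annotations-2.2.0.jar \
   $HADOOP_LIB/j2objc-annotations-1.1.jar \
   $HADOOP_LIB/animal-sniffer-annotations-1.17.jar \
   $SPARK_JARS/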

But the error still persists.

Interestingly, when I run the same example Spark job as below, it succeeds.

/opt/spark-3.3.1/bin/spark-submit --class org.apache.spark.examples.SparkPi --deploy-mode cluster --master yarn --num-executors 1 --driver-memory 1G --executor-memory 1G --executor-cores 1  /opt/spark-3.3.1/examples/jars/spark-examples_2.12-3.3.1.jar 50

So the observation is that any value below 50 passed at the end of the command makes the job fail, whereas a higher value makes it succeed. I am not sure of the reason behind this.
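One way to narrow this down is to check which Guava actually reaches the container classpath and whether it contains the missing overload. A rough diagnostic, assuming the paths from the error output above (the localized __spark_libs__ directory lives on the NodeManager host that ran the container):

# on the NodeManager host: Guava jars inside the localized __spark_libs__ archive
ls /data/hdfsdata2/yarn/local/usercache/spark/filecache/70/__spark_libs__3692263374412677830.zip/ | grep -i guava

# confirm Guava 27 provides the checkArgument(boolean, String, Object) overload from the stack trace
javap -classpath /opt/spark-3.3.1/jars/guava-27.0-jre.jar \
    com.google.common.base.Preconditions | grep checkArgument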


Solution

  • @Saurav Suman, to avoid this confusion over where Spark finds its Hadoop/YARN-specific jars, the Apache Spark documentation provides a clean resolution.

    If you have downloaded Spark without Hadoop binaries and set up Hadoop yourself, follow the link below and set SPARK_DIST_CLASSPATH=$(hadoop classpath), as sketched after this answer: https://spark.apache.org/docs/latest/hadoop-provided.html

    This will take care of Spark picking up its Hadoop-related binaries. And I reckon Spark does not come with its own Guava. This error appears in particular when you are either using an outdated Guava or there are multiple conflicting Guava jars, leaving Spark unable to decide which one to pick. Hope this helps.
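
    In practice that means exporting the Hadoop classpath in conf/spark-env.sh, for example (a sketch following the linked documentation; adjust paths to your install):

    # /opt/spark-3.3.1/conf/spark-env.sh
    # Let Spark pick up the Hadoop-provided jars (including Hadoop's Guava) at runtime
    export SPARK_DIST_CLASSPATH=$(hadoop classpath)

    # or, with an explicit path to the hadoop binary
    export SPARK_DIST_CLASSPATH=$(/opt/hadoop-3.1.4/bin/hadoop classpath)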