I'm trying to import CSV files from S3 using spark-shell (val df = spark.read.csv("s3a://xxxxxx")); the spark-shell client is connected to a remote YARN cluster. It fails with java.lang.VerifyError. However, when I launch spark-shell from the same machine as the YARN ResourceManager, it works fine.
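For reference, this is roughly what I run in the shell (the bucket and prefix below are placeholders, not my real path):

    // Placeholder bucket/prefix standing in for the real s3a URI
    val df = spark.read
      .option("header", "true")  // assumption: the files have a header row
      .csv("s3a://some-bucket/some-prefix/")
    df.show(5)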
Here is the error:
java.lang.VerifyError: Bad type on operand stack
Exception Details:
  Location:
    org/apache/hadoop/fs/s3a/S3AFileSystem.s3GetFileStatus(Lorg/apache/hadoop/fs/Path;Ljava/lang/String;Ljava/util/Set;)Lorg/apache/hadoop/fs/s3a/S3AFileStatus; @274: invokestatic
  Reason:
    Type 'com/amazonaws/AmazonServiceException' (current frame, stack[2]) is not assignable to 'com/amazonaws/SdkBaseException'
  Current Frame:
    bci: @274
    flags: { }
    locals: { 'org/apache/hadoop/fs/s3a/S3AFileSystem', 'org/apache/hadoop/fs/Path', 'java/lang/String', 'java/util/Set', 'java/lang/String', 'com/amazonaws/AmazonServiceException' }
    stack: { 'java/lang/String', 'java/lang/String', 'com/amazonaws/AmazonServiceException' }
Here is my Spark configuration:

spark.master yarn
spark.hadoop.fs.s3a.server-side-encryption-algorithm SSE-KMS
spark.hadoop.fs.s3a.server-side-encryption.key xxxxxxxxxxxxxxxxxxxxxxxxxxx
spark.hadoop.fs.s3a.enableServerSideEncryption true
com.amazonaws.services.s3.enableV4 true
spark.hadoop.fs.s3a.impl org.apache.hadoop.fs.s3a.S3AFileSystem
spark.blockManager.port 20020
spark.driver.port 20020
spark.master.ui.port 4048
spark.ui.port 4041
spark.port.maxRetries 100
spark.yarn.jars hdfs://hdfs-master:4040/spark/jars/*
spark.driver.extraJavaOptions=-Dlog4j.configuration=/usr/local/spark/conf/log4j.properties
spark.executor.extraJavaOptions=-Dlog4j.configuration=/usr/local/spark/conf/log4j.properties
spark.eventLog.enabled true
spark.eventLog.dir hdfs://hdfs-master:4040/spark-logs
spark.yarn.app.container.log.dir /home/aws_install/hadoop/logdir
I also added this to hadooprc so that the hadoop-aws tools JARs get put on the classpath:

hadoop_add_to_classpath_tools hadoop-aws
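To double-check that the fs.s3a settings above actually reach the Hadoop configuration inside spark-shell, a quick check like the following can be pasted into the shell (the property names are the ones from my config above):

    // Read two of the fs.s3a properties back from the live Hadoop configuration
    val hconf = spark.sparkContext.hadoopConfiguration
    println(hconf.get("fs.s3a.impl"))
    println(hconf.get("fs.s3a.server-side-encryption-algorithm"))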
Any idea what the root of the problem is?
That error is a hint of classpath problems.

One problem with that hadooprc change is that it only changes your local environment, not the rest of the cluster. But the fact that you got as far as org/apache/hadoop/fs/s3a/S3AFileSystem.s3GetFileStatus implies that the S3A JAR is being loaded; it is the JVM itself that is having problems. Possibly there are two copies of the AWS SDK on the classpath, so the verifier is saying that the AmazonServiceException just raised isn't a subclass of SdkBaseException, because of the mixed JARs.
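One way to check is to print which JAR each of the two classes named by the verifier is loaded from; if the locations differ (or point at different SDK versions), the classpath is mixed. This is only a sketch to paste into the same spark-shell session, with the class names taken straight from the stack trace above:

    // Print the JAR each AWS SDK class comes from; differing locations
    // (or versions) suggest two copies of the AWS SDK on the classpath.
    Seq("com.amazonaws.AmazonServiceException", "com.amazonaws.SdkBaseException").foreach { name =>
      val src = Class.forName(name).getProtectionDomain.getCodeSource
      println(s"$name -> ${if (src == null) "unknown (bootstrap)" else src.getLocation}")
    }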