I am trying to read a CSV file from S3 using Apache Spark, but I encounter the following error:
java.lang.NoClassDefFoundError: software/amazon/awssdk/transfer/s3/progress/TransferListener
at java.base/java.lang.Class.forName0(Native Method)
at java.base/java.lang.Class.forName(Class.java:398)
at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2625)
...
These are the jars I'm using:
C:/spark/jars/iceberg-spark-runtime-3.5_2.12-1.7.1.jar,
C:/spark/jars/hadoop-aws-3.4.0.jar,
C:/spark/jars/aws-java-sdk-core-1.11.999.jar,
C:/spark/jars/aws-java-sdk-s3-1.11.999.jar,
C:/spark/jars/aws-sdk-core-2.17.99.jar,
C:/spark/jars/aws-sdk-s3-2.17.99.jar
Using only AWS SDK v1 (aws-java-sdk-core and aws-java-sdk-s3) causes authentication errors.
Using only AWS SDK v2 (aws-sdk-core and aws-sdk-s3) results in missing TransferListener.
Combining the v1 and v2 jars in the spark-shell command still produces the same NoClassDefFoundError.
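You should set the spark.jars.packages configuration property in the application code that needs these libraries, rather than modifying the global Spark classpath directly. Spark resolves the listed Maven coordinates together with their transitive dependencies, so you do not have to pick matching AWS SDK jars by hand.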
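As a rough sketch of that suggestion, assuming PySpark: the Maven coordinates below are derived from the jar names in the question, while the helper names (packages_conf, build_session) and the application name are made up for illustration. Note that hadoop-aws generally needs to match the Hadoop version your Spark distribution was built against, so the 3.4.0 version here may need adjusting.

```python
# Maven coordinates derived from the jar names in the question.
# Assumption: hadoop-aws brings in the AWS SDK it needs as a transitive
# dependency, so no separate AWS SDK jars are listed here.
PACKAGES = [
    "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.7.1",
    "org.apache.hadoop:hadoop-aws:3.4.0",
]


def packages_conf(coords):
    """Join Maven coordinates into the comma-separated form Spark expects."""
    return ",".join(coords)


def build_session(app_name="s3-csv-read"):
    """Create a SparkSession that resolves PACKAGES at start-up (hypothetical helper)."""
    # Imported here so packages_conf can be used without PySpark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        # Must be set before the session (and its JVM) is first created.
        .config("spark.jars.packages", packages_conf(PACKAGES))
        .getOrCreate()
    )
```

With a session built this way, a read such as spark.read.csv("s3a://my-bucket/path/data.csv", header=True) (the bucket and key are placeholders) should no longer depend on the hand-copied AWS SDK jars. Those jars may also need to be removed from C:/spark/jars so they do not conflict with the resolved versions.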