I am trying to connect from Spark (running on my PC) to my S3 bucket:
val spark = SparkSession
  .builder
  .appName("S3Client")
  .config("spark.master", "local")
  .getOrCreate()

val sc = spark.sparkContext
sc.hadoopConfiguration.set("fs.s3a.access.key", ACCESS_KEY)
sc.hadoopConfiguration.set("fs.s3a.secret.key", SECRET_KEY)

val txtFile = sc.textFile("s3a://bucket-name/folder/file.txt")
val contents = txtFile.collect()
But I get the following exception:
Exception in thread "main" com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 400, AWS Service: Amazon S3, AWS Request ID: 07A7BDC9135BCC84, AWS Error Code: null, AWS Error Message: Bad Request, S3 Extended Request ID: 6ly2vhZ2mAJdQl5UZ/QUdilFFN1hKhRzirw6h441oosGz+PLIvLW2fXsZ9xmd8cuBrNHCdh8UPE=
I have seen this question, but it didn't help me.
Edit:
As Zack suggested, I added:
sc.hadoopConfiguration.set("fs.s3a.endpoint", "s3.eu-central-1.amazonaws.com")
But I still get the same exception.
I've solved the problem.
I was targeting a region (Frankfurt) that requires Signature Version 4 for authentication.
I changed the region of the S3 bucket to Ireland, and now it works.
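For anyone who needs to keep the bucket in a V4-only region such as Frankfurt, explicitly enabling Signature Version 4 should also work. This is a sketch, assuming the Hadoop 2.7+ s3a connector and the AWS Java SDK; I haven't verified this exact combination:

```scala
// Sketch: keep the bucket in eu-central-1 by forcing Signature V4.
// The system property must be set before any S3 client is created.
System.setProperty("com.amazonaws.services.s3.enableV4", "true")

sc.hadoopConfiguration.set("fs.s3a.access.key", ACCESS_KEY)
sc.hadoopConfiguration.set("fs.s3a.secret.key", SECRET_KEY)
// V4-signed requests must go to the region-specific endpoint,
// not the global s3.amazonaws.com one:
sc.hadoopConfiguration.set("fs.s3a.endpoint", "s3.eu-central-1.amazonaws.com")
```

With V4 signing enabled and the matching regional endpoint set, the 400 Bad Request should no longer occur for Frankfurt buckets.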