I have an EMR cluster (v5.12.1) and an S3 bucket, both set up with encryption at rest using the same AWS SSE-KMS key.
Reading the data from S3 works fine, but when I write to my S3 bucket from a PySpark script, the Parquet files are encrypted with the default 'aws/s3' key instead.
How can I get Spark to use the correct KMS key?
The cluster has Hadoop 2.8.3 and Spark 2.2.1.
The solution is not to use s3a:// or s3n:// paths for your output files.
On EMR, the s3:// prefix maps to EMRFS, which honors the cluster's encryption-at-rest configuration, so files written with the s3:// prefix land in S3 encrypted with the correct SSE-KMS key.
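For example, a minimal PySpark sketch of the write; the bucket name and paths here are placeholders for your own:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sse-kms-write").getOrCreate()

# Read the source data from the encrypted bucket (this part already works)
df = spark.read.parquet("s3://my-encrypted-bucket/input/")

# Write with the s3:// prefix so EMRFS applies the cluster's encryption-at-rest
# settings; with an s3a:// or s3n:// path the output ends up under the
# default 'aws/s3' key instead
df.write.mode("overwrite").parquet("s3://my-encrypted-bucket/output/")
```

Only the path prefix changes; no extra Hadoop or Spark configuration is needed, since EMRFS picks up the SSE-KMS settings from the cluster.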