amazon-web-services apache-spark pyspark aws-glue spark-ui

Why doesn't AWS Glue generate Spark event logs?


I have an AWS Glue job with the Spark UI enabled by following these instructions: Enabling the Spark UI for Jobs

The Glue job has s3:* access to the arn:aws:s3:::my-spark-event-bucket/* resource. But when I run the job (it finishes successfully within 40-50 seconds and generates the output Parquet files), it doesn't write any Spark event logs to the destination S3 path. What could have gone wrong, and is there a systematic way to pinpoint the root cause?


Solution

  • How long is your Glue job running for?

    I found that jobs with short execution times, around 1 minute or less, do not reliably produce Spark UI logs in S3.

    The AWS documentation states: "Every 30 seconds, AWS Glue flushes the Spark event logs to the Amazon S3 path that you specify." The reason short jobs do not produce Spark UI logs probably has something to do with this flush interval.

    If you have a job with a short execution time, try adding additional steps to the job, or even a pause/wait, to lengthen the execution time. This should help ensure that the Spark UI logs are sent to S3.
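
The pause/wait idea could look something like the sketch below. It is only an illustration, not an AWS-recommended fix: the helper names `seconds_to_wait` and `pad_runtime` are made up for this example, and the 90-second target is an assumption chosen to sit comfortably above the 30-second flush interval quoted from the documentation. The helpers sleep only when the job finished faster than the target, so longer jobs are not slowed down.

```python
import time

# Assumption: a target well above the 30-second flush interval quoted above.
MIN_RUNTIME_SECONDS = 90.0


def seconds_to_wait(started_at, now, min_runtime=MIN_RUNTIME_SECONDS):
    """Return how long to sleep so the total run time reaches min_runtime."""
    return max(0.0, min_runtime - (now - started_at))


def pad_runtime(started_at, min_runtime=MIN_RUNTIME_SECONDS,
                clock=time.monotonic, sleep=time.sleep):
    """Sleep only if the job finished faster than min_runtime.

    clock and sleep are injectable so the helper can be checked
    without actually waiting.
    """
    remaining = seconds_to_wait(started_at, clock(), min_runtime)
    if remaining > 0:
        sleep(remaining)
    return remaining


# Hypothetical placement inside a Glue script:
# job_started_at = time.monotonic()   # first line of the script
# ... read, transform, write the parquet output ...
# pad_runtime(job_started_at)         # last step of the script
```

Whether the pause belongs before or after the job's final commit step is something to try out; the point is only that the Spark session stays alive long enough for at least one flush to happen.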