amazon-web-services pyspark aws-glue

ModuleNotFoundError in AWS Glue when importing pyspark.errors


The PySpark script I'm using in my Glue job contains the import below. I need it to assert that the schema of the incoming frame matches the target in S3:

from pyspark.errors.exceptions.base import PySparkAssertionError

Locally everything works fine, but when I upload the script to AWS, Glue throws an error:

Error Category: IMPORT_ERROR; Failed Line Number: 9; ModuleNotFoundError: No module named 'pyspark.errors'

Should this be supplied as a separate jar file in this case? That would be odd, as I'd expect PySpark in Glue to come fully equipped with all modules.

Thanks in advance!


Solution

  • AWS Glue 4.0 uses Spark 3.3, which doesn't include the pyspark.errors package yet (it was added in Spark 3.4), so no extra jar will fix the import: https://archive.apache.org/dist/spark/docs/3.3.0/api/python/reference/index.html

    If you really need a more recent Spark version, I'd recommend using EMR Serverless.
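    If staying on Glue 4.0 is preferable, one possible workaround is to do the schema check without pyspark.errors at all. The following is only a minimal sketch under some assumptions: SchemaMismatchError and assert_same_schema are hypothetical names made up for this example (they are not part of PySpark or Glue), the comparison ignores column order, and the inputs are the (column name, type string) pairs that DataFrame.dtypes returns.

```python
class SchemaMismatchError(AssertionError):
    """Hypothetical stand-in for PySparkAssertionError on Spark 3.3."""


def assert_same_schema(actual, expected):
    """Compare two lists of (column_name, type_string) pairs.

    The pairs have the shape returned by DataFrame.dtypes.
    Column order is ignored (an assumption of this sketch).
    """
    actual_map, expected_map = dict(actual), dict(expected)

    # Columns the target expects but the incoming frame lacks.
    missing = sorted(set(expected_map) - set(actual_map))
    # Columns present in the incoming frame but not in the target.
    unexpected = sorted(set(actual_map) - set(expected_map))
    # Columns present on both sides whose type strings differ.
    wrong_type = sorted(
        name
        for name in set(actual_map) & set(expected_map)
        if actual_map[name] != expected_map[name]
    )

    if missing or unexpected or wrong_type:
        raise SchemaMismatchError(
            f"missing={missing}, unexpected={unexpected}, wrong_type={wrong_type}"
        )
```

    In the Glue script this would be called as assert_same_schema(incoming_df.dtypes, target_df.dtypes), where incoming_df and target_df are placeholder names for your two DataFrames. Since nothing here imports from pyspark.errors, it should behave the same locally and on Spark 3.3.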