
ModuleNotFoundError in AWS Glue when importing pyspark.errors

The PySpark script in my Glue job contains the import below. I need it to assert that the schema of the incoming frame matches the target in S3:

from pyspark.errors.exceptions.base import PySparkAssertionError

Locally everything works fine, but after uploading the script to AWS, Glue throws an error:

Error Category: IMPORT_ERROR; Failed Line Number: 9; ModuleNotFoundError: No module named 'pyspark.errors'

Should this be imported as a separate JAR file in that case? That would be odd, as I'd expect PySpark in Glue to come fully equipped with all modules.

Thanks in advance!


  • AWS Glue 4.0 runs Spark 3.3, which does not include the pyspark.errors module. That module was only added in Spark 3.4.

    If you really need a more recent Spark version, I'd recommend using EMR Serverless.
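
    If staying on Glue 4.0 is preferable, one possible workaround is to avoid pyspark.errors altogether and raise your own exception. The following is only a minimal sketch: SchemaMismatchError and assert_same_schema are hypothetical names, not part of PySpark or Glue. It compares lists of (column name, type string) pairs, which is the shape DataFrame.dtypes returns, and it ignores column order and nullability.

```python
class SchemaMismatchError(AssertionError):
    """Hypothetical stand-in for PySparkAssertionError on Spark 3.3."""


def assert_same_schema(actual, expected):
    """Compare two lists of (column_name, type_string) pairs.

    Raises SchemaMismatchError if columns are missing, unexpected,
    or present in both schemas with different types.
    """
    actual_map = dict(actual)
    expected_map = dict(expected)

    # Columns the target expects but the incoming frame lacks.
    missing = sorted(set(expected_map) - set(actual_map))
    # Columns the incoming frame has but the target does not.
    unexpected = sorted(set(actual_map) - set(expected_map))
    # Columns present in both schemas whose type strings differ.
    mismatched = sorted(
        name
        for name in set(actual_map) & set(expected_map)
        if actual_map[name] != expected_map[name]
    )

    if missing or unexpected or mismatched:
        raise SchemaMismatchError(
            f"missing={missing}, unexpected={unexpected}, "
            f"type mismatch={mismatched}"
        )
```

    In a job this could be called as assert_same_schema(incoming_df.dtypes, target_df.dtypes), where incoming_df and target_df are assumed names for the two DataFrames being compared.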