The PySpark script I'm using in my Glue job contains the import below; I need to assert that the schema of the incoming frame matches the target data in S3:
from pyspark.errors.exceptions.base import PySparkAssertionError
Locally everything works fine, but when I upload the job to AWS, Glue throws an error:
Error Category: IMPORT_ERROR; Failed Line Number: 9; ModuleNotFoundError: No module named 'pyspark.errors'
Should this be imported as a separate jar file in such a case?
That would be odd, as I'd expect PySpark in Glue to be fully equipped with all modules.
Thanks in advance!
AWS Glue 4.0 uses Spark 3.3, which doesn't support pyspark.errors yet (the module was only introduced in Spark 3.4): https://archive.apache.org/dist/spark/docs/3.3.0/api/python/reference/index.html
If you really need a more recent Spark version, I'd recommend using EMR Serverless instead.
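That said, if all you need is the schema assertion itself, you can do it on Glue 4.0 without pyspark.errors by comparing the StructType objects directly and raising a plain AssertionError. Here is a minimal sketch; the S3 paths and the parquet format are placeholders for your actual sources:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder paths -- substitute your real incoming and target S3 locations.
incoming_df = spark.read.parquet("s3://my-bucket/incoming/")
target_df = spark.read.parquet("s3://my-bucket/target/")

# DataFrame.schema returns a StructType in every Spark 3.x release,
# and StructType supports direct equality comparison (fields, types,
# nullability), so no Spark 3.4+ module is required.
assert incoming_df.schema == target_df.schema, (
    f"Schema mismatch:\n"
    f"incoming: {incoming_df.schema.simpleString()}\n"
    f"target:   {target_df.schema.simpleString()}"
)
```

If you later move to a runtime with Spark 3.4+, you can switch back to catching PySparkAssertionError (or use the assertSchemaEqual helper added to pyspark.testing in Spark 3.5) without changing the rest of the job.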