Tags: amazon-web-services, apache-spark, amazon-data-pipeline

How to catch Spark error from shell script


I have a pipeline in AWS Data Pipeline that runs a shell script named shell.sh:

$ spark-submit transform_json.py


Running command on cluster...
[54.144.10.162] Running command...
[52.206.87.30] Running command...
[54.144.10.162] Command complete.
[52.206.87.30] Command complete.
run_command finished in 0:00:06.

The AWS Data Pipeline console says the job is "FINISHED", but in the stderr log I see that the job was actually aborted:

Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 404, AWS Service: Amazon S3, AWS Request ID: xxxxx, AWS Error Code: null, AWS Error Message: Not Found...        
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5.0 failed 1 times, most recent failure: Lost task 0.0 in stage 5.0 (TID 5, localhost, executor driver): org.apache.spark.SparkException: Task failed while writing rows.
    ...
        20/05/22 11:42:47 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
        20/05/22 11:42:47 INFO MemoryStore: MemoryStore cleared
        20/05/22 11:42:47 INFO BlockManager: BlockManager stopped
        20/05/22 11:42:47 INFO BlockManagerMaster: BlockManagerMaster stopped
        20/05/22 11:42:47 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
        20/05/22 11:42:47 INFO SparkContext: Successfully stopped SparkContext
        20/05/22 11:42:47 INFO ShutdownHookManager: Shutdown hook called

I'm somewhat new to Data Pipeline and Spark, and I can't wrap my head around what's actually happening behind the scenes. How do I get the shell script to catch the SparkException?


Solution

  • Have the shell script check the exit code of spark-submit, as in the example below.

    The script can detect the failure through the exit code: any non-zero exit code means the command failed.

    $? holds the exit status of the most recently executed command; by convention, 0 means success and anything else indicates failure.

    
    spark-submit transform_json.py
    ret_code=$?
    if [ $ret_code -ne 0 ]; then
        exit $ret_code
    fi
    
    
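    An equivalent, often simpler, pattern is to let the shell abort on the first failing command. This is a minimal sketch of what shell.sh could look like under that approach; note that set -e aborts on any non-zero command, which may be too aggressive if your script deliberately runs commands that are allowed to fail:

    ```shell
    #!/bin/sh
    # Abort the script as soon as any command exits non-zero, so a failed
    # spark-submit immediately fails the whole shell step (and the pipeline
    # activity that runs it sees a non-zero exit status).
    set -e

    spark-submit transform_json.py

    # Only reached if spark-submit exited with status 0.
    echo "Spark job succeeded"
    ```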

    Your Python script itself has to return a non-zero exit code in the error condition, e.g. by calling sys.exit(-1); otherwise spark-submit can exit 0 even though the job failed internally.

    See also: Exit codes in Python
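    To illustrate, here is a minimal sketch of how transform_json.py could translate an exception into a non-zero exit code (run_job is a hypothetical stand-in for the real transformation logic, and exit code 1 is used here; in the real script you would call main() under an `if __name__ == "__main__":` guard):

    ```python
    import sys

    def run_job():
        # Hypothetical stand-in for the real transform logic in transform_json.py.
        raise RuntimeError("simulated failure")  # e.g. the S3 404 surfacing as an exception

    def main():
        try:
            run_job()
        except Exception as exc:
            # Log to stderr and exit non-zero so spark-submit returns a
            # failing exit status that the calling shell script can check.
            print(f"Job failed: {exc}", file=sys.stderr)
            sys.exit(1)
    ```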