java · apache-spark · master-slave · hail

hail.utils.java.FatalError: IllegalStateException: unread block data


I am trying to run a basic script on a Spark cluster that takes in a file, converts it, and outputs it in a different format. The Spark cluster at the moment consists of 1 master and 1 slave, both running on the same node. The full command is:

nohup spark-submit --master spark://tr-nodedev1:7077 --verbose --conf spark.driver.port=40065 --driver-memory 4g --conf spark.driver.extraClassPath=/opt/seqr/.conda/envs/py37/lib/python3.7/site-packages/hail/hail-all-spark.jar
--conf spark.executor.extraClassPath=./hail-all-spark.jar ./hail_scripts/v02/convert_vcf_to_hail.py /clinvar_37.vcf -ht
--genome-version 37 --output /seqr-reference-hail2/clinvar_37.ht &

And it gives an error:

hail.utils.java.FatalError: IllegalStateException: unread block data

A more detailed stack trace can be found on another forum, where I asked the same question:

https://discuss.hail.is/t/unread-block-data-error-spark-master-slave-issue/1182
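For context, the script itself boils down to importing the VCF and writing it back out in Hail's native format. The following is only a rough Hail 0.2 sketch (the exact contents of convert_vcf_to_hail.py from the seqr pipeline are not shown here; the paths are the ones from the command above):

import hail as hl

# Rough sketch of the conversion, assuming Hail 0.2; the real
# convert_vcf_to_hail.py accepts more options than shown here.
hl.init()  # uses the Spark configuration passed via spark-submit

mt = hl.import_vcf('/clinvar_37.vcf', reference_genome='GRCh37')
# The -ht flag suggests the rows are written out as a Hail Table (.ht)
mt.rows().write('/seqr-reference-hail2/clinvar_37.ht', overwrite=True)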

The following command, by contrast, works fine:

nohup spark-submit --conf spark.driver.extraClassPath=/opt/seqr/.conda/envs/py37/lib/python3.7/site-packages/hail/hail-all-spark.jar 
--conf spark.executor.extraClassPath=./hail-all-spark.jar ./hail_scripts/v02/convert_vcf_to_hail.py /hgmd_pro_2019.3_hg19_noDB.vcf -ht 
--genome-version 37 --output /seqr-reference-hail2/hgmd_2019.3_hg19_noDB.ht &

So, in local mode it runs fine, but in standalone mode it does not. I therefore guess the problem is a difference between the master and slave settings, possibly Java-related. However, setting them in spark-env.sh like this:

export JAVA_HOME=/usr/lib/jvm/java

export SPARK_JAVA_OPTS+=" -Djava.library.path=$SPARK_LIBRARY_PATH:$JAVA_HOME"

does not fix the issue. To start the master and the slave I just use the start-all.sh script. Any suggestions would be greatly appreciated.


Solution

  • OK, we fixed it, and the solution was to add the following option to the command that runs the script:

    --jars /opt/seqr/.conda/envs/py37/lib/python3.7/site-packages/hail/hail-all-spark.jar
    

    So, the working command is the following:

    spark-submit --master spark://ai-grisnodedev1:7077 --verbose --conf spark.driver.port=40065 --driver-memory 4g --conf spark.driver.extraClassPath=/opt/seqr/.conda/envs/py37/lib/python3.7/site-packages/hail/hail-all-spark.jar --conf spark.executor.extraClassPath=./hail-all-spark.jar --jars /opt/seqr/.conda/envs/py37/lib/python3.7/site-packages/hail/hail-all-spark.jar test_hail.py
    

    For future Hail 0.2 users it may be important to know that this --jars parameter is required and that it should point to hail-all-spark.jar. An alternative way to configure the same thing once, instead of on every command line, is sketched below.
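A possible alternative, assuming a standard Spark layout, is to put the same settings into $SPARK_HOME/conf/spark-defaults.conf so they do not have to be repeated on every spark-submit (a sketch, using the same install path as above):

spark.jars                     /opt/seqr/.conda/envs/py37/lib/python3.7/site-packages/hail/hail-all-spark.jar
spark.driver.extraClassPath    /opt/seqr/.conda/envs/py37/lib/python3.7/site-packages/hail/hail-all-spark.jar
spark.executor.extraClassPath  ./hail-all-spark.jar

The likely reason --jars (or spark.jars) is needed in standalone mode is that extraClassPath only adds entries to the classpath, while --jars actually distributes the jar to the executors' working directories, which is where the relative ./hail-all-spark.jar path is resolved.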