
Spark with OpenBLAS on EMR


I keep getting the infamous warnings when trying to run the MLlib ALS algorithm in Spark 2.1.0 on an EMR instance:

WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS

I managed to resolve the issue on my local Ubuntu machine by rebuilding Spark with the -Pnetlib-lgpl flag so that it includes netlib-java, but is there a way to avoid rebuilding the Spark build that ships with EMR? Currently I'm trying to work around it by building a fat JAR with sbt-assembly, adding the following dependency:

libraryDependencies += "com.github.fommil.netlib" % "all" % "1.1.2"

Although the assembly succeeds, the BLAS warnings still appear when running spark-submit. I have OpenBLAS and LAPACK installed on the EMR cluster.
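Installing the packages is not quite the same as making them visible to netlib-java. As we understand the netlib-java documentation, its NativeSystemBLAS/LAPACK wrappers load the system libraries by the sonames libblas.so.3 and liblapack.so.3, so a quick check is whether the dynamic linker cache resolves those names on each node. The sketch below is a hypothetical helper (the function names has_lib and check_netlib_system_libs are made up for illustration) that assumes those two sonames and parses `ldconfig -p` output; depending on the distribution, ldconfig may live in /sbin and not be on a non-root PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check whether the dynamic linker cache lists a soname.
# has_lib SONAME [CACHE_TEXT]
# CACHE_TEXT optionally stands in for `ldconfig -p` output (useful for testing).
has_lib() {
  local soname="$1"
  local cache="${2-}"
  if [ -z "$cache" ]; then
    cache="$(ldconfig -p 2>/dev/null)"
  fi
  # Cache entries look like: "<tab>libblas.so.3 (libc6,x86-64) => /path/to/lib"
  # so the soname is the first whitespace-separated field.
  printf '%s\n' "$cache" | awk -v s="$soname" '$1 == s { found = 1 } END { exit !found }'
}

# Assumption: these are the sonames netlib-java's system wrappers load on Linux.
check_netlib_system_libs() {
  local missing=0 lib
  for lib in libblas.so.3 liblapack.so.3; do
    if has_lib "$lib" "${1-}"; then
      echo "found:   $lib"
    else
      echo "missing: $lib"
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_netlib_system_libs` with no argument queries the live cache on the current node; a "missing" line suggests OpenBLAS/LAPACK are installed under different sonames than the ones netlib-java looks for, which is a separate problem from the missing netlib-java native jars.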


Solution

  • Okay, it seems this can't be done with a fat JAR, so I built a custom distribution of Spark as follows:

    export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
    ./dev/make-distribution.sh --name spark --tgz -Phadoop-2.7 -Phive -Phive-thriftserver -Pnetlib-lgpl -Pkinesis-asl -Pspark-ganglia-lgpl
    

    and replaced the /usr/lib/spark directory on the EMR nodes with my build. That did the trick.
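Before swapping directories on every node, it may be worth confirming that the new build actually bundles netlib-java's native wrappers. The sketch below is a hypothetical check (has_netlib_natives is an illustrative name) that assumes Spark 2.x keeps its jars under $SPARK_HOME/jars and that a -Pnetlib-lgpl build includes a jar whose file name contains "netlib-native_system" (for example netlib-native_system-linux-x86_64-1.1-natives.jar); adjust the pattern if your build names it differently.

```shell
#!/usr/bin/env bash
# Hypothetical check: does a Spark distribution bundle netlib-java's native
# system wrapper jar?
# has_netlib_natives [SPARK_HOME]   (defaults to /usr/lib/spark, as on EMR)
has_netlib_natives() {
  local spark_home="${1:-/usr/lib/spark}"
  local jar
  for jar in "$spark_home"/jars/*netlib-native_system*.jar; do
    # If the glob matched nothing, it is left unexpanded; -e filters that out.
    if [ -e "$jar" ]; then
      echo "found: $jar"
      return 0
    fi
  done
  echo "no netlib native jars under $spark_home/jars" >&2
  return 1
}
```

This only tells you the jar is packaged; the real confirmation is that the WARN BLAS lines no longer appear when the ALS job runs.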