I'm trying to run a distributed KMeans with Spark MLlib and I'm getting the following error:
Caused by: java.lang.ClassNotFoundException: breeze.storage.Zero$DoubleZero$
at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
I'm using Scala 2.13.0, Spark 3.3.0 and Breeze 2.1.0. Does anyone know how to solve it?
Looks like an issue with dependencies.
In Breeze 1.3 and earlier, breeze.storage.Zero.DoubleZero was defined as
@SerialVersionUID(1L)
implicit object DoubleZero extends Zero[Double] {
  override def zero = 0.0
}
and breeze.storage.Zero.DoubleZero.getClass produced breeze.storage.Zero$DoubleZero$.
But in Breeze 2.0+, DoubleZero is defined as
implicit val DoubleZero: Zero[Double] = Zero(0.0)
@SerialVersionUID(1L)
case class Zero[@specialized T](zero: T) extends Serializable
and breeze.storage.Zero.DoubleZero.getClass produces breeze.storage.Zero$mcD$sp (because of @specialized), while Class.forName("breeze.storage.Zero$DoubleZero$") throws ClassNotFoundException.
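The naming difference can be reproduced without Breeze. The following is only a sketch: OldZero, NewZero and ZeroNames are hypothetical stand-ins that mimic the two definition styles quoted above; they are not Breeze's actual code.

```scala
// Hypothetical stand-ins for the two definition styles; not Breeze's real code.
trait OldZero[T] { def zero: T }

object OldZero {
  // Breeze 1.x style: a nested object is compiled to its own class,
  // whose name ends in OldZero$DoubleZero$.
  implicit object DoubleZero extends OldZero[Double] {
    override def zero: Double = 0.0
  }
}

case class NewZero[@specialized T](zero: T)

object NewZero {
  // Breeze 2.x style: a val holding an instance of the case class,
  // so no separate DoubleZero$ class is generated.
  implicit val DoubleZero: NewZero[Double] = NewZero(0.0)
}

object ZeroNames {
  def oldName: String = OldZero.DoubleZero.getClass.getName
  def newName: String = NewZero.DoubleZero.getClass.getName
}
```

Printing ZeroNames.oldName and ZeroNames.newName shows that only the first style yields a class name containing DoubleZero$, which is why bytecode compiled against the old style cannot find that class in a newer jar.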
You should check which dependency still uses Breeze 1.3 or earlier.
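One way to check what is actually on the runtime classpath is to ask the JVM whether a class can be loaded and where it came from. This is a minimal sketch; ClasspathProbe and its method names are made up for illustration, and it assumes the probe runs with the same classpath as your Spark application.

```scala
object ClasspathProbe {
  // Tries to load the class without initializing it.
  // LinkageError covers NoClassDefFoundError, which scala.util.Try would not catch.
  private def load(name: String): Option[Class[_]] =
    try Some[Class[_]](Class.forName(name, false, getClass.getClassLoader))
    catch {
      case _: ClassNotFoundException | _: LinkageError => None
    }

  // True if the class is loadable from the current classpath.
  def classExists(name: String): Boolean = load(name).isDefined

  // Jar or directory the class was loaded from, when the JVM reports one.
  def locationOf(name: String): Option[String] =
    for {
      cls    <- load(name)
      domain <- Option(cls.getProtectionDomain)
      source <- Option(domain.getCodeSource)
      url    <- Option(source.getLocation)
    } yield url.toString
}

// Possible usage:
// ClasspathProbe.classExists("breeze.storage.Zero$DoubleZero$") // false suggests Breeze 2.0+ is on the classpath
// ClasspathProbe.locationOf("breeze.storage.Zero")              // shows which Breeze jar won
```

The second call is the more useful one here, since the jar file name normally includes the Breeze version.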
Update. Thanks for the MCVE.
Debugging shows that NoClassDefFoundError/ClassNotFoundException is thrown here, in Spark's KryoSerializer:
private lazy val loadableSparkClasses: Seq[Class[_]] = {
  Seq(
    // ...
    "org.apache.spark.ml.linalg.SparseMatrix", // <---
    // ...
  ).flatMap { name =>
    try {
      Some[Class[_]](Utils.classForName(name)) // <---
    } catch {
      case NonFatal(_) => None // do nothing
      case _: NoClassDefFoundError if Utils.isTesting => None // See SPARK-23422.
    }
  }
}
A simpler reproduction is
Class.forName("org.apache.spark.ml.linalg.SparseMatrix")
// java.lang.NoClassDefFoundError: breeze/storage/Zero$DoubleZero$ ...
// Caused by: java.lang.ClassNotFoundException: breeze.storage.Zero$DoubleZero$ ...
As I said, one of the dependencies uses Breeze 1.3 or earlier, even though you think you're using Breeze 2.1.0. Namely, org.apache.spark.ml.linalg.SparseMatrix is from spark-mllib-local, and spark-mllib-local 3.3.0 uses Breeze 1.2:
<dependency>
  <groupId>org.scalanlp</groupId>
  <artifactId>breeze_2.13</artifactId>
  <version>1.2</version>
  <scope>compile</scope>
  <exclusions>
    <exclusion>
      <artifactId>commons-math3</artifactId>
      <groupId>org.apache.commons</groupId>
    </exclusion>
  </exclusions>
</dependency>
So Spark 3.3.0 (and 3.3.2) is incompatible with Breeze 2.0+. Use Breeze 1.3 or earlier:
scalaVersion := "2.13.0"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % "3.3.0",
  "org.apache.spark" %% "spark-mllib" % "3.3.0",
  "org.scalanlp" %% "breeze" % "1.3"
)
Then your code runs successfully.
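If your own code doesn't need a particular Breeze version, another option is to not declare Breeze at all and let spark-mllib bring in the version it was built against. The build.sbt below is a sketch of that idea, not something tested against your project; the dependencyOverrides line is optional, and the version 1.2 in it is taken from the spark-mllib-local POM quoted above.

```scala
scalaVersion := "2.13.0"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"   % "3.3.0",
  "org.apache.spark" %% "spark-mllib" % "3.3.0"
  // No explicit Breeze dependency: it arrives transitively via spark-mllib.
)

// Optional: pin Breeze in case some other library requests 2.x.
dependencyOverrides += "org.scalanlp" %% "breeze" % "1.2"
```

Running the dependencyTree task in sbt (available out of the box in recent sbt 1.x releases) should then show a single Breeze version.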
Compatibility issues between different versions of Spark and Breeze are not rare:
https://github.com/scalanlp/breeze/issues/710
https://github.com/scalanlp/breeze/issues/690
Breeze should be upgraded to 2.0 in Spark 3.4.0
https://issues.apache.org/jira/browse/SPARK-39616
Meanwhile, you can try it with the following build.sbt:
scalaVersion := "2.13.0"
resolvers += "apache-repo" at "https://repository.apache.org/content/groups/snapshots"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % "3.4.0-SNAPSHOT",
  "org.apache.spark" %% "spark-mllib" % "3.4.0-SNAPSHOT",
  "org.scalanlp" %% "breeze" % "2.1.0"
)
Then your code runs successfully too.