apache-spark, amazon-deequ

Unable to run amazon deequ examples locally


I am trying to run and test the Amazon Deequ library locally, but I repeatedly get a NoClassDefFoundError for various examples. The exact error is:

    java.lang.NoClassDefFoundError: scala/Product$class
      at com.amazon.deequ.profiles.ColumnProfilerRunBuilderFileOutputOptions.<init>(ColumnProfilerRunner.scala:31)
      at com.amazon.deequ.profiles.ColumnProfilerRunBuilder.run(ColumnProfilerRunBuilder.scala:174)
      ... 47 elided
    Caused by: java.lang.ClassNotFoundException: scala.Product$class
      at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:466)
      at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:566)
      at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:499)
      ... 49 more

or

    val suggestionResult = ConstraintSuggestionRunner().onData(input).addConstraintRules(Rules.DEFAULT).run()
    java.lang.NoClassDefFoundError: scala/Product$class
      at com.amazon.deequ.suggestions.rules.CompleteIfCompleteRule.<init>(CompleteIfCompleteRule.scala:25)
      at com.amazon.deequ.suggestions.Rules$.<init>(ConstraintSuggestionRunner.scala:33)
      at com.amazon.deequ.suggestions.Rules$.<clinit>(ConstraintSuggestionRunner.scala)
      ... 49 elided

The code I followed is the one given in the examples. I ran it with spark-submit --class --packages com.amazon.deequ:deequ:1.0.4

I also tried spark-shell --jars and running the lines one by one, but I still get the same result.
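
For reference, the examples boil down to calls like these (a minimal sketch for spark-shell, where spark is predefined; the tiny inline DataFrame is just a stand-in for the examples' real input):

    import com.amazon.deequ.profiles.ColumnProfilerRunner
    import com.amazon.deequ.suggestions.{ConstraintSuggestionRunner, Rules}

    // stand-in input; the real examples load their own DataFrame
    val input = spark.createDataFrame(Seq((1, "a"), (2, "b"))).toDF("id", "name")

    // column profiling example -- fails with the first trace above
    val profileResult = ColumnProfilerRunner().onData(input).run()

    // constraint suggestion example -- fails with the second trace above
    val suggestionResult = ConstraintSuggestionRunner()
      .onData(input)
      .addConstraintRules(Rules.DEFAULT)
      .run()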


Solution

  • The version of Deequ that you're using doesn't work with Spark 3.0, which is compiled with Scala 2.12, and that is what causes this error (as pointed out by Philipp): Deequ 1.0.4 is built against Scala 2.11, and trait implementation classes like scala.Product$class no longer exist in Scala 2.12. You can verify the mismatch with the check shown after this list. So you have two possible solutions:

    1. Use Spark 2.4.x, which is compatible with Deequ 1.0.4
    2. Compile Deequ from source with the following command: mvn clean install -DskipTests -Pscala-2.12 -Pspark-3.0, and then use it with spark-shell as: bin/spark-shell --jars <path-to-deequ-checkout>/target/deequ_2.12-1.1.0-SNAPSHOT.jar (unfortunately we can't use --packages because of a build problem in Maven)

    P.S. It's better to grab the latest Spark (3.0.1); the preview version was released too long ago.
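
To confirm which Scala version your Spark build was compiled with (and hence whether it can load Deequ 1.0.4), you can check directly from spark-shell; the version strings in the comments below are illustrative and depend on your build:

    // run inside spark-shell; output depends on your build
    scala.util.Properties.versionNumberString // e.g. "2.12.10" for Spark 3.0.1
    spark.version                             // e.g. "3.0.1"
    // Deequ 1.0.4 is compiled against Scala 2.11, so any Scala 2.12 build of
    // Spark will fail to load it with NoClassDefFoundError: scala/Product$class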