apache-spark, hortonworks-data-platform, connector, apache-atlas

spark-atlas-connector: "SparkCatalogEventProcessor-thread" class not found exception


After following the setup instructions for spark-atlas-connector, I get the error below when running a simple job that creates a table in Spark.

Spark 2.3.1, Atlas 1.0.0

The spark-submit command is:

spark-submit \
  --jars /home/user/spark-atlas-connector/spark-atlas-connector-assembly/target/spark-atlas-connector-assembly-0.1.0-SNAPSHOT.jar \
  --conf spark.extraListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker \
  --conf spark.sql.queryExecutionListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker \
  --conf spark.sql.streaming.streamingQueryListeners=com.hortonworks.spark.atlas.SparkAtlasStreamingQueryEventTracker \
  --files /home/user/atlas-application.properties \
  --master local \
  /home/user/SparkAtlas/test.py

Exception in thread "SparkCatalogEventProcessor-thread" java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/catalog/ExternalCatalogWithListener
    at com.hortonworks.spark.atlas.sql.SparkCatalogEventProcessor.process(SparkCatalogEventProcessor.scala:36)
    at com.hortonworks.spark.atlas.sql.SparkCatalogEventProcessor.process(SparkCatalogEventProcessor.scala:28)
    at com.hortonworks.spark.atlas.AbstractEventProcessor$$anonfun$eventProcess$1.apply(AbstractEventProcessor.scala:72)
    at com.hortonworks.spark.atlas.AbstractEventProcessor$$anonfun$eventProcess$1.apply(AbstractEventProcessor.scala:71)
    at scala.Option.foreach(Option.scala:257)
    at com.hortonworks.spark.atlas.AbstractEventProcessor.eventProcess(AbstractEventProcessor.scala:71)
    at com.hortonworks.spark.atlas.AbstractEventProcessor$$anon$1.run(AbstractEventProcessor.scala:38)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

Thanks in advance.


Solution

  • This is a clear indication of a jar version mismatch: the missing class org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener was only introduced in Spark 2.4, so an assembly built against the latest connector sources cannot run on Spark 2.3.1.

    For the latest Atlas version, 2.0.0, the dependencies in the connector's pom.xml are:

        <spark.version>2.4.0</spark.version>
        <atlas.version>2.0.0</atlas.version>
        <scala.version>2.11.12</scala.version>


    For Atlas 1.0.0, see the pom.xml of that branch; its dependencies are:

        <spark.version>2.3.0</spark.version>
        <atlas.version>1.0.0</atlas.version>
        <scala.version>2.11.8</scala.version>


    Use the correct jar versions by checking the pom.xml of the connector branch that matches your deployment.

    Note:
    1) If you add jars one at a time by reading each error and downloading the missing dependency, you will just hit the next roadblock somewhere else. Use a consistent set of correct versions instead.
    2) Spark runs on Java 8+, Python 2.7+/3.4+ and R 3.1+. For the Scala API, Spark 2.3.1 uses Scala 2.11, so you need a compatible Scala version (2.11.x). Check your Scala version, as it is not mentioned in the question.
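    The version matrix above can be sketched as a small helper. This is purely illustrative (the `COMPAT` table and `compatible` function are not part of spark-atlas-connector); it just encodes the pom.xml values listed above and compares major.minor versions, which is the check the asker's setup fails:

    ```python
    # Illustrative sketch: encode the connector's version matrix (taken from
    # the pom.xml values above) and check a deployment against it.
    COMPAT = {
        "1.0.0": {"spark": "2.3.0", "scala": "2.11.8"},   # Atlas 1.0.0 branch
        "2.0.0": {"spark": "2.4.0", "scala": "2.11.12"},  # Atlas 2.0.0 branch
    }

    def compatible(atlas_version, spark_version, scala_version):
        """Return True if the running Spark/Scala match the connector build.

        Only major.minor is compared; patch releases (e.g. 2.3.0 vs 2.3.1)
        are treated as interchangeable.
        """
        req = COMPAT.get(atlas_version)
        if req is None:
            return False
        same_spark = spark_version.rsplit(".", 1)[0] == req["spark"].rsplit(".", 1)[0]
        same_scala = scala_version.rsplit(".", 1)[0] == req["scala"].rsplit(".", 1)[0]
        return same_spark and same_scala

    # The asker runs Spark 2.3.1: an assembly from the Atlas 2.0.0 branch
    # (built for Spark 2.4) mismatches, while the Atlas 1.0.0 branch fits.
    print(compatible("2.0.0", "2.3.1", "2.11.8"))  # -> False
    print(compatible("1.0.0", "2.3.1", "2.11.8"))  # -> True
    ```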