After following the instructions for the spark-atlas-connector, I am getting the error below while running a simple piece of code that creates a table in Spark.
Versions: Spark 2.3.1, Atlas 1.0.0
The spark-submit command is:
spark-submit --jars /home/user/spark-atlas-connector/spark-atlas-connector-assembly/target/spark-atlas-connector-assembly-0.1.0-SNAPSHOT.jar \
--conf spark.extraListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker \
--conf spark.sql.queryExecutionListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker \
--conf spark.sql.streaming.streamingQueryListeners=com.hortonworks.spark.atlas.SparkAtlasStreamingQueryEventTracker \
--files /home/user/atlas-application.properties \
--master local \
/home/user/SparkAtlas/test.py
Exception in thread "SparkCatalogEventProcessor-thread" java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/catalog/ExternalCatalogWithListener
    at com.hortonworks.spark.atlas.sql.SparkCatalogEventProcessor.process(SparkCatalogEventProcessor.scala:36)
    at com.hortonworks.spark.atlas.sql.SparkCatalogEventProcessor.process(SparkCatalogEventProcessor.scala:28)
    at com.hortonworks.spark.atlas.AbstractEventProcessor$$anonfun$eventProcess$1.apply(AbstractEventProcessor.scala:72)
    at com.hortonworks.spark.atlas.AbstractEventProcessor$$anonfun$eventProcess$1.apply(AbstractEventProcessor.scala:71)
    at scala.Option.foreach(Option.scala:257)
    at com.hortonworks.spark.atlas.AbstractEventProcessor.eventProcess(AbstractEventProcessor.scala:71)
    at com.hortonworks.spark.atlas.AbstractEventProcessor$$anon$1.run(AbstractEventProcessor.scala:38)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
Thanks in advance.
This is a clear indication of a jar version mismatch: ExternalCatalogWithListener only exists in newer Spark releases (2.4+), so a connector assembly built against Spark 2.4 will fail with exactly this NoClassDefFoundError when run on Spark 2.3.1.
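You can confirm this on your installation from spark-shell by checking whether the class the connector needs is actually on the classpath. A minimal sketch (the class name comes from the stack trace above):

// Run in spark-shell on the same installation used by spark-submit.
// On Spark 2.3.1 the Class.forName call is expected to throw
// ClassNotFoundException, confirming the assembly targets a newer Spark.
println(org.apache.spark.SPARK_VERSION)
Class.forName("org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener")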
For the latest Atlas version, 2.0.0, the dependencies are:
<spark.version>2.4.0</spark.version>
<atlas.version>2.0.0</atlas.version>
<scala.version>2.11.12</scala.version>
For Atlas 1.0.0, see its pom.xml; the dependencies are:
<spark.version>2.3.0</spark.version>
<atlas.version>1.0.0</atlas.version>
<scala.version>2.11.8</scala.version>
Try using the correct versions of the jars by checking the pom.xml mentioned above.
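As an extra safeguard, you can fail fast at job start when the runtime Spark does not match what the connector assembly was built against. This is a minimal sketch, assuming you built against the Atlas 1.0.0 branch (spark.version 2.3.0 from the pom.xml above); adjust expectedSpark to whichever branch you actually built:

// Minimal sketch: fail fast if the runtime Spark version does not match the
// version the connector assembly was built against (see pom.xml above).
// "2.3" is an assumption based on the Atlas 1.0.0 branch; adjust as needed.
val expectedSpark = "2.3"
require(org.apache.spark.SPARK_VERSION.startsWith(expectedSpark),
  s"Connector built for Spark $expectedSpark.x but runtime is ${org.apache.spark.SPARK_VERSION}")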
Notes:
1) If you add jars one at a time, by reading each error and downloading the missing jar, you will just hit a roadblock somewhere else. I advise you to use a consistent set of correct versions instead.
2) Spark runs on Java 8+, Python 2.7+/3.4+ and R 3.1+. For the Scala API, Spark 2.3.1 uses Scala 2.11, so you will need a compatible Scala version (2.11.x). Check your Scala version, as you have not mentioned it in the question; see the check below.
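A quick way to see both versions, sketched in spark-shell (spark-submit --version prints the same information):

// Print the Scala version Spark was compiled with; the connector's
// scala.version in pom.xml must be binary-compatible (2.11.x here).
println(scala.util.Properties.versionString)  // e.g. "version 2.11.8"
println(org.apache.spark.SPARK_VERSION)       // e.g. "2.3.1"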