apache-spark apache-zeppelin

Zeppelin+Spark+Cassandra: Spark doesn't work


I watched a good YouTube video about Zeppelin+Spark+Cassandra and am trying to reproduce it. OS: Windows 10.

  1. I ran Zeppelin as a Docker image.

  2. I set the options for the Cassandra interpreter, and it works.

  3. Now I am trying to set up Spark, and I can't. I installed spark-3.0.1-bin-hadoop2.7 (the folder is named spark-3.0.1-bin-hadoop2.7), and spark-shell works from cmd. What do I have to do with spark-cassandra-connector, and which options do I have to set for the Spark interpreter? Thanks.

org.apache.zeppelin.interpreter.InterpreterException: java.io.IOException: Fail to detect scala version, the reason is:Cannot run program "C:/bin/spark-3.3.1-bin-hadoop3/bin/spark-submit": error=2, No such file or directory
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.open(RemoteInterpreter.java:129)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:271)
    at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:438)
    at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:69)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:172)
    at org.apache.zeppelin.scheduler.AbstractScheduler.runJob(AbstractScheduler.java:132)
    at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:182)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Fail to detect scala version, the reason is:Cannot run program "C:/bin/spark-3.3.1-bin-hadoop3/bin/spark-submit": error=2, No such file or directory
    at org.apache.zeppelin.interpreter.launcher.SparkInterpreterLauncher.buildEnvFromProperties(SparkInterpreterLauncher.java:127)
    at org.apache.zeppelin.interpreter.launcher.StandardInterpreterLauncher.launchDirectly(StandardInterpreterLauncher.java:77)
    at org.apache.zeppelin.interpreter.launcher.InterpreterLauncher.launch(InterpreterLauncher.java:110)
    at org.apache.zeppelin.interpreter.InterpreterSetting.createInterpreterProcess(InterpreterSetting.java:856)
    at org.apache.zeppelin.interpreter.ManagedInterpreterGroup.getOrCreateInterpreterProcess(ManagedInterpreterGroup.java:66)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getOrCreateInterpreterProcess(RemoteInterpreter.java:104)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.internal_create(RemoteInterpreter.java:154)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.open(RemoteInterpreter.java:126)
    ... 13 more
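
The relevant part of the trace is `Cannot run program ".../bin/spark-submit": error=2, No such file or directory`: Zeppelin could not find `spark-submit` under the configured Spark home. Since Zeppelin runs inside a Docker container here, a likely reason is that the Windows-style path is not visible from inside the container. A minimal sketch of a check that could be run in the same environment as Zeppelin is shown below. The helper name `find_spark_submit` and the fallback path `/opt/spark` are assumptions made for illustration; they are not part of Zeppelin or Spark.

```python
import os
from pathlib import Path
from typing import Optional


def find_spark_submit(spark_home: str) -> Optional[Path]:
    """Return the path to bin/spark-submit under spark_home, or None if it is missing."""
    candidate = Path(spark_home) / "bin" / "spark-submit"
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    # SPARK_HOME is read from the environment; "/opt/spark" is only a placeholder fallback.
    home = os.environ.get("SPARK_HOME", "/opt/spark")
    found = find_spark_submit(home)
    if found is None:
        print(f"spark-submit not found under {home}")
    else:
        print(f"spark-submit found at {found}")
```

If the check reports that the file is missing, the Spark folder is probably not mounted into the container at the path that SPARK_HOME points to.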


Solution

  • OK guys, here we go:

    1. Install Spark on Windows 10; there are many tutorials on the internet. My version is 3.0.1.
    2. Download the Docker image with Zeppelin.
    3. In the image settings, set the path to the Spark folder and port 8080, then launch it at http://localhost:8080/
    4. Spark interpreter settings: set SPARK_HOME to the path from step 3, and set spark.jars.packages = com.datastax.spark:spark-cassandra-connector_2.12:3.0.1. Add the settings for Cassandra: spark.cassandra.connection.host, spark.cassandra.auth.username, spark.cassandra.auth.password.
    5. That's it, welcome.
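
The interpreter settings from step 4 can be written down as a plain key/value map and sanity-checked before they are typed into the Zeppelin UI. The sketch below is only an illustration: the helper names `parse_connector` and `missing_settings` are hypothetical, and the SPARK_HOME path, host, username and password are placeholders, not values from the post. The `_2.12` suffix in the connector coordinate is the Scala version the artifact was built for, which should match the Scala version of the Spark build (the prebuilt Spark 3.0.1 binaries use Scala 2.12).

```python
import re

# Maven coordinate of the form group:artifact_scalaVersion:version
_COORD = re.compile(
    r"^(?P<group>[^:]+):(?P<artifact>[^:_]+)_(?P<scala>\d+\.\d+):(?P<version>[^:]+)$"
)

# Property names taken from step 4 of the solution.
REQUIRED = [
    "SPARK_HOME",
    "spark.jars.packages",
    "spark.cassandra.connection.host",
    "spark.cassandra.auth.username",
    "spark.cassandra.auth.password",
]


def parse_connector(coord: str) -> dict:
    """Split a connector coordinate into group, artifact, scala and version."""
    match = _COORD.match(coord)
    if match is None:
        raise ValueError(f"not a group:artifact_scala:version coordinate: {coord!r}")
    return match.groupdict()


def missing_settings(props: dict) -> list:
    """Return the required property names that are absent or empty."""
    return [key for key in REQUIRED if not props.get(key)]


settings = {
    "SPARK_HOME": "/path/to/spark-3.0.1-bin-hadoop2.7",  # placeholder path
    "spark.jars.packages": "com.datastax.spark:spark-cassandra-connector_2.12:3.0.1",
    "spark.cassandra.connection.host": "cassandra-host",  # placeholder
    "spark.cassandra.auth.username": "user",  # placeholder
    "spark.cassandra.auth.password": "secret",  # placeholder
}

if __name__ == "__main__":
    print(parse_connector(settings["spark.jars.packages"]))
    print("missing:", missing_settings(settings))
```

Checking the Scala suffix this way is a cheap guard against pairing a connector built for one Scala version with a Spark build that uses another.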