
I am getting an error while defining H2OContext in a Python Spark script


Code:

from pyspark.sql import SparkSession
from pysparkling import *

hc = H2OContext.getOrCreate()

I am using a Spark 3.2.1 standalone cluster and trying to create an H2OContext in a Python file. When I run the script with spark-submit, I get the following error:

    hc = H2OContext.getOrCreate()
NameError: name 'H2OContext' is not defined

Spark-submit command:

spark-submit --master spark://local:7077 --packages ai.h2o:sparkling-water-package_2.12:3.36.1.3-1-3.2 spark_h20/h2o.py


Solution

  • The parameter --packages ai.h2o:sparkling-water-package_2.12:3.36.1.3-1-3.2 downloads a jar artifact from Maven. This artifact can be used only from Scala/Java. I see there is a mistake in the Sparkling Water documentation.
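
A quick way to see what Python is actually picking up: the NameError (rather than an ImportError) means that `from pysparkling import *` found some module called pysparkling, but that module did not export H2OContext. The sketch below uses only the standard library; `describe_module` is a hypothetical helper name, not part of Sparkling Water.

```python
import importlib
import importlib.util


def describe_module(module_name="pysparkling"):
    """Report where a module is loaded from and whether it exposes H2OContext.

    Hypothetical diagnostic helper for illustration only.
    """
    # find_spec returns None when a top-level module cannot be located.
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return f"{module_name}: not importable"

    # Import the module and check for the attribute the script needs.
    module = importlib.import_module(module_name)
    has_context = hasattr(module, "H2OContext")
    return (
        f"{module_name}: loaded from {spec.origin}, "
        f"H2OContext present: {has_context}"
    )


if __name__ == "__main__":
    print(describe_module())
```

Running this at the top of the submitted script should show whether the pysparkling found on the driver is the Sparkling Water Python package or something else with the same name.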

    If you want to use the Python API, you need to: