pyspark zeppelin

No module named 'pyspark' in Zeppelin


I am new to Spark and just started using it. I am trying to import SparkSession from pyspark, but it throws the error `ModuleNotFoundError: No module named 'pyspark'`. Please see my code below.

```python
# Import our SparkSession so we can use it
from pyspark.sql import SparkSession
# Create our SparkSession, this can take a couple minutes locally
spark = SparkSession.builder.appName("basics").getOrCreate()
```

Error:

```
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-2-6ce0f5f13dc0> in <module>
      1 # Import our SparkSession so we can use it
----> 2 from pyspark.sql import SparkSession
      3 # Create our SparkSession, this can take a couple minutes locally
      4 spark = SparkSession.builder.appName("basics").getOrCreate()

ModuleNotFoundError: No module named 'pyspark'
```

I am in my conda env, and I tried `pip install pyspark`, but it tells me the package is already installed.

Solution

  • If you are using Zepl, it has its own way of importing: since the notebooks run in the cloud, each paragraph needs an interpreter directive, such as %spark.pyspark, to distinguish Zepl's interpreter selection from the Python code itself.

    %spark.pyspark
    from pyspark.sql import SparkSession
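  • If you are instead running Zeppelin or a Jupyter kernel locally, this error often means pip installed pyspark into a different interpreter than the one the notebook is actually using (for example, the system Python rather than your conda env). As a minimal sketch (the helper name `module_available` is my own, not part of any library), you can check which interpreter the kernel runs and whether it can see pyspark:

```python
import importlib.util
import sys

def module_available(name: str) -> bool:
    """Return True if this interpreter can import `name`."""
    return importlib.util.find_spec(name) is not None

# If this path is not the python inside your conda env, pip
# installed pyspark somewhere the notebook never looks.
print("interpreter:", sys.executable)
print("pyspark importable:", module_available("pyspark"))
```

If the interpreter path is wrong, installing with that exact interpreter (`/path/to/env/python -m pip install pyspark`) usually resolves the mismatch.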