scala apache-spark pyspark jython

Invoking a PySpark script from Scala Spark code


I have a Scala Spark application and want to invoke PySpark/Python (pyspark_script.py) for further processing.

There are multiple resources on calling Java/Scala code from Python, but I am looking for the other direction: Scala -> PySpark.

I explored Jython for embedding Python code in Scala/Java, as follows:

import org.python.util.PythonInterpreter

PythonInterpreter.initialize(System.getProperties, properties, sysArgs)
val pi = new PythonInterpreter()
pi.execfile("path/to/pyscript/mypysparkscript.py")

I see an error that says: "ImportError: No module named pyspark"

Is there any way for Scala Spark to talk to PySpark using the same SparkContext/session?


Solution

  • You can run shell commands from Scala using the sys.process package.

    // Spark code goes here .....
    // Call the pyspark script; .!! runs the command and returns its stdout as a String
    import sys.process._
    "python3 /path/to/python/file.py".!!
    

    To use the same session, add the line below to the Python file.

    from pyspark.sql import SparkSession
    spark = SparkSession.builder.getOrCreate()
    

    You can also use the SparkSession.getActiveSession() method; see the sketch after the note below.

    NOTE: Make sure the pyspark module is installed in the Python environment you are calling. You can do that with the pip3 install pyspark command.
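
    For reference, here is a minimal sketch of what the invoked Python file might contain, assuming PySpark is installed for python3. The path matches the hypothetical /path/to/python/file.py above, and the appName and the sanity-check DataFrame are illustrative only.

    # /path/to/python/file.py (hypothetical path from the command above)
    from pyspark.sql import SparkSession

    # getOrCreate() returns the active SparkSession if one exists,
    # otherwise it builds a new one
    spark = SparkSession.builder.appName("called-from-scala").getOrCreate()

    # Alternative (PySpark 3.0+): returns the active session or None,
    # without creating a new one
    active = SparkSession.getActiveSession()

    # Illustrative work only: build and show a tiny DataFrame
    spark.range(5).show()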