python apache-spark pyspark google-cloud-dataproc fpgrowth

Unable to import org module in PySpark cluster


I am trying to import FPGrowth from the org module, but installing the org module throws an error. I also tried replacing org.apache.spark with pyspark, but it still doesn't work.

!pip install org
import org.apache.spark.ml.fpm.FPGrowth

Below is the error:

ERROR: Could not find a version that satisfies the requirement org (from versions: none)
ERROR: No matching distribution found for org
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-12-c730562e7076> in <module>
      1 get_ipython().system('pip install org')
----> 2 import org.apache.spark.ml.fpm.FPGrowth

ModuleNotFoundError: No module named 'org'

Solution

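  • org.apache.spark.ml.fpm.FPGrowth is the Scala/Java class path, so it cannot be imported from Python, and there is no org package to pip install (hence "No matching distribution found"). PySpark exposes the same algorithm through its own Python wrapper, which you import with: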

    from pyspark.ml.fpm import FPGrowth
    

    You can find additional instructions on how to use FPGrowth in the Spark documentation.