apache-spark, partitioner

type HashPartitioner is not a member of org.apache.spark.sql.SparkSession


I was experimenting with Spark's HashPartitioner in spark-shell. The session that produces the error is shown below:

scala> val data = sc.parallelize(List((1, 3), (2, 4), (3, 6), (3, 7)))
data: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[0] at parallelize at <console>:24

scala> val partitionedData = data.partitionBy(new spark.HashPartitioner(2))
<console>:26: error: type HashPartitioner is not a member of org.apache.spark.sql.SparkSession
       val partitionedData = data.partitionBy(new spark.HashPartitioner(2))
                                                        ^

scala> val partitionedData = data.partitionBy(new org.apache.spark.HashPartitioner(2))
partitionedData: org.apache.spark.rdd.RDD[(Int, Int)] = ShuffledRDD[1] at partitionBy at <console>:26

The call using spark.HashPartitioner failed, while the one using the fully qualified name org.apache.spark.HashPartitioner worked. Why does spark-shell resolve spark.HashPartitioner against org.apache.spark.sql.SparkSession rather than against the org.apache.spark package?


Solution

  • In spark-shell, spark refers to the pre-created SparkSession object, not to the org.apache.spark package, so spark.HashPartitioner is looked up as a member of SparkSession.

    You should either import org.apache.spark.HashPartitioner or use the fully qualified class name, for example:

    import org.apache.spark.HashPartitioner
    
    val partitionedData = data.partitionBy(new HashPartitioner(2))
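
    To confirm the result in the same spark-shell session (assuming the partitionedData from the snippet above), you can inspect the RDD's partitioner and partition count; both are standard RDD methods:

    // The shuffled RDD keeps a reference to the partitioner passed to partitionBy
    partitionedData.partitioner       // Some(HashPartitioner)

    // Number of partitions requested above
    partitionedData.getNumPartitions  // 2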