scala, apache-spark, apache-spark-dataset, apache-spark-encoders

Why is "Unable to find encoder for type stored in a Dataset" when creating a dataset of custom case class?


Spark 2.0 (final) with Scala 2.11.8. The following super simple code yields the compilation error:

Error:(17, 45) Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.

import org.apache.spark.sql.SparkSession

case class SimpleTuple(id: Int, desc: String)

object DatasetTest {
  val dataList = List(
    SimpleTuple(5, "abc"),
    SimpleTuple(6, "bcd")
  )

  def main(args: Array[String]): Unit = {
    val sparkSession = SparkSession.builder
      .master("local")
      .appName("example")
      .getOrCreate()

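    // Fails to compile: "Unable to find encoder for type stored in a Dataset"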
    val dataset = sparkSession.createDataset(dataList)
  }
}

Solution

  • Spark Datasets require an Encoder for the data type that is about to be stored. For common types (atomic types, product types) a number of predefined encoders is available, but you first have to import them from SparkSession.implicits to make it work:

    val sparkSession: SparkSession = ???
    import sparkSession.implicits._
    val dataset = sparkSession.createDataset(dataList)
    
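    With the implicits in scope you can also convert the local collection directly with the toDS extension method, which resolves the same implicit Encoder:

    val dataset = dataList.toDS()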

    Alternatively you can directly provide an explicit

    import org.apache.spark.sql.{Encoder, Encoders}
    
    val dataset = sparkSession.createDataset(dataList)(Encoders.product[SimpleTuple])
    

    or implicit

    implicit val enc: Encoder[SimpleTuple] = Encoders.product[SimpleTuple]
    val dataset = sparkSession.createDataset(dataList)
    

    Encoder for the stored type.

    Note that the Encoders object also provides a number of predefined Encoders for atomic types, while Encoders for complex ones can be derived with ExpressionEncoder, as sketched below.
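    For example, a minimal sketch of deriving an Encoder for the case class above via ExpressionEncoder (it lives in an internal Catalyst package, so the implicits or Encoders.product approaches shown earlier are generally preferable):

    import org.apache.spark.sql.Encoder
    import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder

    // ExpressionEncoder derives an Encoder for any Product type via reflection
    implicit val enc: Encoder[SimpleTuple] = ExpressionEncoder[SimpleTuple]()
    val dataset = sparkSession.createDataset(dataList)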

    Further reading: