Spark Scala Dataset Type Hierarchy


I am trying to require every class that extends W to provide a method get that returns a Dataset of some subclass of WR.

abstract class WR

case class TGWR(
          a: String,
          b: String
        ) extends WR

abstract class W {

  def get[T <: WR](): Dataset[T]

}


class TGW(sparkSession: SparkSession) extends W {

  override def get[TGWR](): Dataset[TGWR] = {
    import sparkSession.implicits._

    Seq(TGWR("dd","dd")).toDF().as[TGWR]
  }

}

Compilation error:

Unable to find encoder for type stored in a Dataset.  Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._  Support for serializing other types will be added in future releases.

If I change the get function to the following:

  def get(): Dataset[TGWR]

and

  override def get(): Dataset[TGWR] = {...

it compiles, so I suspect the problem is related to inheritance or the type hierarchy.


Solution

  • Forget my comment, I re-read your question and noticed a simple problem.

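    In override def get[TGWR], you are not saying that this class produces instances of TGWR. You are declaring a new type parameter named TGWR, which shadows your real case class inside the method. Spark has no Encoder for that unconstrained type parameter, which is why the compiler reports the missing encoder.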
    I fixed it with the following code:

    import org.apache.spark.sql.{SparkSession, Dataset}
    
    abstract class WR extends Product with Serializable
    
    final case class TGWR(a: String, b: String) extends WR
    
    abstract class W[T <: WR] {
      def get(): Dataset[T]
    }
    
    final class TGW(spark: SparkSession) extends W[TGWR] {
      override def get(): Dataset[TGWR] = {
        import spark.implicits._
        Seq(TGWR("dd","dd")).toDF().as[TGWR]
      }
    }
    

    You can use it like this:

    val spark = SparkSession.builder.master("local[*]").getOrCreate()
    (new TGW(spark)).get()
    // res1: org.apache.spark.sql.Dataset[TGWR] = [a: string, b: string]
    res1.show()
    // +---+---+
    // |  a|  b|
    // +---+---+
    // | dd| dd|
    // +---+---+
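
    The shadowing itself can also be reproduced without Spark. The following is only a minimal sketch: Enc and Box are hypothetical stand-ins for Spark's Encoder and Dataset (they are not Spark APIs), and ShadowingDemo is a name made up for this illustration. The commented-out part mirrors the original mistake, and the rest mirrors the fix of moving the type parameter to the class.

```scala
object ShadowingDemo {
  // Hypothetical stand-in for Spark's Encoder type class.
  trait Enc[A] { def name: String }

  object Enc {
    // An instance exists only for the concrete case class TGWR.
    implicit val tgwrEnc: Enc[TGWR] = new Enc[TGWR] { val name = "TGWR" }
  }

  // Hypothetical stand-in for Dataset: building one requires an Enc[A].
  final class Box[A](val rows: Seq[A])(implicit val enc: Enc[A])

  abstract class WR extends Product with Serializable
  final case class TGWR(a: String, b: String) extends WR

  // The original mistake, left commented out because it does not compile.
  // Inside get, the name TGWR is a fresh type parameter, not the case class,
  // so no implicit Enc can be found for it:
  //
  //   abstract class Bad { def get[T <: WR](): Box[T] }
  //   class BadImpl extends Bad {
  //     override def get[TGWR](): Box[TGWR] = new Box(Seq.empty[TGWR])
  //   }

  // The fix: the subclass picks the concrete type when it extends W.
  abstract class W[T <: WR] { def get(): Box[T] }

  final class TGW extends W[TGWR] {
    // Here TGWR is the real case class, so Enc.tgwrEnc is resolved.
    override def get(): Box[TGWR] = new Box(Seq(TGWR("dd", "dd")))
  }

  def main(args: Array[String]): Unit = {
    val box = new TGW().get()
    println(box.rows) // prints List(TGWR(dd,dd))
  }
}
```

    With the parameter on the class, each subclass of W fixes T once, which is the constraint the question asked for.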
    

    Hope this is what you are looking for.
    Do not hesitate to ask for clarification.