scala apache-spark implicits

Import implicit conversions without an instance of SparkSession


My Spark code is cluttered with code like this:

object Transformations {   
  def selectI(df:DataFrame) : DataFrame = {    
    // needed to use $ to generate ColumnName
    import df.sparkSession.implicits._

    df.select($"i")
  }
}

or, alternatively:

object Transformations {   
  def selectI(df:DataFrame)(implicit spark:SparkSession) : DataFrame = {    
    // needed to use $ to generate ColumnName
    import spark.implicits._

    df.select($"i")
  }
}

I don't really understand why we need an instance of SparkSession just to import these implicit conversions. I would rather do something like:

object Transformations {  
  import org.apache.spark.sql.SQLImplicits._ // does not work

  def selectI(df:DataFrame) : DataFrame = {    
    df.select($"i")
  }
}

Is there an elegant solution to this problem? My use of the implicits is not limited to $; it also covers Encoders, .toDF(), etc.


Solution

  • I don't really understand why we need an instance of SparkSession just to import these implicit conversions. I would rather like to do something like

    Because every Dataset exists in the scope of a specific SparkSession, and a single Spark application can have multiple active SparkSession instances.

    Theoretically, some members of SparkSession.implicits._ could exist separately from the session instance, for example:

    import org.apache.spark.sql.implicits._   // hypothetical: for, say, `$` or `Encoders`
    import org.apache.spark.sql.SparkSession.builder.getOrCreate.implicits._  // hypothetical: for toDF
    

    but it would have a significant impact on user code.
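
To make the session scoping concrete, here is a minimal sketch, assuming Spark 2.x or later running in local mode. The object name `SessionScopeSketch` and the app name are made up for the example. It shows two sessions coexisting in one application, a Dataset remembering the session that created it, and a variant of `selectI` that avoids the implicits import by using `functions.col` and an explicitly passed `Encoder`. It is only one possible way to reduce the clutter, and it does not cover `.toDF()` on local collections, which still needs `implicits._` from a concrete session.

```scala
import org.apache.spark.sql.{DataFrame, Encoders, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical object name, used only for this illustration.
object SessionScopeSketch {

  // functions.col builds a Column without any SparkSession,
  // so no `import spark.implicits._` is needed here.
  def selectI(df: DataFrame): DataFrame = df.select(col("i"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")              // assumption: local mode for the sketch
      .appName("session-scope-sketch") // hypothetical app name
      .getOrCreate()

    // A second session in the same application; it shares the SparkContext
    // but has its own SQL configuration and temporary views.
    val other = spark.newSession()

    // The Encoder is passed explicitly instead of being resolved
    // through the session's implicits.
    val ds = spark.createDataset(Seq(1, 2, 3))(Encoders.scalaInt)
    val df = ds.toDF("i")

    // Each Dataset keeps a reference to the session it was created in.
    println(df.sparkSession eq spark)
    println(other eq spark)

    selectI(df).show()
    spark.stop()
  }
}
```

Because `selectI` here only builds a `Column` by name, it works for a DataFrame from any session, which is why it can live in a plain object without a session in scope.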