scala, apache-spark, apache-spark-sql, apache-spark-1.4

CaseWhen in Spark DataFrame


I'd like to understand how to use the CaseWhen expression with the new DataFrame API.

I can't see any reference to it in the documentation, and the only place I saw it was in the code: https://github.com/apache/spark/blob/v1.4.0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala#L397

I'd like to be able to write something like this:

val col = CaseWhen(Seq(
    $"a" === lit(1), lit(10),
    $"a" === lit(2), lit(15),
    ...
    lit(20)
))

but this code won't compile: at the link above, CaseWhen is declared as CaseWhen(branches: Seq[Expression]), and the Seq here contains Columns, not Expressions.

What is the proper way to use CaseWhen?


Solution

  • To be honest, I don't know whether CaseWhen is intended to be used as a user-facing API. Instead, you should use the when function from org.apache.spark.sql.functions together with the when and otherwise methods of the Column type; with those you can construct a CaseWhen column. Note that the chain has to start with the standalone when function: calling when directly on an arbitrary Column throws an IllegalArgumentException.

    import org.apache.spark.sql.Column
    import org.apache.spark.sql.functions.{lit, when}
    // assumes a SQLContext named sqlContext in scope, as in spark-shell
    import sqlContext.implicits._ // enables the $"..." column syntax
    
    // The chain starts with the standalone when function;
    // Column.when and Column.otherwise then extend it.
    val result: Column =
      when($"a" === lit(1), 10).
      when($"a" === lit(2), 15).
      otherwise(20)
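
    For context, here is a quick usage sketch; the sample DataFrame below is my own illustration, not part of the question:

    // Illustrative data; toDF comes from sqlContext.implicits._, imported above
    val df = Seq((1, "x"), (2, "y"), (3, "z")).toDF("a", "b")
    
    df.withColumn("result", result).show()
    // a = 1 -> 10, a = 2 -> 15, anything else -> 20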