apache-sparkpysparkscala-spark

How to use when() .otherwise function in Spark with multiple conditions


This is my first post so let me know if I need to give more details.

I am trying to create a boolean column, "immediate", that shows true when at least on of the columns has some data in it. If all are null then the column should be false. I am using the when() .otherwise function in spark but I'm not getting the result I would expect.

Below is the code I'm using:

val evaluation = evaluation_raw
.withColumn("immediate", 
    when(col("intended_outcome_review").isNull 
    && col("outcome").isNull 
    && col("impact").isNull 
    && col("impact_self").isNull 
    && col("next_step").isNull, 
    lit(false))
    .otherwise(lit(true)))
.select(
    col("id"),
    col("intended_outcome_review"),
    col("outcome"),
    col("impact"),
    col("impact_self"),
    col("next_step"),
    col("immediate"))

Desired outcome:

+--------+------------------------+-------------+-------+------------+----------+----------+
|id      |intended_outcome_review |outcome      |impact |impact_self |next_step |immediate |
+--------+------------------------+-------------+-------+------------+----------+----------+
|1568    |null                    |null         |4      |3           |null      |true      |
|1569    |null                    |null         |null   |null        |null      |false     |
|1570    |null                    |null         |null   |null        |null      |false     |
|1571    |1                       |improved coms|3      |3           |email prof|true      |
+--------+------------------------+-------------+-------+------------+----------+----------+

Actual outcome:

+--------+------------------------+-------------+-------+------------+----------+----------+
|id      |intended_outcome_review |outcome      |impact |impact_self |next_step |immediate |
+--------+------------------------+-------------+-------+------------+----------+----------+
|1568    |null                    |null         |4      |3           |null      |true      |
|1569    |null                    |null         |null   |null        |null      |true      |
|1570    |null                    |null         |null   |null        |null      |false     |
|1571    |1                       |improved coms|3      |3           |email prof|true      |
+--------+------------------------+-------------+-------+------------+----------+----------+

If anyone knows what I may be doing wrong please let me know. Thanks!


Solution

  • Turns out some of the columns are converted from Null to "" when other parts of the form are filled out.

    Answer below considers empty strings and Null values:

    .withColumn("immediate", 
        when((col("intended_outcome_review").isNull || col("intended_outcome_review") ==="")
        && (col("outcome").isNull || col("outcome") === "")
        && (col("impact").isNull || col("outcome") === "")
        && (col("impact_self").isNull || col("impact_self") === "")
        && (col("next_step").isNull || col("next_step") === ""),
        lit(false))
        .otherwise(lit(true)))