Tags: scala, apache-spark, filter, conditional-statements, in-clause

Multiple filter conditions in Scala, with IN and NOT IN clause filters


I am trying to write a filter in Scala equivalent to the SQL condition below:

where col1 = 'abc' and col2 not in (0,4) and col3 in (1,2,3,4)

I tried writing something like this

val finalDf: DataFrame = 
    initDf.filter(col("col1") ="abc")
          .filter(col("col2") <> 0)
          .filter(col("col2") <> 4)
          .filter(col("col3") = 1 ||col("col3") = 2 ||col("col3") = 3 ||col("col3") = 4)

or

val finalDf: DataFrame = 
     initDf.filter(col("col1") ="abc") 
     && col("col2") != 0 && col("col2") != 4 
     && (col("col3") = 1 
     || col("col3") = 2 
     || col("col3") = 3 
     || col("col3") = 4))

Neither seems to work. Can anyone help me with this?


Solution

  • For Column, the comparison operators are a little different:

    For equality, use ===

    For inequality, use =!=

    If you want to compare against literals, you can use the lit function

    Your example may look like this:

    val finalDf: DataFrame =
        initDf.filter(col("col1") === lit("abc"))
              .filter(col("col2") =!= lit(0))
              .filter(col("col2") =!= lit(4))
              .filter(col("col3") === lit(1) || col("col3") === lit(2) || col("col3") === lit(3) || col("col3") === lit(4))
    

    You can also use isin instead of this filter with multiple ors.
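    As a sketch, the whole condition with isin could look like this (assuming the asker's DataFrame initDf; isin negated with ! covers the NOT IN case):

    ```scala
    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.col

    def applyFilters(initDf: DataFrame): DataFrame =
      initDf
        .filter(col("col1") === "abc")        // col1 = 'abc'
        .filter(!col("col2").isin(0, 4))      // col2 NOT IN (0, 4)
        .filter(col("col3").isin(1, 2, 3, 4)) // col3 IN (1, 2, 3, 4)
    ```

    This reads much closer to the original SQL than the chain of === comparisons joined with ||.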

    If you want to find out more about operators for columns, you can read these:

    Medium blog post part1

    Medium blog post part2