
Find the least value across columns using a Spark DataFrame


I have a DataFrame like the one below and need to find the least non-zero value in each row and add it as a new column named 'Least'.

Column1 Column2 Column3
100.0 120.0 150.0
200.0 0.0 0.0
0.0 20.0 100.0

I tried the least() function, but I didn't get the expected output.

The expected output would be:

Column1 Column2 Column3 Least
100.0 120.0 150.0 100.0
200.0 0.0 0.0 200.0
0.0 20.0 100.0 20.0
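The rule behind the expected output (per row, take the smallest value that is not 0.0) can be pinned down with a small plain-Scala sketch over in-memory rows, with no Spark involved. The names `LeastNonZeroRows` and `leastNonZeroOf` are hypothetical, and returning `None` for an all-zero row is an assumption, since the question does not say what should happen in that case.

```scala
object LeastNonZeroRows {
  // Smallest non-zero value in a row.
  // Assumption: an all-zero row has no answer, so we return None.
  def leastNonZeroOf(row: Seq[Double]): Option[Double] = {
    val nonZero = row.filter(_ != 0.0)
    if (nonZero.isEmpty) None else Some(nonZero.min)
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      Seq(100.0, 120.0, 150.0),
      Seq(200.0, 0.0, 0.0),
      Seq(0.0, 20.0, 100.0)
    )
    // Print each row followed by its "Least" value ("null" if none).
    rows.foreach { r =>
      val least = leastNonZeroOf(r).map(_.toString).getOrElse("null")
      println(r.mkString(" ") + " " + least)
    }
  }
}
```

Running `main` prints the same three rows as the expected output above, with 100.0, 200.0 and 20.0 in the last column.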

Solution

  • You can do something like this to get the least value in each row:

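    import org.apache.spark.sql.functions.{col, least}
    import sparkSession.implicits._ // sparkSession: your existing SparkSession

    val df = List(
      (100.0, 120.0, 150.0),
      (200.0, 0.0, 0.0),
      (0.0, 20.0, 100.0)
    ).toDF("column1", "column2", "column3")

    val columns = df.columns.toSeq

    // least() compares the given columns row by row. Zeros are NOT
    // ignored yet, so rows 2 and 3 come out as 0.0.
    val leastRow = least(columns.map(col): _*).alias("Least")

    df.select($"*", leastRow).show()
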

    Try to improve leastRow so that it ignores the zero values. One idea is to replace the zeros with the largest possible value for your use case (in general, Double.PositiveInfinity) before calling least(). Don't hesitate to post your attempt; you'll get help! Good luck.
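Following that hint, here is a minimal sketch that maps 0.0 to Double.PositiveInfinity with when/otherwise before calling least(). It assumes all compared columns are Double and that a zero always means "ignore this value"; the names `LeastNonZero`, `leastNonZero` and `leastSkippingNulls` are hypothetical, and the local SparkSession is only for running the example. It also shows an alternative: Spark's least() skips null values, so turning zeros into nulls (when() without otherwise() yields null) achieves the same thing.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, least, lit, when}

object LeastNonZero {
  // Replace 0.0 with +Infinity so least() can never pick it.
  // Caveat: an all-zero row yields Infinity rather than a "no value" marker.
  def leastNonZero(columns: Seq[String]): Column = {
    val replaced = columns.map { c =>
      when(col(c) === 0.0, lit(Double.PositiveInfinity)).otherwise(col(c))
    }
    least(replaced: _*)
  }

  // Alternative: turn zeros into null. least() skips nulls and returns
  // null only when every input is null (e.g. an all-zero row).
  def leastSkippingNulls(columns: Seq[String]): Column =
    least(columns.map(c => when(col(c) =!= 0.0, col(c))): _*)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("least-non-zero")
      .getOrCreate()
    import spark.implicits._

    val df = List(
      (100.0, 120.0, 150.0),
      (200.0, 0.0, 0.0),
      (0.0, 20.0, 100.0)
    ).toDF("Column1", "Column2", "Column3")

    val columns = df.columns.toSeq
    df.select($"*", leastNonZero(columns).alias("Least")).show()
    df.select($"*", leastSkippingNulls(columns).alias("Least")).show()

    spark.stop()
  }
}
```

With the sample data, both variants should produce 100.0, 200.0 and 20.0 in the Least column; they differ only for a row where every value is zero (Infinity versus null). Note that least() needs at least two columns.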