Tags: apache-spark, window, udf

Window function for Spark 1.6


I have this DataFrame:

+----+-------+---------------------+-------+---------+
| id | name  | end time            | value | comment |
+----+-------+---------------------+-------+---------+
| 1  | node1 | 2017-03-24 08:30:00 | 5     | blabla  |
| 2  | node1 | 2017-03-24 09:00:00 | 3     | blabla  |
| 3  | node1 | 2017-03-24 09:30:00 | 8     | blabla  |
| 4  | node2 | 2017-03-24 10:00:00 | 5     | blabla  |
| 5  | node2 | 2017-03-24 10:30:00 | 3     | blabla  |
| 6  | node2 | 2017-03-24 11:00:00 | 1     | blabla  |
| 7  | node2 | 2017-03-24 11:30:00 | 3     | blabla  |
| 8  | node2 | 2017-03-24 12:00:00 | 5     | blabla  |
+----+-------+---------------------+-------+---------+

I need to find the nodes whose value stays below 6 for 2 hours. How can I do that in Spark 1.6? Thanks in advance!


Solution

  • EDIT: this is only available in Spark 2.x.

    You can use window aggregate functions:

    import org.apache.spark.sql.functions.{col, max, window}

    df.groupBy(
        col("name"),
        // sliding 2-hour windows, advancing every 30 minutes
        window(col("end time"), "2 hours", "30 minutes")
    )
    .agg(max("value").as("2_hour_max_value"))
    // keep only windows where even the maximum value stayed below 6
    .filter(col("2_hour_max_value") < 6)
    .select("name")
    .distinct()
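
    The above solves it with sliding windows, but only in Spark 2.x. In Spark 1.6 itself, one possible workaround is a window spec with rangeBetween over epoch seconds (note that window functions in 1.6 require a HiveContext). Here is a minimal sketch, assuming df is the DataFrame above and that "end time" parses with the default timestamp format; the ts column name and the 7200-second lookback are illustrative choices, not from the original answer:

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.{col, max, unix_timestamp}

    // Epoch seconds let rangeBetween express "the previous 2 hours" numerically.
    val withTs = df.withColumn("ts", unix_timestamp(col("end time")))

    // For each row, look back 7200 seconds (2 hours) within the same node.
    val lastTwoHours = Window
      .partitionBy("name")
      .orderBy("ts")
      .rangeBetween(-7200L, 0L)

    withTs
      .withColumn("2_hour_max_value", max("value").over(lastTwoHours))
      .filter(col("2_hour_max_value") < 6)
      .select("name")
      .distinct()

    Unlike the fixed sliding windows above, this computes a trailing 2-hour maximum per row, so rows near the start of a node's history cover less than 2 full hours of data.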