I have this DataFrame:
+----+-------+---------------------+-------+---------+
| id | name  | end time            | value | comment |
+----+-------+---------------------+-------+---------+
| 1  | node1 | 2017-03-24 08:30:00 | 5     | blabla  |
| 2  | node1 | 2017-03-24 09:00:00 | 3     | blabla  |
| 3  | node1 | 2017-03-24 09:30:00 | 8     | blabla  |
| 4  | node2 | 2017-03-24 10:00:00 | 5     | blabla  |
| 5  | node2 | 2017-03-24 10:30:00 | 3     | blabla  |
| 6  | node2 | 2017-03-24 11:00:00 | 1     | blabla  |
| 7  | node2 | 2017-03-24 11:30:00 | 3     | blabla  |
| 8  | node2 | 2017-03-24 12:00:00 | 5     | blabla  |
+----+-------+---------------------+-------+---------+
I need to find the nodes whose value stays below 6 for a 2-hour stretch. How can I do this in Spark 1.6? Thanks in advance!
EDIT: this works only in Spark 2.x, because the time-based window function used below was introduced in Spark 2.0.

You can use window aggregate functions:
import org.apache.spark.sql.functions._

df.groupBy(
    col("name"),
    // Sliding 2-hour windows, starting every 30 minutes
    window(col("end time"), "2 hours", "30 minutes")
  )
  // Highest value seen inside each window
  .agg(max("value").as("2_hour_max_value"))
  // Keep only windows where the value stayed below 6 the whole time
  .filter(col("2_hour_max_value") < 6)
  .select("name")
  .distinct()
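
For Spark 1.6 itself, where the time-based window grouping function does not exist, one option is a trailing range window built with Window.rangeBetween over the epoch-seconds value of the time column. This is only a rough sketch under that assumption (column names are taken from the question; 2_hour_max_value is just an illustrative name):

import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions.Window

// Trailing 2-hour range per node, ordered by event time in epoch seconds
val last2Hours = Window
  .partitionBy("name")
  .orderBy(col("end time").cast("timestamp").cast("long"))
  .rangeBetween(-2 * 60 * 60, 0)

df.withColumn("2_hour_max_value", max("value").over(last2Hours))
  .filter(col("2_hour_max_value") < 6)
  .select("name")
  .distinct()

Note that this evaluates a trailing window ending at each row rather than fixed 30-minute buckets, so a node with less than 2 hours of history can still match; you may want to filter those cases out depending on how strictly you read "during 2 hours".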