I'm working with Impala and Hive, so I don't know if I have fancy functions that Oracle or MSQL provide
So , having a table of values and timestamps
|-----------|------------------|
| value | timestamp |
|-----------|------------------|
| 2 | 12:02:34 |
|-----------|------------------|
| 5 | 12:06:30 |
|-----------|------------------|
| 2 | 13:01:33 |
|-----------|------------------|
| 2 | 13:04:00 |
I want to get an extra column with the rolling average for the last hour only
|-----------|------------------|--------|
| value | timestamp | avg |
|-----------|------------------|--------|
| 2 | 12:02:34 | 2 |
|-----------|------------------|--------|
| 5 | 12:06:30 | 3.5 |
|-----------|------------------|--------|
| 2 | 13:01:33 | 3 |
|-----------|------------------|--------|
| 2 | 13:04:00 | 3 |
There can be a variable number of values for each hour, so I cannot do a Window over a number of rows, because i don't know how many rows it implies.
Any suggestions?
This is a little tricky. Although Hive supports range
window frames, it only supports them with numbers, not intervals.
So, you need to convert the timestamp to a number and then use that:
select t.*,
avg(value) over (order by unix_timestamp(timestamp)
range between 3559 preceding and current row
)
from t;
3559 = 60 * 60 - 1, which is one second less than an hour. One second less is used because the window frame includes the current row.