database, time-series, questdb

How to compute and expand average power over time?


I'm working with a dataset where an IoT device sends a wh (watt-hour) value at discrete timestamps, identified by an operationId. I want to visualize this data by plotting the sum of average power per operation, broken down by hour.

Here is the raw data:

timestamp operationId wh
2025-04-01T14:10:59.000000Z 1001 0
2025-04-01T14:20:01.000000Z 1002 0
2025-04-01T15:06:29.000000Z 1003 0
2025-04-01T18:18:05.000000Z 1001 200
2025-04-01T20:06:36.000000Z 1003 200
2025-04-01T22:20:10.000000Z 1002 300

Compute mean power per operation

I'm using the lag() window function to get the previous wh and timestamp, and to compute the average power:

SELECT
    timestamp AS end_time,
    cast(prev_ts AS timestamp) AS start_time,
    operationId,
    (wh - prev_wh) / ((cast(timestamp AS DOUBLE) - prev_ts) / 3600000000.0) AS mean_power_w -- timestamps are in microseconds: 3,600,000,000 µs = 1 hour
FROM (
    SELECT
        timestamp,
        wh,
        operationId,
        lag(wh) OVER (PARTITION BY operationId ORDER BY timestamp) AS prev_wh,
        lag(cast(timestamp AS DOUBLE)) OVER (PARTITION BY operationId ORDER BY timestamp) AS prev_ts
    FROM meter
)
WHERE prev_ts IS NOT NULL
ORDER BY timestamp;

This gives me:

end_time start_time mean_power_w operationId
2025-04-01T18:18:05.000000Z 2025-04-01T14:10:59.000000Z 48.563334682314 1001
2025-04-01T20:06:36.000000Z 2025-04-01T15:06:29.000000Z 39.984450491475 1003
2025-04-01T22:20:10.000000Z 2025-04-01T14:20:01.000000Z 37.488284910965 1002
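
As a quick sanity check on the first row (standalone arithmetic, not part of the query): operation 1001 goes from 0 to 200 wh between 14:10:59 and 18:18:05, i.e. over 14,826 seconds:

SELECT 200.0 / (14826.0 / 3600.0) AS mean_power_w;
-- ≈ 48.5633 W, matching the first row above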

Expand the value across hours

Now I’d like to expand each row to hourly buckets between start_time and end_time, filling each hour with the cumulative energy based on the mean power.

For our example:

ts operationId energy
2025-04-01T14:00:00.000Z 1001 39.67354647241263
2025-04-01T14:00:00.000Z 1002 24.981776528168066
2025-04-01T15:00:00.000Z 1001 48.563334682314
2025-04-01T15:00:00.000Z 1003 35.66390848003506
2025-04-01T15:00:00.000Z 1002 37.488284910965
2025-04-01T16:00:00.000Z 1001 48.563334682314
2025-04-01T16:00:00.000Z 1003 39.984450491475
2025-04-01T16:00:00.000Z 1002 37.488284910965
2025-04-01T17:00:00.000Z 1001 48.563334682314
2025-04-01T17:00:00.000Z 1003 39.984450491475
2025-04-01T17:00:00.000Z 1002 37.488284910965
2025-04-01T18:00:00.000Z 1001 14.636449480641858
2025-04-01T18:00:00.000Z 1003 39.984450491475
2025-04-01T18:00:00.000Z 1002 37.488284910965
2025-04-01T19:00:00.000Z 1003 39.984450491475
2025-04-01T19:00:00.000Z 1002 37.488284910965
2025-04-01T20:00:00.000Z 1003 4.39828955406225
2025-04-01T20:00:00.000Z 1002 37.488284910965
2025-04-01T21:00:00.000Z 1002 37.488284910965
2025-04-01T22:00:00.000Z 1002 12.600229095074349

How can I expand the results between start_time and end_time per hour and fill each hour with the correct mean_power_w (converted to wh for that period)?

"Cumulative" might not be the right term here. What I’m actually trying to do is expand each operation’s average power output over time.

So for each operationId, I calculate the average power in W between two measurements, then I want to apply that average power to every hour between the start and end timestamps, converting it to Wh for each hour.

So the goal isn’t to sum up values cumulatively across time, but rather to spread the average energy contribution of an operation across each hour it spans.
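
For instance (standalone arithmetic, using the figures from the tables above): operation 1001 starts at 14:10:59, so only 2,941 of the 3,600 seconds of the 14:00 bucket fall inside its interval, and the expected value for that bucket is the mean power pro-rated over that fraction of the hour:

SELECT 48.563334682314 * (2941.0 / 3600.0) AS first_hour_wh;
-- ≈ 39.6735 Wh, the 2025-04-01T14:00 value for operationId 1001 in the expected output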

Is there a way to achieve this using SAMPLE BY, joins, or maybe generate_series-like logic in QuestDB?


Solution

  • It took me a while to come up with a suitable solution. I thought I had one (which was mega complex), but then, speaking with a colleague, I realised it should be easier. Here it is:

    
    WITH
    sampled AS (
      -- one row per operationId and hour; hours with no reading are filled with 0 wh
      SELECT timestamp, operationId, SUM(wh) AS wh
      FROM meter
      SAMPLE BY 1h
      FILL(0)
    ),
    sessions AS (
      -- walking backwards in time, every non-zero wh row starts a new "session"
      SELECT *,
        SUM(CASE WHEN wh > 0 THEN 1 END) OVER (PARTITION BY operationId ORDER BY timestamp DESC) AS session
      FROM sampled
    ),
    counts AS (
      -- for each session: its wh value and the number of hourly rows it spans
      SELECT timestamp, operationId,
        FIRST_VALUE(wh) OVER (PARTITION BY operationId, session ORDER BY timestamp DESC) AS wh,
        COUNT(*) OVER (PARTITION BY operationId, session) AS attributable_hours
      FROM sessions
    )
    SELECT timestamp, operationId, wh / attributable_hours AS wh_avg
    -- , wh, attributable_hours
    FROM counts;
    

    Basically what you need here is to slice, for each operationId, all the hours with no wh value (or 0) preceding a row with a value for wh. Then we divide the wh of that row across all the rows in the same slice to attribute the average. I am calling each of these slices a "session".

    My query starts with the `sampled` subquery, which just makes sure we have an entry for every operationId and hourly interval, filling all the interpolated rows with 0 wh.
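
    If you want to check that step on its own, the `sampled` CTE can be run as a standalone query (this is just the first step of the query above, nothing new):

    SELECT timestamp, operationId, SUM(wh) AS wh
    FROM meter
    SAMPLE BY 1h
    FILL(0);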

    Now for the real trick of the query: we divide the data into "sessions". A session is defined as all the rows with no value for wh preceding a row with a value for wh. Or, if we reverse the timestamp order, a session is a row with a value for wh followed by several zero-value rows for the same operationId. We can express this with the following window function:

    SUM(CASE WHEN wh > 0 THEN 1 END) OVER (PARTITION BY operationId ORDER BY timestamp DESC) AS session
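
    To see the session numbering by itself, you could run just the first two steps of the query (again, nothing new, just the query above cut short):

    WITH sampled AS (
      SELECT timestamp, operationId, SUM(wh) AS wh
      FROM meter
      SAMPLE BY 1h
      FILL(0)
    )
    SELECT *,
      SUM(CASE WHEN wh > 0 THEN 1 END) OVER (PARTITION BY operationId ORDER BY timestamp DESC) AS session
    FROM sampled;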

    Now, for each operationId we will have multiple sessions (1,2,3...), and if we did something like...

    COUNT() as attributable_hours GROUP BY operationId, session

    ...we would get how many attributable rows each session has.
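
    Written out (just as a sketch; the final query uses the window version instead, because the result still needs one row per hour), that aggregation would look like:

    WITH sampled AS (
      SELECT timestamp, operationId, SUM(wh) AS wh
      FROM meter SAMPLE BY 1h FILL(0)
    ),
    sessions AS (
      SELECT *,
        SUM(CASE WHEN wh > 0 THEN 1 END) OVER (PARTITION BY operationId ORDER BY timestamp DESC) AS session
      FROM sampled
    )
    SELECT operationId, session, COUNT(*) AS attributable_hours
    FROM sessions
    GROUP BY operationId, session;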

    With this info I can go to the next subquery, the one I named `counts`, where I use a window function to COUNT the number of rows per session (notice how the COUNT window function does not use ORDER BY, so it is not a running count: all the rows for the same session get the same value for `attributable_hours`).
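
    As a small illustration of that distinction (run against the raw table, and assuming the usual default frame when ORDER BY is present): with ORDER BY in the OVER clause the count becomes a running count, while without it every row gets the total for its partition.

    SELECT
      timestamp,
      operationId,
      COUNT(*) OVER (PARTITION BY operationId ORDER BY timestamp) AS running_count,
      COUNT(*) OVER (PARTITION BY operationId) AS total_count
    FROM meter;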

    I am also getting the FIRST_VALUE for the session, sorted by reverse timestamp, which is the `wh` value of the only row with a value in each session.

    And then the final query is just a simple division, averaging the wh reported in the session across the number of attributable_hours.

    I left a line commented out in case you want to see more clearly the intermediate values for the calculation.

    Note that if you want to filter the results by timestamp or operationId, you should ideally add the filter in the first subquery, the one named `sampled`, so the rest of the processing is done on the relevant subset of data.
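
    For example, pushing a filter into that first step could look like this (the date and operationId here are just made-up example values; only the `sampled` CTE changes, the rest of the query stays the same):

    SELECT timestamp, operationId, SUM(wh) AS wh
    FROM meter
    WHERE operationId = 1001 AND timestamp IN '2025-04-01'
    SAMPLE BY 1h
    FILL(0);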