timescaledb real-time-data continuous-aggregates

Real-Time aggregation not up to date

I am experiencing real time aggregation not to be up to date in real time. Is there something I am missing?

A reproducible example on version 2.4.2 using the current docker image timescale/timescaledb:latest-pg12:

CREATE TABLE data
(
    time  TIMESTAMPTZ      NOT NULL,
    value DOUBLE PRECISION NOT NULL
);

SELECT create_hypertable('data', 'time', chunk_time_interval => interval '1d');

INSERT INTO data (time, value)
VALUES ('2020-01-01', 100);

CREATE MATERIALIZED VIEW data_daily WITH (timescaledb.continuous)
AS
SELECT time_bucket('1 day', time) AS time,
       avg(value)                 AS avg,
       count(*)                   AS count
FROM data
GROUP BY 1;
ALTER MATERIALIZED VIEW data_daily SET (timescaledb.materialized_only = false);

Now when I run SELECT * FROM data_daily I get the expected result:

time, avg, count
2020-01-01 00:00:00.000000, 100, 1

But after inserting another value and running the query again, it does not update. The result is the same as above.

INSERT INTO data (time, value) VALUES ('2020-01-01', 150);

SELECT * FROM data_daily;

Output:

time, avg, count
2020-01-01 00:00:00.000000, 100, 1

Refreshing manually and then querying again will show the expected result.

CALL refresh_continuous_aggregate('data_daily', '1900-01-01', '2100-01-01');

SELECT * FROM data_daily;

Output:

time, avg, count
2020-01-01 00:00:00.000000, 125, 2

Is there anything else that needs to be configured for real time aggregation to work? From the documentation I understand that setting materialized_only = false should be enough (and not even necessary as it is the default).

For reference, this is the query plan after the second insert and before the manual refresh:

 Append  (cost=0.15..59.98 rows=400 width=24) (actual time=0.138..0.200 rows=1 loops=1)
   ->  GroupAggregate  (cost=0.15..21.76 rows=200 width=24) (actual time=0.130..0.151 rows=1 loops=1)
         Group Key: _materialized_hypertable_48."time"
         ->  Custom Scan (ChunkAppend) on _materialized_hypertable_48  (cost=0.15..16.81 rows=260 width=72) (actual time=0.021..0.046 rows=1 loops=1)
               Order: _materialized_hypertable_48."time"
               Chunks excluded during startup: 0
               ->  Index Scan Backward using _hyper_48_315_chunk__materialized_hypertable_48_time_idx on _hyper_48_315_chunk  (cost=0.15..16.81 rows=260 width=72) (actual time=0.014..0.023 rows=1 loops=1)
                     Index Cond: ("time" < COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(48)), '-infinity'::timestamp with time zone))
   ->  GroupAggregate  (cost=0.16..32.23 rows=200 width=24) (actual time=0.010..0.021 rows=0 loops=1)
         Group Key: (time_bucket('1 day'::interval, data."time"))
         ->  Custom Scan (ChunkAppend) on data  (cost=0.16..24.60 rows=617 width=16) (actual time=0.003..0.007 rows=0 loops=1)
               Order: time_bucket('1 day'::interval, data."time")
               Chunks excluded during startup: 1
 Planning Time: 4.978 ms
 Execution Time: 0.384 ms

Solution

This is a good question, it's definitely a bit of a confusing bit in the way continuous aggregates work.

The real time view only works on regions of the view that have not yet been materialized at all, it doesn't work on regions that have been materialized but are now invalidated. This is for predictability of performance reasons and because of the way that materialization and invalidation work. Usually the refresh window is called on some time less than now(), say now() - '1 hour'::interval then inserts happen from 1 hour ago forward, then the real time view will run the query directly on the underlying table on the region now()-'1 hour' -> now() and return the results from the materialized portion for the region before that. There could be lots of little regions that are invalidated so those will only get picked up on the next run of the materialization job. You could say it is an eventually consistent view of your data.

Now for you, I'd say the big thing is to run the refresh procedure not so far into the future but rather stick to the past and then you will see the real time view work more the way you expect.