timescaledb

Optimizing hierarchical continuous aggregates


Consider the following table:

CREATE TABLE IF aqs.measurement
(
    ts timestamp with time zone NOT NULL DEFAULT now(),
    value real,
    measurement_pt_id integer,
    tag_id integer NOT NULL,
    sensor_id integer NOT NULL,
    CONSTRAINT measurement_raw_pkey PRIMARY KEY (ts, sensor_id, tag_id),
    CONSTRAINT fk_measurement_pt_id FOREIGN KEY (measurement_pt_id)
        REFERENCES aqs.measurement_pt (id) MATCH SIMPLE,
    CONSTRAINT fk_sensor_id FOREIGN KEY (sensor_id)
        REFERENCES aqs.sensor (id) MATCH SIMPLE,
    CONSTRAINT fk_tag_name_id FOREIGN KEY (tag_id)
        REFERENCES aqs.tag_name (id) MATCH SIMPLE
)

Table contains values from a large number of sensors, some of which measures multiple parameters (physical values). Each parameter is uniquely identified by the combination of 'sensor_id' and 'tag_id'. Most sensors are sampled once per minute. There is also an associated hypertable.

Queries for long intervals (more than a few weeks) take a long time. Since the data will be explored in interactive graphs we need to downsample the data.

In order to speed up the queries when plotting the data for longer time periods I have created two (hierarchical) continuous aggregates to provide downsampled data plus some statistics:

CREATE MATERIALIZED VIEW val_hourly_agg
WITH (timescaledb.continuous) AS
SELECT
  time_bucket(INTERVAL '1 hour', "ts") AS ts_hour,
  sensor_id, measurement_pt_id, tag_id,
  avg(aqs.measurement.value)::real AS mean,
  max(aqs.measurement.value) AS max_val,
  min(aqs.measurement.value) AS min_val,
  stddev(stats_agg(aqs.measurement.value))::real AS std_dev,
  approx_percentile(0.1, percentile_agg(aqs.measurement.value))::real as p10,
  approx_percentile(0.9, percentile_agg(aqs.measurement.value))::real as p90,
  percentile_agg(aqs.measurement.value) as pct_agg,
  stats_agg(aqs.measurement.value) AS stats_agg
FROM aqs.measurement
GROUP BY ts_hour, sensor_id, measurement_pt_id, tag_id
;

CREATE MATERIALIZED VIEW val_daily_agg
WITH (timescaledb.continuous) AS
SELECT
  time_bucket(INTERVAL '1 day', "ts_hour") AS ts_day,
  sensor_id, measurement_pt_id, tag_id,
  average(rollup(stats_agg))::real AS mean,
  max(max_val) AS max_val,
  min(min_val) AS min_val,
  stddev(rollup(stats_agg))::real AS std_dev,
  approx_percentile(0.1, rollup(pct_agg))::real as p10,
  approx_percentile(0.9, rollup(pct_agg))::real as p90
FROM val_hourly_agg
GROUP BY ts_day, sensor_id, measurement_pt_id, tag_id
;

Since I am not very experienced with TimescaleDB I suspect that this could be done a lot more efficiently. Would appreciate if anyone could provide suggestions.

I am also unsure if the grouping of data is correct. Preliminary testing on real data seems to give reasonable results but need to create some synthetic test data to confirm. Comments on this are most welcome.


Solution

  • I'd suggest you start splitting the responsibilities and rethinking on what should be materialized and what should be persisted. Processing entire stats_agg, percentile_agg and all _agg functions can be very handy and then you can build several fast processing mechanisms over it.

    CREATE MATERIALIZED VIEW val_hourly_agg
    WITH (timescaledb.continuous) AS
    SELECT
      time_bucket(INTERVAL '1 hour', "ts") AS ts_hour,
      sensor_id, measurement_pt_id, tag_id,
      stats_agg(aqs.measurement.value),
      percentile_agg(aqs.measurement.value),
    FROM aqs.measurement
    GROUP BY ts_hour, sensor_id, measurement_pt_id, tag_id
    ;
    

    Then the daily should just rollup the values from previous aggregations.

    CREATE MATERIALIZED VIEW val_daily_agg
    WITH (timescaledb.continuous) AS
    SELECT
      time_bucket(INTERVAL '1 day', "ts_hour") AS ts_day,
      sensor_id, measurement_pt_id, tag_id,
       rollup(stats_agg) as stats_agg
       rollup(percentile_agg) as percentile_agg
    FROM val_hourly_agg
    GROUP BY ts_day, sensor_id, measurement_pt_id, tag_id
    ;
    

    Then just build other views that are not materialized to access it:

    create view {timeframe}_dashboard as
    select
      ts_day,
      sensor_id, 
      measurement_pt_id, 
      tag_id,
      percentile_agg->approx_percentile(0.1) as p_10,
      percentile_agg->approx_percentile(0.5) as median,
      percentile_agg->approx_percentile(0.9) as p_90,
      stats_agg->average(),
      stags_agg->stddev()
    FROM val_{timeframe}_agg;
    ....