I want to create some aggregations of table. But when I look at execution plan I find out that there is massive row increase in sort (from 5 121 to 1 453 235 569). Why this is happend? What can cause this increase?
More info: Original table has this schema:
CREATE TABLE event (
deviceid varchar NOT NULL,
"time" timestamptz NOT NULL,
signal_quality float8 NULL,
packet_loss float8 NULL,
ping float8 NULL,
jitter float8 NULL,
download float8 NULL,
upload float8 NULL,
latency float8 NULL,
profinet float8 NULL
);
Next table has data about start tests:
CREATE TABLE start_test (
deviceid varchar NOT NULL,
"time" timestamptz NOT NULL
);
I want to agregate data for each test. For one test there will be more rows, each row contains one type of aggregation - for example one row for minimum, one row for median,... Here is subset of aggregation script:
WITH event_cycle AS (
SELECT
e.*,
sc.time as start_time,
sc.end_time as end_time
FROM (
SELECT
*
FROM event
) e
LEFT JOIN (
SELECT *, COALESCE(LEAD(time) OVER(PARTITION BY deviceid ORDER BY time), NOW()) as end_time
FROM start_test
) sc ON sc.time <= e.time AND e.time < sc.end_time AND e.deviceid = sc.deviceid
), bounds AS (
SELECT
start_time start_time_b,
min(end_time) end_time_b,
deviceid deviceid_b,
--low
PERCENTILE_DISC(0.005) WITHIN GROUP(ORDER BY signal_quality) as signal_quality_low,
--high
PERCENTILE_DISC(0.995) WITHIN GROUP(ORDER BY signal_quality) as signal_quality_high
FROM event_cycle
GROUP BY start_time, deviceid
), data_with_boundaries AS (
SELECT *
FROM event_cycle ec
LEFT JOIN bounds b ON b.start_time_b <= ec.time AND ec.time < b.end_time_b AND ec.deviceid = b.deviceid_b
)
SELECT *
FROM (
SELECT
min(time) as start_time,
max(time) as end_time,
deviceid,
'min' as agg_type,
min(signal_quality) as signal_quality
FROM event_cycle
GROUP BY start_time, deviceid
) min
UNION ALL
SELECT *
FROM (
SELECT
min(time) as start_time,
max(time) as end_time,
deviceid,
'median' as agg_type,
PERCENTILE_DISC(0.5) WITHIN GROUP(ORDER BY signal_quality) as signal_quality
FROM event_cycle
GROUP BY start_time, deviceid
) median
UNION ALL
SELECT *
FROM (
SELECT
min(time) as start_time,
max(time) as end_time,
deviceid,
'ipr_avg' as agg_type,
avg(CASE WHEN signal_quality
BETWEEN signal_quality_low
AND signal_quality_high
THEN signal_quality
ELSE NULL
END) as signal_quality
FROM data_with_boundaries
GROUP BY start_time, deviceid
) ipr_avg
This is whole query plan:
Append (cost=65349092.63..2218756400.84 rows=120000 width=88) (actual time=108568.862..305159.053 rows=15363 loops=1)
CTE event_cycle
-> Hash Left Join (cost=652.57..23608823.05 rows=161244967 width=99) (actual time=11.336..107377.365 rows=855037 loops=1)
Hash Cond: ((event.deviceid)::text = (sc.deviceid)::text)
Join Filter: ((sc."time" <= event."time") AND (event."time" < sc.end_time))
Rows Removed by Join Filter: 1451937372
-> Seq Scan on event (cost=0.00..17802.37 rows=855037 width=83) (actual time=0.095..338.366 rows=855037 loops=1)
-> Hash (cost=588.92..588.92 rows=5092 width=27) (actual time=11.190..11.193 rows=5118 loops=1)
Buckets: 8192 Batches: 1 Memory Usage: 359kB
-> Subquery Scan on sc (cost=423.43..588.92 rows=5092 width=27) (actual time=8.076..10.517 rows=5118 loops=1)
-> WindowAgg (cost=423.43..538.00 rows=5092 width=27) (actual time=8.075..10.152 rows=5118 loops=1)
-> Sort (cost=423.43..436.16 rows=5092 width=19) (actual time=8.060..8.260 rows=5118 loops=1)
Sort Key: start_test.deviceid, start_test."time"
Sort Method: quicksort Memory: 592kB
-> Seq Scan on start_test (cost=0.00..109.92 rows=5092 width=19) (actual time=0.017..0.557 rows=5118 loops=1)
-> Subquery Scan on min (cost=41740269.58..44159744.08 rows=40000 width=88) (actual time=108568.860..108914.711 rows=5121 loops=1)
-> GroupAggregate (cost=41740269.58..44159344.08 rows=40000 width=96) (actual time=108568.860..108914.282 rows=5121 loops=1)
Group Key: event_cycle.start_time, event_cycle.deviceid
-> Sort (cost=41740269.58..42143382.00 rows=161244967 width=56) (actual time=108568.845..108751.590 rows=855037 loops=1)
Sort Key: event_cycle.start_time, event_cycle.deviceid
Sort Method: external merge Disk: 38608kB
-> CTE Scan on event_cycle (cost=0.00..3224899.34 rows=161244967 width=56) (actual time=11.338..108096.087 rows=855037 loops=1)
-> Subquery Scan on median (cost=41740269.58..44159844.08 rows=40000 width=88) (actual time=812.829..1215.500 rows=5121 loops=1)
-> GroupAggregate (cost=41740269.58..44159444.08 rows=40000 width=96) (actual time=812.828..1214.812 rows=5121 loops=1)
Group Key: event_cycle_1.start_time, event_cycle_1.deviceid
-> Sort (cost=41740269.58..42143382.00 rows=161244967 width=56) (actual time=812.797..1009.458 rows=855037 loops=1)
Sort Key: event_cycle_1.start_time, event_cycle_1.deviceid
Sort Method: external merge Disk: 38608kB
-> CTE Scan on event_cycle event_cycle_1 (cost=0.00..3224899.34 rows=161244967 width=56) (actual time=0.058..229.673 rows=855037 loops=1)
-> Subquery Scan on ipr_avg (cost=2035162059.84..2106827389.62 rows=40000 width=88) (actual time=194608.278..195027.597 rows=5121 loops=1)
-> GroupAggregate (cost=2035162059.84..2106826989.62 rows=40000 width=96) (actual time=194608.276..195027.074 rows=5121 loops=1)
Group Key: ec.start_time, ec.deviceid
-> Sort (cost=2035162059.84..2044120113.56 rows=3583221489 width=72) (actual time=194608.252..194833.440 rows=855037 loops=1)
Sort Key: ec.start_time, ec.deviceid
Sort Method: external merge Disk: 45552kB
-> Merge Left Join (cost=85903271.20..731689364.04 rows=3583221489 width=72) (actual time=1673.883..194023.850 rows=855037 loops=1)
Merge Cond: ((ec.deviceid)::text = (b.deviceid_b)::text)
Join Filter: ((b.start_time_b <= ec."time") AND (ec."time" < b.end_time_b))
Rows Removed by Join Filter: 1452792384
-> Sort (cost=41740269.58..42143382.00 rows=161244967 width=56) (actual time=739.639..1020.075 rows=855037 loops=1)
Sort Key: ec.deviceid
Sort Method: external merge Disk: 38608kB
-> CTE Scan on event_cycle ec (cost=0.00..3224899.34 rows=161244967 width=56) (actual time=0.027..182.877 rows=855037 loops=1)
-> Sort (cost=44163001.63..44163101.63 rows=40000 width=64) (actual time=934.232..51961.852 rows=1453235569 loops=1)
Sort Key: b.deviceid_b
Sort Method: quicksort Memory: 593kB
-> Subquery Scan on b (cost=41740269.58..44159944.08 rows=40000 width=64) (actual time=638.853..932.024 rows=5121 loops=1)
-> GroupAggregate (cost=41740269.58..44159544.08 rows=40000 width=64) (actual time=638.852..931.538 rows=5121 loops=1)
Group Key: event_cycle_2.start_time, event_cycle_2.deviceid
-> Sort (cost=41740269.58..42143382.00 rows=161244967 width=56) (actual time=638.826..788.470 rows=855037 loops=1)
Sort Key: event_cycle_2.start_time, event_cycle_2.deviceid
Sort Method: external merge Disk: 35424kB
-> CTE Scan on event_cycle event_cycle_2 (cost=0.00..3224899.34 rows=161244967 width=56) (actual time=0.027..184.881 rows=855037 loops=1)
Planning Time: 0.638 ms
Execution Time: 305182.448 ms
Using GiST index for range of start and end time
First join (creating event_cycle):
Merge Left Join (cost=184294.16..571464.97 rows=855012 width=99) (actual time=477.090..252598.974 rows=855012 loops=1)
Merge Cond: ((network_checker_event.deviceid)::text = (sc.deviceid)::text)
Join Filter: (sc.start_end_time @> network_checker_event."time")
Rows Removed by Join Filter: 1451937254
-> Sort (cost=183874.68..186012.21 rows=855012 width=83) (actual time=475.940..854.321 rows=855012 loops=1)
Sort Key: network_checker_event.deviceid
Sort Method: external merge Disk: 58568kB
-> Seq Scan on network_checker_event (cost=0.00..17802.12 rows=855012 width=83) (actual time=0.022..45.523 rows=855012 loops=1)
-> Sort (cost=419.48..432.28 rows=5118 width=80) (actual time=0.986..51704.010 rows=1452380432 loops=1)
Sort Key: sc.deviceid
Sort Method: quicksort Memory: 912kB
-> Seq Scan on start_cycle_gist sc (cost=0.00..104.18 rows=5118 width=80) (actual time=0.017..0.347 rows=5118 loops=1)
Planning Time: 1.095 ms
Execution Time: 253883.973 ms
Nested Loop Left Join (cost=0.15..382777.33 rows=557590 width=216) (actual time=0.065..7138.223 rows=855012 loops=1)
-> Seq Scan on event_cycle ec (cost=0.00..15713.90 rows=557590 width=120) (actual time=0.036..184.663 rows=855012 loops=1)
-> Index Scan using bounds_idx on bounds b (cost=0.15..0.65 rows=1 width=96) (actual time=0.008..0.008 rows=1 loops=855012)
Index Cond: (start_end_time_b @> ec."time")
Filter: ((ec.deviceid)::text = (deviceid_b)::text)
Rows Removed by Filter: 1
Planning Time: 1.081 ms
Execution Time: 7166.532 ms
The SQL is:
DROP TABLE IF EXISTS start_cycle_gist;
EXPLAIN ANALYZE
CREATE TEMP TABLE start_cycle_gist AS
SELECT
*,
COALESCE(LEAD(time) OVER(PARTITION BY deviceid ORDER BY time), NOW()) as end_time,
tstzrange(time, COALESCE(LEAD(time) OVER(PARTITION BY deviceid ORDER BY time), NOW()), '[)') AS start_end_time
FROM start_cycle
;
CREATE INDEX start_cycle_gist_idx ON start_cycle_gist USING gist(start_end_time)
;
DROP TABLE IF EXISTS event_cycle;
EXPLAIN ANALYZE
CREATE TEMP TABLE event_cycle AS
SELECT
e.*,
sc.time as start_time,
sc.end_time as end_time
FROM (
SELECT
*
FROM event
) e
LEFT JOIN start_cycle_gist sc ON sc.start_end_time @> e.time AND e.deviceid = sc.deviceid
;
DROP TABLE IF EXISTS bounds;
EXPLAIN ANALYZE
CREATE TEMP TABLE bounds AS
SELECT
start_time start_time_b,
min(end_time) end_time_b,
tstzrange(start_time, min(end_time), '[)') AS start_end_time_b,
deviceid deviceid_b,
--low
PERCENTILE_DISC(0.005) WITHIN GROUP(ORDER BY signal_quality) as signal_quality_low,
--high
PERCENTILE_DISC(0.995) WITHIN GROUP(ORDER BY signal_quality) as signal_quality_high
FROM event_cycle
WHERE start_time IS NOT null
GROUP BY start_time, deviceid
;
CREATE INDEX bounds_idx ON bounds USING gist(start_end_time_b)
;
EXPLAIN ANALYZE
SELECT *
FROM event_cycle ec
LEFT JOIN bounds b ON b.start_end_time_b @> ec.time AND ec.deviceid = b.deviceid_b
;
On a merge join, every tie in the first child requires the 2nd child to be partially rescanned, and the number of actual rows reported by the plan includes each row each time it was scanned. The merge join actually produces a huge number of rows initially, but then the vast majority are thrown out based on
Join Filter: ((b.start_time_b <= ec."time") AND (ec."time" < b.end_time_b))
Rows Removed by Join Filter: 1452792384
You might want to incorporate start_time_b and end_time_b into a single range type, then build a composite index using GiST (deviceid_b, range_time_b)