I have a table period_of_hours in PostgreSQL. This table contains several column but for my case is important column timestamp (timestamp without time zone) and key(character varying). I have a scheduler which add a new row to the table with period of hour. I don't want to bother you why but sometimes it skip adding to the table a new row.
As you can see from this example period from 2024-05-06 10:00:00 to 2024-05-06 15:00:00
contains all lines consecutively,
then one line is skipped 2024-05-06 16:00:00
. From 2024-05-06 17:00:00 to 2024-05-06 21:00:00
is Ok and two rows is missed
2024-05-06 22:00:00
and 2024-05-06 23:00:00
than 2024-05-07 02:00:00
also missed.
Table "period_of_hours"
timestamp key
2024-05-06 10:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 11:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 12:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 13:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 14:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 15:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 17:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 18:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 19:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 20:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 21:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-07 00:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-07 01:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-07 03:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-07 04:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-07 05:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 10:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 11:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 12:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 13:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 14:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 15:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 17:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 18:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 19:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 20:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 21:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-07 00:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-07 01:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-07 03:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-07 04:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-07 05:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
First of all i want to select all rows by key (for example '009eae73-0b55-3809-9938-4bc0a342e451') and filter result set to get all missed rows ? In real table it is more than 1000 rows.
Expected result for key column '009eae73-0b55-3809-9938-4bc0a342e451':
2024-05-06 16:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 22:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 23:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-07 02:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
This solution is works for me.
with continuous("timestamp") as (
select
'2024-05-06 10:00:00' :: timestamp + n * '1 hour' :: interval
FROM
generate_series(
0,
(
select
extract(
epoch
from
max(timestamp)- min(
'2024-05-06 10:00:00' :: timestamp
)
)/ 3600
from
period_of_hours
where
"key" = '0a3e1588-ad59-3586-b071-d5001f5ff9a7'
):: integer,
1
) as n
)
select
c.timestamp
from
continuous c
left join period_of_hours p on c.timestamp = p.timestamp
and p.key = '0a3e1588-ad59-3586-b071-d5001f5ff9a7'
where
p.timestamp is null;
is it possible to combine key in construction and solution above because i can have a list of UUID ?
key in (
'a63ffce5-1d86-3afc-8b5a-97452f935632',
'0a3e1588-ad59-3586-b071-d5001f5ff9a7',
'e5524b9b-3aca-3b80-9aab-19bfec30fb9b')```
I dont understand whats is continuous("timestamp") ? It is a function or what ?
You can generate a continuous version of your data set and run an anti-join against it: demo
with continuous("timestamp") as (
select '2024-05-06 10:00:00'::timestamp+n*'1 hour'::interval
from generate_series( 0
,(select extract(epoch from max("timestamp")
-min("timestamp"))/3600
from period_of_hours
where "key"='009eae73-0b55-3809-9938-4bc0a342e451')
,1)as n)
select c."timestamp" from continuous c
left join period_of_hours p
on c."timestamp"=p."timestamp"
and p."key"='009eae73-0b55-3809-9938-4bc0a342e451'
where p."timestamp" is null;
timestamp |
---|
2024-05-06 16:00:00 |
2024-05-06 22:00:00 |
2024-05-06 23:00:00 |
2024-05-07 02:00:00 |
An except
, a not in
or <>all()
could achieve the same.
You could also treat this as gaps-and-islands problem and run a window function to spot the gaps and report their width, then generate a patch of rows using that:
select "timestamp"+n*'1h'::interval as "timestamp"
from (select *,extract(epoch from (lead("timestamp")over w1)-"timestamp")
/3600 as "diff"
from period_of_hours
where "key"='009eae73-0b55-3809-9938-4bc0a342e451'
window w1 as(order by "timestamp") )_
cross join lateral generate_series(1,"diff"-1,1) as n
where "diff">1;
If you want to list each key with all its gaps, it's enough to change the window definition a bit:
select "key","timestamp"+n*'1h'::interval as "timestamp"
from (select *,extract(epoch from (lead("timestamp")over w1)-"timestamp")
/3600 as "diff"
from period_of_hours
window w1 as(partition by "key" order by "timestamp") )_
cross join lateral generate_series(1,"diff"-1,1) as n
where "diff">1;
key | timestamp |
---|---|
009eae73-0b55-3809-9938-4bc0a342e451 | 2024-05-06 16:00:00 |
009eae73-0b55-3809-9938-4bc0a342e451 | 2024-05-06 22:00:00 |
009eae73-0b55-3809-9938-4bc0a342e451 | 2024-05-06 23:00:00 |
009eae73-0b55-3809-9938-4bc0a342e451 | 2024-05-07 02:00:00 |
fc85b307-15d6-369b-86b5-a4aa19871dff | 2024-05-06 16:00:00 |
fc85b307-15d6-369b-86b5-a4aa19871dff | 2024-05-06 22:00:00 |
fc85b307-15d6-369b-86b5-a4aa19871dff | 2024-05-06 23:00:00 |
fc85b307-15d6-369b-86b5-a4aa19871dff | 2024-05-07 02:00:00 |