I'm trying to write a complex query using PostgreSQL 9.2.4, and I'm having trouble getting it working. I have a table which contains a time range, as well as several other columns. When I store data in this table, if all of the columns are the same and the time ranges overlap or are adjacent, I combine them into one row.
When I retrieve them, though, I want to split the ranges at day boundaries - so for example:
2013-01-01 00:00:00 to 2013-01-02 23:59:59
would be selected as two rows:
2013-01-01 00:00:00 to 2013-01-01 23:59:59
2013-01-02 00:00:00 to 2013-01-02 23:59:59
with the values in the other columns the same for both retrieved entries.
I have seen this question which seems to more or less address what I want, but it's for a "very old" version of PostgreSQL, so I'm not sure it's really still applicable.
I've also seen this question, which does exactly what I want, but as far as I know the CONNECT BY statement is an Oracle extension to the SQL standard, so I can't use it.
I believe I can achieve this using PostgreSQL's generate_series, but I'm hoping there's a simple example out there demonstrating how it can be used to do this.
This is the query I'm working on at the moment, which currently doesn't work (because I can't reference the FROM table in a joined subquery), but I believe this is more-or-less the right track.
Here's the fiddle with the schema, sample data, and my working query.
Update: I just found out a fun fact, thanks to this question, that if you use a set-returning function in the SELECT part of the query, PostgreSQL will "automagically" do a cross join on the set and the row. I think I'm close to getting this working.
First off, your handling of upper bounds is broken. A timestamp ending in 23:59:59 is no good: the data type timestamp allows fractional digits (currently µs resolution). What about '2013-10-18 23:59:59.123'::timestamp?
Include the lower bound and exclude the upper bound everywhere.
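To illustrate why the half-open convention matters, here is a minimal sketch comparing the two styles of day filter on the timestamp from above. The column aliases half_open and closed_range are made up for this example.

```sql
-- One sample value with fractional seconds, checked against both filters.
SELECT ts
     , ts >= timestamp '2013-10-18'
       AND ts < timestamp '2013-10-19' AS half_open      -- true: lower bound included, upper excluded
     , ts BETWEEN timestamp '2013-10-18 00:00:00'
              AND timestamp '2013-10-18 23:59:59' AS closed_range  -- false: the value slips past 23:59:59
FROM  (VALUES (timestamp '2013-10-18 23:59:59.123')) v(ts);
```

With half-open ranges, consecutive days also line up without gaps or overlaps: the upper bound of one day is the lower bound of the next.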
Building on this premise:
Use a LATERAL subquery (available in Postgres 9.3 or later).
SELECT id
     , CASE WHEN sday = d THEN stime ELSE d END AS stime
     , CASE WHEN eday = d THEN etime ELSE d + interval '1 day' END AS etime
FROM  (
   SELECT id, stime, etime
        , date_trunc('day', stime) AS sday
        , date_trunc('day', etime) AS eday
   FROM   timesheet_entries
   ) t
     , generate_series(t.sday, t.eday, interval '1 day') d
WHERE  d < etime  -- filter noise row for etime at 00:00
ORDER  BY id, stime;
The subquery t adds sday and eday (stime and etime, respectively, truncated to the day) for repeated use.
It's best to call generate_series() with timestamp input.
When etime falls on 00:00 exactly, a row with a zero time range would be added. The added filter WHERE d < etime discards the noise.
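A small sketch of that corner case, using one inline sample row (hypothetical values) instead of the timesheet_entries table. It needs Postgres 9.3 or later, because the function call references the preceding FROM item. The kept column is only there to show what the filter would decide.

```sql
-- Entry ends exactly at midnight, so the series also yields the end day itself.
SELECT t.id
     , d
     , d < t.etime AS kept   -- false only for the last day: it would be a zero-length slice
FROM  (VALUES (1, timestamp '2013-01-01 10:00', timestamp '2013-01-03 00:00')) t(id, stime, etime)
     , generate_series(date_trunc('day', t.stime)
                     , date_trunc('day', t.etime)
                     , interval '1 day') d
ORDER  BY d;
```

The series produces 2013-01-01, 2013-01-02 and 2013-01-03. Only the last one fails d < etime, so that is the row the WHERE clause removes.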
In the above query,
, generate_series(t.sday, t.eday, interval '1 day') d
is short syntax for:
CROSS JOIN LATERAL generate_series(t.sday, t.eday, interval '1 day') d
-- Postgres 9.2 (with corner case fix)
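SELECT id
     , CASE WHEN stime::date = d THEN stime ELSE d END AS stime
     , CASE WHEN etime::date = d THEN etime ELSE d + 1 END AS etime
FROM  (
   SELECT id, stime, etime
          -- start the series at midnight of the first day; starting at stime
          -- itself can skip the last day when etime has an earlier time of day
        , generate_series(date_trunc('day', stime)
                        , date_trunc('day', etime)
                        , interval '1 day')::date AS d
   FROM   timesheet_entries
   ) t
WHERE  d < etime  -- filter noise row for etime at 00:00
ORDER  BY id, stime;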
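To check a query of this shape without the fiddle, one could run it against a couple of inline rows. This is only a sketch: the sample values are invented, and it uses the variant that starts the series at the truncated day (date_trunc) so that the final day is always generated.

```sql
-- Same pattern as the 9.2 query, with the table replaced by inline sample rows.
SELECT id
     , CASE WHEN stime::date = d THEN stime ELSE d END AS stime
     , CASE WHEN etime::date = d THEN etime ELSE d + 1 END AS etime
FROM  (
   SELECT id, stime, etime
        , generate_series(date_trunc('day', stime)
                        , date_trunc('day', etime)
                        , interval '1 day')::date AS d
   FROM  (VALUES
            (1, timestamp '2013-01-01 22:00', timestamp '2013-01-03 00:00')  -- spans two days, ends at midnight
          , (2, timestamp '2013-01-05 08:00', timestamp '2013-01-05 17:00')  -- stays within one day
         ) v(id, stime, etime)
   ) t
WHERE  d < etime  -- drops the zero-length slice for id 1 on 2013-01-03
ORDER  BY id, stime;
```

Entry 1 should come back as two rows (22:00 to midnight, then the full following day) and entry 2 as a single unchanged row.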