sql postgresql date-range generate-series

PostgreSQL splitting time range into days


I'm trying to write a complex query using PostgreSQL 9.2.4, and I'm having trouble getting it working. I have a table which contains a time range, as well as several other columns. When I store data in this table, if all of the columns are the same and the time ranges overlap or are adjacent, I combine them into one row.

When I retrieve them, though, I want to split the ranges at day boundaries - so for example:

2013-01-01 00:00:00 to 2013-01-02 23:59:59

would be selected as two rows:

2013-01-01 00:00:00 to 2013-01-01 23:59:59
2013-01-02 00:00:00 to 2013-01-02 23:59:59

with the values in the other columns the same for both retrieved entries.

I have seen this question which seems to more or less address what I want, but it's for a "very old" version of PostgreSQL, so I'm not sure it's really still applicable.

I've also seen this question, which does exactly what I want, but as far as I know the CONNECT BY statement is an Oracle extension to the SQL standard, so I can't use it.

I believe I can achieve this using PostgreSQL's generate_series, but I'm hoping there's a simple example out there demonstrating how it can be used to do this.

This is the query I'm working on at the moment, which currently doesn't work (because I can't reference the FROM table in a joined subquery), but I believe this is more-or-less the right track.

Here's the fiddle with the schema, sample data, and my work-in-progress query.

Update: I just found out a fun fact, thanks to this question, that if you use a set-returning function in the SELECT part of the query, PostgreSQL will "automagically" do a cross join on the set and the row. I think I'm close to getting this working.
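A minimal illustration of that behaviour, using made-up values rather than the table from the fiddle (the ids 10 and 20 are purely hypothetical):

    -- Each input row is paired with every value the set-returning function emits.
    SELECT v.id, generate_series(1, 3) AS n
    FROM  (VALUES (10), (20)) AS v(id)
    ORDER  BY id, n;
    -- 6 rows: (10,1), (10,2), (10,3), (20,1), (20,2), (20,3)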


Solution

  • First off, your handling of upper bounds is broken. An upper bound of 23:59:59 is no good: the data type timestamp allows fractional digits (currently µs resolution), so a value like '2013-10-18 23:59:59.123'::timestamp falls into the gap between one day's range and the next.

    Include the lower bound and exclude the upper bound everywhere, i.e. treat every range as half-open: [start, end).
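    As a rough sketch of the difference, the probe below tests a single made-up timestamp against both styles of range. The value and the column aliases are assumptions chosen for illustration, not part of the question's data:

    -- Hypothetical probe value with fractional seconds.
    SELECT ts
         , ts BETWEEN timestamp '2013-10-18 00:00:00'
                  AND timestamp '2013-10-18 23:59:59' AS in_closed_range     -- false: falls into the gap
         , ts >= timestamp '2013-10-18'
       AND ts <  timestamp '2013-10-19'               AS in_half_open_range  -- true
    FROM  (VALUES (timestamp '2013-10-18 23:59:59.123')) AS v(ts);

    With half-open ranges, the end of one day is exactly the start of the next, so no value can slip between them.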

    Building on this premise:

    Since Postgres 9.3

    Use a LATERAL subquery.

    SELECT id
         , CASE WHEN sday = d THEN stime ELSE d                    END AS stime
         , CASE WHEN eday = d THEN etime ELSE d + interval '1 day' END AS etime
    FROM  (
       SELECT id, stime, etime
            , date_trunc('day', stime) AS sday
            , date_trunc('day', etime) AS eday
       FROM   timesheet_entries
       ) t
         , generate_series(sday, eday, interval '1 day') d
    WHERE  d < etime  -- filter noise row for etime at 00:00
    ORDER  BY id, stime;
    


    The subquery t computes sday and eday (stime and etime truncated to the day) once, so the outer query can reuse them.

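    It's best to call generate_series() with timestamp input. With date input, the values are coerced to timestamp with time zone, which makes the result depend on the session's time zone setting.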

    When etime falls on 00:00 exactly, a row with a zero time range would be added. The added filter WHERE d < etime discards the noise.

    In the above query,

     , generate_series(t.sday, t.eday, interval '1 day') d
    

    is short syntax for:

    CROSS JOIN LATERAL generate_series(t.sday, t.eday, interval '1 day') d
    

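    To try the 9.3 query without the original table, here is a self-contained sketch that feeds it two made-up rows through a CTE. The sample values are assumptions for illustration; only the table and column names are taken from the query above:

    WITH timesheet_entries(id, stime, etime) AS (
       VALUES
         (1, timestamp '2013-01-01 22:00', timestamp '2013-01-02 05:00')  -- crosses midnight
       , (2, timestamp '2013-01-01 00:00', timestamp '2013-01-03 00:00')  -- ends exactly at 00:00
       )
    SELECT id
         , CASE WHEN sday = d THEN stime ELSE d                    END AS stime
         , CASE WHEN eday = d THEN etime ELSE d + interval '1 day' END AS etime
    FROM  (
       SELECT id, stime, etime
            , date_trunc('day', stime) AS sday
            , date_trunc('day', etime) AS eday
       FROM   timesheet_entries
       ) t
    CROSS  JOIN LATERAL generate_series(t.sday, t.eday, interval '1 day') d
    WHERE  d < etime  -- drops the zero-length row for id 2 on 2013-01-03
    ORDER  BY id, stime;
    -- id | stime               | etime
    --  1 | 2013-01-01 22:00:00 | 2013-01-02 00:00:00
    --  1 | 2013-01-02 00:00:00 | 2013-01-02 05:00:00
    --  2 | 2013-01-01 00:00:00 | 2013-01-02 00:00:00
    --  2 | 2013-01-02 00:00:00 | 2013-01-03 00:00:00

    Without the WHERE clause, id 2 would get a third row running from 2013-01-03 00:00 to 2013-01-03 00:00.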

    Postgres 9.2 or older

    -- Postgres 9.2 (with corner case fix)
    SELECT id
         , CASE WHEN stime::date = d THEN stime ELSE d     END AS stime
         , CASE WHEN etime::date = d THEN etime ELSE d + 1 END AS etime
    FROM (
       SELECT id, stime, etime
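            -- truncate both bounds so the series always reaches etime's day
            , generate_series(date_trunc('day', stime)
                            , date_trunc('day', etime)
                            , interval '1 day')::date AS d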
       FROM   timesheet_entries
       ) t
    WHERE  d < etime  -- filter noise row for etime at 00:00
    ORDER  BY id, stime;
    
