sqlpostgresqlgreatest-n-per-groupwindow-functionsgaps-and-islands

How to group following rows by not unique value


I have data like this:

table1
_____________
id way time
1  1   00:01
2  1   00:02
3  2   00:03
4  2   00:04
5  2   00:05
6  3   00:06
7  3   00:07
8  1   00:08
9  1   00:09

I would like to know in which time interval I was on which way:

desired output
_________________
id  way from   to    
1   1   00:01  00:02
3   2   00:03  00:05
6   3   00:06  00:07
8   1   00:08  00:09

I tried to use a window function:

SELECT DISTINCT
  first_value(id) OVER w AS id, 
  first_value(way) OVER w as way,
  first_value(time) OVER w as from,
  last_value(time) OVER w as to
FROM table1
WINDOW w AS (
  PARTITION BY way ORDER BY ID
  range between unbounded preceding and unbounded following);

What I get is:

ID  way from   to    
 1   1  00:01  00:09
 3   2  00:03  00:05
 6   3  00:06  00:07

And this is not correct, because on way 1 I wasn't from 00:01 to 00:09. Is there a possibility to do the partition according to the order, means grouping only following attributes, that are equal?


Solution

  • If your case is as simple as your example suggests, @Giorgos' answer serves nicely.

    However, that's typically not the case. With a serial id column you cannot assume that a row with an earlier time also has a smaller id.
    Also, time values (timestamp like you probably have) can easily be duplicates, you need to make the sort order unambiguous.

    Assuming both can happen, and you want the id from the row with the earliest time per time slice (actually, the smallest id for the earliest time, there could be ties), this query would deal with the situation properly:

    SELECT *
    FROM  (
       SELECT DISTINCT ON (way, grp)
              id, way, time AS time_from
            , max(time) OVER (PARTITION BY way, grp) AS time_to
       FROM (
          SELECT *
               , row_number() OVER (ORDER BY time, id)  -- id as tie breaker
               - row_number() OVER (PARTITION BY way ORDER BY time, id) AS grp
          FROM   table1
          ) t
       ORDER  BY way, grp, time, id
       ) sub
    ORDER  BY time_from, id;
    

    sqlfiddle (currently offline)

    If you are looking to optimize performance, a PL/pgSQL function could be faster in such a case. See:

    Aside: don't use the basic type name time as identifier (also a reserved word in standard SQL).