postgresqlgaps-and-islandsgaps-in-data

Subtract Rows in Postgresql using WINDOW functions


I have following table called "Projects":

CREATE TABLE Projects (Task_ID integer, Start_Date date, End_Date date)

INSERT INTO Projects VALUES
(1, '2015-10-01', '2015-10-02'),
(2, '2015-10-02', '2015-10-03'),
(3, '2015-10-03', '2015-10-04'),
(4, '2015-10-13', '2015-10-14'),
(5, '2015-10-14', '2015-10-15'),
(6, '2015-10-28', '2015-10-29'),
(7, '2015-10-30', '2015-10-31')

It is guaranteed that the difference between the End_Date and the Start_Date is equal to 1 day for each row in the table.If the End_Date of the tasks are consecutive, then they are part of the same project. I need to write a query to output the start and end dates of projects listed by the number of days it took to complete the project in ascending order. If there is more than one project that have the same number of completion days, then it should be ordered by the start date of the project.


Solution

  • Here is the solution:

    WITH results AS 
    (
    SELECT p.*,
           extract('doy'
    FROM p.start_date) AS start_Doy,
         extract('doy'
    FROM p.end_date) AS end_Doy,
         ROW_NUMBER() OVER(PARTITION BY COUNT(*)
    ORDER BY p.end_date) AS row_num,
             extract('doy'
    FROM p.end_date) -(ROW_NUMBER() OVER(PARTITION BY COUNT(*)
    ORDER BY p.end_date)) AS DIFF
    FROM Projects p
    GROUP BY p.task_id,
             p.start_date,
             p.end_date
    ORDER BY p.end_date)
    SELECT MIN(r.start_date) AS "Project Start Date",
           MAX(r.end_date) AS "Project End Date",
           (MAX(r.end_doy) - MIN(r.start_doy)) AS "Project Duration"
    FROM results r
    Group BY r.diff
    ORDER BY "Project Duration",
             "Project Start Date"