sqlsqlitegaps-and-islandsgroup-concat

How to GROUP_CONCAT row values, but break groups by an arbitrary "gap" flag?


Let's say I have an existing SQL table (which I am given, I have no control over it) describing a list of connections between cities, with a column flagging any gaps. Here's a simplified example:

PATH_ID PATH_SEQ FROM_CITY TO_CITY GAP
1 1 Seattle Portland NULL
1 2 Portland San Francisco NULL
1 3 San Francisco Los Angeles NULL
1 4 Los Angeles NULL NULL
2 1 Dallas Tulsa NULL
2 2 Tulsa Springfield NULL
2 3 Springfield Chicago NULL
2 4 Chicago Detroit Y
2 5 Detroit Pittsburgh NULL
2 6 Pittsburgh Philadelphia NULL
2 7 Philadelphia NULL NULL

... and so on. I can use GROUP_CONCAT to build a comma separated list of paths ordered by PATH_SEQ:

SELECT
    PATH_ID,
    GROUP_CONCAT(FROM_CITY ORDER BY PATH_SEQ) AS CITIES
FROM table
GROUP BY PATH_ID

...results in:

PATH_ID CITIES
1 Seattle,Portland,San Francisco,Los Angeles
2 Dallas,Tulsa,Springfield,Chicago,Detroit,Pittsburgh,Philadelphia

Great! It feels like I'm almost there. But there should be no "connection" between Chicago and Detroit, indicated by that "GAP" column. What I really want is to somehow respect the GAP flag and use that to split into some kind of a sub-path. Ideal query result:

PATH_ID SUBPATH_ID CITIES
1 1 Seattle,Portland,San Francisco,Los Angeles
2 1 Dallas,Tulsa,Springfield,Chicago
2 2 Detroit,Pittsburgh,Philadelphia

The "SUBPATH_ID" could be anything. It doesn't need to be a number or even ordered. It doesn't even need to exist, for instance this would be fine, too:

PATH_ID CITIES
1 Seattle,Portland,San Francisco,Los Angeles
2 Dallas,Tulsa,Springfield,Chicago
2 Detroit,Pittsburgh,Philadelphia

I just need a separate row to distinguish between the path starting at Dallas and the one starting at Detroit, while preserving the original PATH_ID. I feel like I have to maybe create a temporary table and generate these SUBPATH_IDs first to distinguish them, and then do the GROUP_CONCAT query using that, but I'm at the limit of my SQL knowledge unfortunately. Any tips or hints?

SQL engine is sqlite, if that makes a difference.


Solution

  • This is an example of Gaps-and-Islands. You can use a running COUNT() on the GAP column (counts the number of non-nulls) to get a grouping ID, then group by that column

    SELECT
        t.PATH_ID,
        GROUP_CONCAT(t.FROM_CITY ORDER BY t.PATH_SEQ) AS CITIES
    FROM (
        SELECT t.*,
          COUNT(t.GAP) OVER (PARTITION BY t.PATH_ID ORDER BY t.PATH_SEQ ROWS UNBOUNDED PRECEDING) AS GapCount
        FROM YourTable AS t
    ) AS t
    GROUP BY
      t.PATH_ID,
      t.GapCount;
    

    db-fiddle.com