Let's say I have an existing SQL table (which I am given, I have no control over it) describing a list of connections between cities, with a column flagging any gaps. Here's a simplified example:
PATH_ID | PATH_SEQ | FROM_CITY | TO_CITY | GAP |
---|---|---|---|---|
1 | 1 | Seattle | Portland | NULL |
1 | 2 | Portland | San Francisco | NULL |
1 | 3 | San Francisco | Los Angeles | NULL |
1 | 4 | Los Angeles | NULL | NULL |
2 | 1 | Dallas | Tulsa | NULL |
2 | 2 | Tulsa | Springfield | NULL |
2 | 3 | Springfield | Chicago | NULL |
2 | 4 | Chicago | Detroit | Y |
2 | 5 | Detroit | Pittsburgh | NULL |
2 | 6 | Pittsburgh | Philadelphia | NULL |
2 | 7 | Philadelphia | NULL | NULL |
... and so on. I can use GROUP_CONCAT to build a comma separated list of paths ordered by PATH_SEQ:
SELECT
PATH_ID,
GROUP_CONCAT(FROM_CITY ORDER BY PATH_SEQ) AS CITIES
FROM table
GROUP BY PATH_ID
...results in:
PATH_ID | CITIES |
---|---|
1 | Seattle,Portland,San Francisco,Los Angeles |
2 | Dallas,Tulsa,Springfield,Chicago,Detroit,Pittsburgh,Philadelphia |
Great! It feels like I'm almost there. But there should be no "connection" between Chicago and Detroit, indicated by that "GAP" column. What I really want is to somehow respect the GAP flag and use that to split into some kind of a sub-path. Ideal query result:
PATH_ID | SUBPATH_ID | CITIES |
---|---|---|
1 | 1 | Seattle,Portland,San Francisco,Los Angeles |
2 | 1 | Dallas,Tulsa,Springfield,Chicago |
2 | 2 | Detroit,Pittsburgh,Philadelphia |
The "SUBPATH_ID" could be anything. It doesn't need to be a number or even ordered. It doesn't even need to exist, for instance this would be fine, too:
PATH_ID | CITIES |
---|---|
1 | Seattle,Portland,San Francisco,Los Angeles |
2 | Dallas,Tulsa,Springfield,Chicago |
2 | Detroit,Pittsburgh,Philadelphia |
I just need a separate row to distinguish between the path starting at Dallas and the one starting at Detroit, while preserving the original PATH_ID. I feel like I have to maybe create a temporary table and generate these SUBPATH_IDs first to distinguish them, and then do the GROUP_CONCAT query using that, but I'm at the limit of my SQL knowledge unfortunately. Any tips or hints?
SQL engine is sqlite, if that makes a difference.
This is an example of Gaps-and-Islands. You can use a running COUNT()
on the GAP
column (counts the number of non-nulls) to get a grouping ID, then group by that column
SELECT
t.PATH_ID,
GROUP_CONCAT(t.FROM_CITY ORDER BY t.PATH_SEQ) AS CITIES
FROM (
SELECT t.*,
COUNT(t.GAP) OVER (PARTITION BY t.PATH_ID ORDER BY t.PATH_SEQ ROWS UNBOUNDED PRECEDING) AS GapCount
FROM YourTable AS t
) AS t
GROUP BY
t.PATH_ID,
t.GapCount;