sqlsnowflake-cloud-data-platformaggregate-functionswindow-functions

SQL filter rows by range overlap


I have a snowflake table that roughly resembles the below. I want to filter out all rows that have the same name, provider and an overlapping voltage range.

Before:

NAME Provider Min_Volt Max_Volt
OSMO Bell 10 100
OSMO Bell 80 150
OSMO Bell 120 155
OSMO Bell 160 200
MASD Tele 80 150

After:

NAME Provider Min_Volt Max_Volt
OSMO Bell 160 200
MASD Tele 80 150

Doing only the name and voltage is straightforward with a group by and count() > 1. But I don't know how to apply compare the rows to each other individually like that.

I was thinking of doing it with min/max window functions like

with min_max AS (
SELECT NAME
     , PROVIDER
     , MIN_VOLT
     , MAX_VOLT
     , MAX(max_volt) OVER (PARTITION BY PROVIDER, NAME) AS set_max
     , MIN(min_volt) OVER (PARTITION BY PROVIDER, NAME) AS set_min
FROM Table
)
SELECT NAME, PROVIDER, MIN_VOLT, MAX_VOLT
     FROM min_max
        WHERE MIN_VOLT NOT BETWEEN set_min AND set_min AND MAX_VOLT NOT BETWEEN set_min AND set_max;

However this returns no rows since its including the row its comparing in the max/min range. I'm not sure how to have it compare each row independently to each other row.

EDIT: For clarity - the rows we want to remove are those that have an overlapping voltage range with another row that has the same NAME and PROVIDER value. For example rows 1, 2 and 3 in the before table above all have the same name and provider, and there is overlap in the voltage range (10-100, 80-150, 120-155) so they should all be removed. Whereas Row 4 has the same name and provider but its voltage range doesn't overlap with the other rows so it should be kept.


Solution

  • Looks like LEAD and LAG should do the trick

    with min_max AS (
        SELECT
           NAME
         , PROVIDER
         , MIN_VOLT
         , MAX_VOLT
         , LAG(MAX_VOLT) OVER (PARTITION BY PROVIDER, NAME ORDER BY MIN_VOLT, MAX_VOLT) AS prev_max
         , LEAD(MIN_VOLT) OVER (PARTITION BY PROVIDER, NAME ORDER BY MIN_VOLT, MAX_VOLT) AS next_min
        FROM ORENNIA_ANALYTICS_QC.ORENNIA.PRE_PROCESSED_SUBSTATIONS
    )
    SELECT
        NAME,
        PROVIDER,
        MIN_VOLT,
        MAX_VOLT
    FROM min_max
    WHERE (prev_max IS NULL OR MIN_VOLT > prev_max)
      AND (next_min IS NULL OR MAX_VOLT < next_min);
    

    db<>fiddle