sqlsnowflake-cloud-data-platformsubqueryuser-defined-functionscorrelated-subquery

Snowflake unsupported subquery type for UDTF


In Snowflake, I'm trying to save a large amount of code size and maintenance by creating a UDTF to show flags for each row v based on presence of rows in x or y meeting some pattern.

The error I get is SQL Error [2031] [42601]: SQL compilation error: Unsupported subquery type cannot be evaluated

Here's some minimal example code:

DROP TABLE IF EXISTS v;
CREATE TABLE v (CHARTGUID NUMBER(16));
INSERT INTO v VALUES(1410522400170);
INSERT INTO v VALUES(1548089000170);

DROP TABLE IF EXISTS x;
CREATE TABLE x (CHARTGUID NUMBER(16), FOO INT, BAR VARCHAR(1));
INSERT INTO x(CHARTGUID, FOO, BAR) VALUES (1410522400170, 1, 'a');
INSERT INTO x(CHARTGUID, FOO, BAR) VALUES (1548089000170, 0, 'a');

DROP TABLE IF EXISTS y;
CREATE TABLE y (CHARTGUID NUMBER(16), FOO INT, BAR VARCHAR(1));
INSERT INTO y(CHARTGUID, FOO, BAR) VALUES (1410522400170, 0, 'a');
INSERT INTO y(CHARTGUID, FOO, BAR) VALUES (1548089000170, 1, 'a');
INSERT INTO y(CHARTGUID, FOO, BAR) VALUES (1548089000170, 1, 'b');

CREATE OR REPLACE FUNCTION TEST_C(CHART_GUID NUMBER, p VARCHAR)
   RETURNS TABLE(A INT, B INT)
AS
$$
   SELECT
      CASE WHEN EXISTS (
         SELECT x.CHARTGUID
         FROM x
         WHERE x.CHARTGUID = CHART_GUID
            AND x.FOO = 1
            AND x.BAR = p
      ) THEN 1 ELSE 0 END,
      CASE WHEN EXISTS (
         SELECT y.CHARTGUID
         FROM y
         WHERE y.CHARTGUID = CHART_GUID
            AND y.FOO = 1
            AND y.BAR = p
      ) THEN 1 ELSE 0 END,
$$
;

SELECT
   v.CHARTGUID,
   x.*
FROM v,
   TABLE(TEST_C(v.CHARTGUID, 'a')) x

The true use case has many TABLE(TEST_C(v.CHARTGUID, 'a')), where 'a' is a regex pattern and its presence in the two tables x and y needs to be checked for each pattern.

I want to avoid this sort of pattern:

SELECT 
   CASE WHEN EXISTS (
      SELECT x.CHARTGUID
      FROM x
      WHERE x.CHARTGUID = v.CHARTGUID
         AND x.BAR LIKE 'a%'
   ) THEN 1 ELSE 0 END AS "Ax",
   CASE WHEN EXISTS (
      SELECT y.CHARTGUID
      FROM y
      WHERE y.CHARTGUID = v.CHARTGUID
         AND y.BAR LIKE 'a%'
   ) THEN 1 ELSE 0 END AS "Ay",
   CASE WHEN EXISTS (
      SELECT x.CHARTGUID
      FROM x
      WHERE x.CHARTGUID = v.CHARTGUID
         AND x.BAR LIKE 'b%'
   ) THEN 1 ELSE 0 END AS "Bx",
   CASE WHEN EXISTS (
      SELECT y.CHARTGUID
      FROM y
      WHERE y.CHARTGUID = v.CHARTGUID
         AND y.BAR LIKE 'b%'
   ) THEN 1 ELSE 0 END AS "By",
   CASE WHEN EXISTS (
      SELECT x.CHARTGUID
      FROM x
      WHERE x.CHARTGUID = v.CHARTGUID
         AND x.BAR LIKE 'c%'
   ) THEN 1 ELSE 0 END AS "Cx",
   CASE WHEN EXISTS (
      SELECT y.CHARTGUID
      FROM y
      WHERE y.CHARTGUID = v.CHARTGUID
         AND y.BAR LIKE 'c%'
   ) THEN 1 ELSE 0 END AS "Cy",
   -- etc...
FROM v

What's the workaround for snowflakes 'subquery limitation'?


Solution

  • So the tables can be written differently, that I feel is cleaner as there are less characters, thus it's more readable, and they are atomic operations:

    CREATE or replace TABLE v (CHARTGUID number) 
        as select * from values
            (1410522400170),
            (1548089000170);
    
    CREATE or replace TABLE x (CHARTGUID number, FOO number, BAR text)
        as select * from values
            (1410522400170, 1, 'a'),
            (1548089000170, 0, 'a');
    
    CREATE or replace TABLE y (CHARTGUID number, FOO number, BAR text)
        as select * from values
            (1410522400170, 0, 'a'),
            (1548089000170, 1, 'a'),
            (1548089000170, 1, 'b');
    

    I would be inclined to pre-process the data so it can be equi join base (left or normal) with CTE's

    WITH xx as (
        select distinct chartguid, bar
        from x 
        where foo = 1
    ), yy as (
        select distinct chartguid, bar
        from y
        where foo = 1
    )
    SELECT
       v.chartguid,
       NVL2(xx.chartguid, 1,0) as x1,
       NVL2(yy.chartguid, 1,0) as x2
    FROM v
    LEFT JOIN xx 
        on xx.chartguid = v.chartguid 
            and xx.bar = 'a'
    LEFT JOIN yy
        on yy.chartguid = v.chartguid
            and xx.bar = 'a'
    

    This does not give you "one clever function to hide all the workings" which is generally a gross trick, but if you need clever, you could make a UDTF that joins those two+ CTE's together before join the main table, but this might do way too much work.

    enter image description here

    With the extended case I would more likely write it like:

    WITH xx as (
        select 
            chartguid, 
            count_if(bar LIKE 'a%') as "Ax",
            count_if(bar LIKE 'b%') as "Bx",
            count_if(bar LIKE 'c%') as "Cx",
        from x 
        group by 1
    ), yy as (
        select 
            chartguid,
            count_if(bar LIKE 'a%') as "Ay",
            count_if(bar LIKE 'b%') as "By",
            count_if(bar LIKE 'c%') as "Cy",
        from y
        group by 1
    )
    SELECT
       v.chartguid,
       xx.* exclude(chartguid),
       yy.* exclude(chartguid)
    FROM v
    LEFT JOIN xx 
        on xx.chartguid = v.chartguid 
    LEFT JOIN yy
        on yy.chartguid = v.chartguid
    

    Depending on how your tables are shaped, it might be faster to pre-join to avoid agregating unneed rows from x and y, or it might be way slower.