sql snowflake-cloud-data-platform subquery user-defined-functions correlated-subquery

Snowflake unsupported subquery type for UDTF

In Snowflake, I'm trying to save a large amount of code size and maintenance by creating a UDTF to show flags for each row v based on presence of rows in x or y meeting some pattern.

The error I get is SQL Error [2031] [42601]: SQL compilation error: Unsupported subquery type cannot be evaluated

Here's some minimal example code:

DROP TABLE IF EXISTS v;
CREATE TABLE v (CHARTGUID NUMBER(16));
INSERT INTO v VALUES(1410522400170);
INSERT INTO v VALUES(1548089000170);

DROP TABLE IF EXISTS x;
CREATE TABLE x (CHARTGUID NUMBER(16), FOO INT, BAR VARCHAR(1));
INSERT INTO x(CHARTGUID, FOO, BAR) VALUES (1410522400170, 1, 'a');
INSERT INTO x(CHARTGUID, FOO, BAR) VALUES (1548089000170, 0, 'a');

DROP TABLE IF EXISTS y;
CREATE TABLE y (CHARTGUID NUMBER(16), FOO INT, BAR VARCHAR(1));
INSERT INTO y(CHARTGUID, FOO, BAR) VALUES (1410522400170, 0, 'a');
INSERT INTO y(CHARTGUID, FOO, BAR) VALUES (1548089000170, 1, 'a');
INSERT INTO y(CHARTGUID, FOO, BAR) VALUES (1548089000170, 1, 'b');

CREATE OR REPLACE FUNCTION TEST_C(CHART_GUID NUMBER, p VARCHAR)
   RETURNS TABLE(A INT, B INT)
AS
$$
   SELECT
      CASE WHEN EXISTS (
         SELECT x.CHARTGUID
         FROM x
         WHERE x.CHARTGUID = CHART_GUID
            AND x.FOO = 1
            AND x.BAR = p
      ) THEN 1 ELSE 0 END,
      CASE WHEN EXISTS (
         SELECT y.CHARTGUID
         FROM y
         WHERE y.CHARTGUID = CHART_GUID
            AND y.FOO = 1
            AND y.BAR = p
      ) THEN 1 ELSE 0 END,
$$
;

SELECT
   v.CHARTGUID,
   x.*
FROM v,
   TABLE(TEST_C(v.CHARTGUID, 'a')) x

The true use case has many TABLE(TEST_C(v.CHARTGUID, 'a')), where 'a' is a regex pattern and its presence in the two tables x and y needs to be checked for each pattern.

I want to avoid this sort of pattern:

SELECT 
   CASE WHEN EXISTS (
      SELECT x.CHARTGUID
      FROM x
      WHERE x.CHARTGUID = v.CHARTGUID
         AND x.BAR LIKE 'a%'
   ) THEN 1 ELSE 0 END AS "Ax",
   CASE WHEN EXISTS (
      SELECT y.CHARTGUID
      FROM y
      WHERE y.CHARTGUID = v.CHARTGUID
         AND y.BAR LIKE 'a%'
   ) THEN 1 ELSE 0 END AS "Ay",
   CASE WHEN EXISTS (
      SELECT x.CHARTGUID
      FROM x
      WHERE x.CHARTGUID = v.CHARTGUID
         AND x.BAR LIKE 'b%'
   ) THEN 1 ELSE 0 END AS "Bx",
   CASE WHEN EXISTS (
      SELECT y.CHARTGUID
      FROM y
      WHERE y.CHARTGUID = v.CHARTGUID
         AND y.BAR LIKE 'b%'
   ) THEN 1 ELSE 0 END AS "By",
   CASE WHEN EXISTS (
      SELECT x.CHARTGUID
      FROM x
      WHERE x.CHARTGUID = v.CHARTGUID
         AND x.BAR LIKE 'c%'
   ) THEN 1 ELSE 0 END AS "Cx",
   CASE WHEN EXISTS (
      SELECT y.CHARTGUID
      FROM y
      WHERE y.CHARTGUID = v.CHARTGUID
         AND y.BAR LIKE 'c%'
   ) THEN 1 ELSE 0 END AS "Cy",
   -- etc...
FROM v

What's the workaround for snowflakes 'subquery limitation'?

Solution

So the tables can be written differently, that I feel is cleaner as there are less characters, thus it's more readable, and they are atomic operations:

CREATE or replace TABLE v (CHARTGUID number) 
    as select * from values
        (1410522400170),
        (1548089000170);

CREATE or replace TABLE x (CHARTGUID number, FOO number, BAR text)
    as select * from values
        (1410522400170, 1, 'a'),
        (1548089000170, 0, 'a');

CREATE or replace TABLE y (CHARTGUID number, FOO number, BAR text)
    as select * from values
        (1410522400170, 0, 'a'),
        (1548089000170, 1, 'a'),
        (1548089000170, 1, 'b');

I would be inclined to pre-process the data so it can be equi join base (left or normal) with CTE's

WITH xx as (
    select distinct chartguid, bar
    from x 
    where foo = 1
), yy as (
    select distinct chartguid, bar
    from y
    where foo = 1
)
SELECT
   v.chartguid,
   NVL2(xx.chartguid, 1,0) as x1,
   NVL2(yy.chartguid, 1,0) as x2
FROM v
LEFT JOIN xx 
    on xx.chartguid = v.chartguid 
        and xx.bar = 'a'
LEFT JOIN yy
    on yy.chartguid = v.chartguid
        and xx.bar = 'a'

This does not give you "one clever function to hide all the workings" which is generally a gross trick, but if you need clever, you could make a UDTF that joins those two+ CTE's together before join the main table, but this might do way too much work.

With the extended case I would more likely write it like:

WITH xx as (
    select 
        chartguid, 
        count_if(bar LIKE 'a%') as "Ax",
        count_if(bar LIKE 'b%') as "Bx",
        count_if(bar LIKE 'c%') as "Cx",
    from x 
    group by 1
), yy as (
    select 
        chartguid,
        count_if(bar LIKE 'a%') as "Ay",
        count_if(bar LIKE 'b%') as "By",
        count_if(bar LIKE 'c%') as "Cy",
    from y
    group by 1
)
SELECT
   v.chartguid,
   xx.* exclude(chartguid),
   yy.* exclude(chartguid)
FROM v
LEFT JOIN xx 
    on xx.chartguid = v.chartguid 
LEFT JOIN yy
    on yy.chartguid = v.chartguid

Depending on how your tables are shaped, it might be faster to pre-join to avoid agregating unneed rows from x and y, or it might be way slower.