sqlstored-proceduressnowflake-cloud-data-platformuser-defined-functions

Call large SQL queries from inside procedures or functions to create common helper functions in Snowflake


I have a system where I need to identify different types of risk associated with entities. I have many large scripts for detecting different risk behaviours associated with entities, with some scripts producing one or more "risk marker types".

I have a master event table of all active and past risks. A dummy version is created by the following:

CREATE OR REPLACE SEQUENCE test_marker_event_id_seq
   START = 1
   INCREMENT = 1
   ORDER;

CREATE OR REPLACE TABLE test_entity_risk_markers
(
    mark_event_id NUMBER(10,0) DEFAULT test_marker_event_id_seq.nextval,
    entity_id NUMBER(10,0),
    risk_marker_type_id NUMBER(10,0),
    mark_event_eff_date TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP,
    mark_event_end_date TIMESTAMP_NTZ DEFAULT TO_DATE('9999-12-31')
);

INSERT INTO test_entity_risk_markers (
    entity_id,
    risk_marker_type_id,
    mark_event_eff_date,
    mark_event_end_date
) VALUES 
    (301, 501, TO_DATE('1994-01-08'), TO_DATE('9999-12-31')),
    (302, 501, TO_DATE('1994-04-01'), TO_DATE('2003-04-15')),
    (303, 501, TO_DATE('1994-01-08'), TO_DATE('9999-12-31')),
    (301, 502, TO_DATE('1994-04-01'), TO_DATE('9999-12-31')),
    (302, 502, TO_DATE('1994-01-08'), TO_DATE('2003-04-15')),
    (303, 502, TO_DATE('1994-04-01'), TO_DATE('2003-04-15')),
    (301, 503, TO_DATE('1994-01-08'), TO_DATE('9999-12-31')),
    (302, 503, TO_DATE('1994-04-01'), TO_DATE('2003-04-15')),
    (303, 503, TO_DATE('1994-01-08'), TO_DATE('9999-12-31')),
    (310, 510, TO_DATE('1994-04-01'), TO_DATE('9999-12-31'));
mark_event_id entity_id risk_marker_type_id mark_event_eff_date mark_event_end_date
1 301 501 1994-01-08 00:00:00.000 9999-12-31 00:00:00.000
2 302 501 1994-04-01 00:00:00.000 2003-04-15 00:00:00.000
3 303 501 1994-01-08 00:00:00.000 9999-12-31 00:00:00.000
4 301 502 1994-04-01 00:00:00.000 9999-12-31 00:00:00.000
5 302 502 1994-01-08 00:00:00.000 2003-04-15 00:00:00.000
6 303 502 1994-04-01 00:00:00.000 2003-04-15 00:00:00.000
7 301 503 1994-01-08 00:00:00.000 9999-12-31 00:00:00.000
8 302 503 1994-04-01 00:00:00.000 2003-04-15 00:00:00.000
9 303 503 1994-01-08 00:00:00.000 9999-12-31 00:00:00.000
10 310 510 1994-04-01 00:00:00.000 9999-12-31 00:00:00.000

Note in particular ID 1, 3 and 4 which are all active, and ids 2 and 5 which are inactive (as it has a real end date).

Regardless of the logic in the risk identifier scripts, the output has the same schema, and a dummy output can be produced for a script that identifies marker types 501 and 502 here:

CREATE OR REPLACE TABLE test_marker_output(
    entity_id NUMBER(10,0),
    risk_marker_type_id NUMBER(10,0)
);

INSERT INTO test_marker_output(
    entity_id,
    risk_marker_type_id
) VALUES
    (301, 501),
    (302, 501),
    (305, 501),
    (302, 502);
entity_id risk_marker_type_id
301 501
302 501
305 501
302 502

This simulates identifying a risk for entity 301, 302, and 305 for type 501, and 302 for type 502.

The following SQL queries update the master table with the relevant info from the output table. In particular the 301-501 pair is already known about and open so no changes are needed (event id 1), the 305-501 event is a new risk and needs to be added (new event id), the 302-501 and 302-502 events were previously closed and need to be reopened (old events 2 and 5, need new events), and 303-501 and 301-502 are no longer detected and need to be closed (event id 3 and 4):

UPDATE test_entity_risk_markers cbm
    SET cbm.mark_event_end_date = CURRENT_TIMESTAMP()
    FROM (
        WITH
        mo_types AS (
            SELECT
                DISTINCT risk_marker_type_id
            FROM test_marker_output
        ),
        to_close AS (
            SELECT
                cbm.*
            FROM test_entity_risk_markers cbm
            INNER JOIN mo_types
            ON mo_types.risk_marker_type_id = cbm.risk_marker_type_id
            LEFT OUTER JOIN test_marker_output mo
            ON mo.entity_id = cbm.entity_id AND mo.risk_marker_type_id = cbm.risk_marker_type_id
            WHERE mo.entity_id IS NULL
            AND cbm.mark_event_end_date = TO_DATE('9999-12-31')
        )
        SELECT * FROM to_close
    ) tc
    WHERE cbm.mark_event_id = tc.mark_event_id;

INSERT INTO test_entity_risk_markers(
    entity_id,
    risk_marker_type_id
) SELECT 
    mo.*
FROM test_marker_output mo
LEFT JOIN test_entity_risk_markers cbm
ON mo.entity_id = cbm.entity_id AND mo.risk_marker_type_id = cbm.risk_marker_type_id
WHERE cbm.mark_event_end_date < TO_DATE('9999-12-31')
OR cbm.mark_event_end_date IS NULL;

SELECT * FROM test_entity_risk_markers;
mark_event_id entity_id risk_marker_type_id mark_event_eff_date mark_event_end_date
1 301 501 1994-01-08 00:00:00.000 9999-12-31 00:00:00.000
2 302 501 1994-04-01 00:00:00.000 2003-04-15 00:00:00.000
3 303 501 1994-01-08 00:00:00.000 2025-05-15 14:10:26.873
4 301 502 1994-04-01 00:00:00.000 2025-05-15 14:10:26.873
5 302 502 1994-01-08 00:00:00.000 2003-04-15 00:00:00.000
6 303 502 1994-04-01 00:00:00.000 2003-04-15 00:00:00.000
7 301 503 1994-01-08 00:00:00.000 9999-12-31 00:00:00.000
8 302 503 1994-04-01 00:00:00.000 2003-04-15 00:00:00.000
9 303 503 1994-01-08 00:00:00.000 9999-12-31 00:00:00.000
10 310 510 1994-04-01 00:00:00.000 9999-12-31 00:00:00.000
11 302 501 2025-05-15 14:10:28.015 9999-12-31 00:00:00.000
12 302 502 2025-05-15 14:10:28.015 9999-12-31 00:00:00.000
13 305 501 2025-05-15 14:10:28.015 9999-12-31 00:00:00.000

I do not want to copy-paste these 20 lines of code into the bottom of every risk identifier script, especially because others are creating some of these identifier scripts.

I would like to abstract this code into a procedure or function that could be called and passed a temporary table reference our a query output. Therefore a template risk ID script would be:

CREATE OR REPLACE TEMP TABLE temp_output(
    entity_id NUMBER(10,0),
    risk_marker_type_id NUMBER(10,0)
);
INSERT INTO temp_output
SELECT *
-- normally some logic to identify risks
FROM test_marker_output;
CALL update_master_entity_risk_markers('temp_output');

This would call the procedure which contains the helper queries. But the queries are too large for a procedure and I dont know where to begin to insert such a large query with variables into a function. AS an example, here is the error for the procedure.

CREATE OR REPLACE PROCEDURE update_master_entity_risk_markers(tabref VARCHAR)
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
    res RESULTSET;
BEGIN
  LET stmt VARCHAR := """
    UPDATE test_entity_risk_markers cbm
        SET cbm.mark_event_end_date = CURRENT_TIMESTAMP()
        FROM (
            WITH
            mo_types AS (
                SELECT
                    DISTINCT risk_marker_type_id
                FROM ?
            ),
            to_close AS (
                SELECT
                    cbm.*
                FROM test_entity_risk_markers cbm
                INNER JOIN mo_types
                ON mo_types.risk_marker_type_id = cbm.risk_marker_type_id
                LEFT OUTER JOIN ? mo
                ON mo.entity_id = cbm.entity_id AND mo.risk_marker_type_id = cbm.risk_marker_type_id
                WHERE mo.entity_id IS NULL
                AND cbm.mark_event_end_date = TO_DATE('9999-12-31')
            )
            SELECT * FROM to_close
        ) tc
        WHERE cbm.mark_event_id = tc.mark_event_id;

    INSERT INTO test_entity_risk_markers(
        entity_id,
        risk_marker_type_id
    ) SELECT 
        mo.*
    FROM ? mo
    LEFT JOIN test_entity_risk_markers cbm
    ON mo.entity_id = cbm.entity_id AND mo.risk_marker_type_id = cbm.risk_marker_type_id
    WHERE cbm.mark_event_end_date < TO_DATE('9999-12-31')
    OR cbm.mark_event_end_date IS NULL;
    """;
  res := (EXECUTE IMMEDIATE stmt USING (tabref, tabref, tabref));
  RETURN 'Table updated.';
END;
$$;
SQL compilation error:
Object name '"
    UPDATE test_entity_risk_markers cbm
        SET cbm.mark_event_end_date = CURRENT_TIMESTAMP()
        FROM (
            WITH
            mo_types AS (
                SELECT
                    DISTINCT risk_marker_type_id
                FROM ?
            ),
            to_close AS (
                SELECT
                    cbm.*
                FROM test_entity_risk_markers cbm
                INNER JOIN mo_types
                ON mo_types.risk_marker_type_id = cbm.risk_marker_type_id
                LEFT OUTER JOIN ? mo
                ON mo.entity_id = cbm.entity_id AND mo.risk_marker_type_id = cbm.risk_marker_type_id
                WHERE mo.entity_id IS NULL
                AND cbm.mark_event_end_date = TO_DATE('9999-12-31')
            )
            SELECT * FROM to_close
        ) tc
        WHERE cbm.mark_event_id = tc.mark_event_id;

    INSERT INTO test_entity_risk_markers(
        entity_id,
        risk_marker_type_id
    ) SELECT 
        mo.*
    FROM ? mo
    LEFT JOIN test_entity_risk_markers cbm
    ON mo.entity_id = cbm.entity_id AND mo.risk_marker_type_id = cbm.risk_marker_type_id
    WHERE cbm.mark_event_end_date < TO_DATE('9999-12-31')
    OR cbm.mark_event_end_date IS NULL;
    "' exceeds maximum length limit of 255 characters.

How would I construct such a helper procedure or function?


Solution

  • First of all Snowflake uses " to indicate identifiers. Error message:

    SQL compilation error: Object name exceeds maximum length limit of 255 characters.

    LET stmt VARCHAR := """
       UPDATE...
    
    """;
    

    should rather be:

    LET stmt VARCHAR := '
       UPDATE...
    
    ';
    

    Second FROM ? mo is incorrect. If the idea is to parametrize the query with table name then it should use IDENTIFIER:

    FROM ? mo
    =>
    FROM IDENTIFIER(?) AS mo
    

    Third, EXECUTE IMMEDIATE when provided multiple statements will error out:

    Multiple SQL statements in a single API call are not supported; use one API call per statement instead.


    After applying above comments:

    CREATE OR REPLACE PROCEDURE update_master_entity_risk_markers(tabref VARCHAR)
    RETURNS VARCHAR
    LANGUAGE SQL
    AS
    DECLARE
        res RESULTSET;
        res2 RESULTSET;
    BEGIN
      LET stmt VARCHAR := $$
        UPDATE test_entity_risk_markers cbm
            SET cbm.mark_event_end_date = CURRENT_TIMESTAMP()
            FROM (
                WITH
                mo_types AS (
                    SELECT
                        DISTINCT risk_marker_type_id
                    FROM IDENTIFIER(?)
                ),
                to_close AS (
                    SELECT
                        cbm.*
                    FROM test_entity_risk_markers cbm
                    INNER JOIN mo_types
                    ON mo_types.risk_marker_type_id = cbm.risk_marker_type_id
                    LEFT OUTER JOIN IDENTIFIER(?) mo
                    ON mo.entity_id = cbm.entity_id AND mo.risk_marker_type_id = cbm.risk_marker_type_id
                    WHERE mo.entity_id IS NULL
                    AND cbm.mark_event_end_date = TO_DATE('9999-12-31')
                )
                SELECT * FROM to_close
            ) tc
            WHERE cbm.mark_event_id = tc.mark_event_id;
        $$;
    
        LET stmt2 VARCHAR := $$
        INSERT INTO test_entity_risk_markers(
            entity_id,
            risk_marker_type_id
        ) SELECT 
            mo.*
        FROM IDENTIFIER(?) mo
        LEFT JOIN test_entity_risk_markers cbm
        ON mo.entity_id = cbm.entity_id AND mo.risk_marker_type_id = cbm.risk_marker_type_id
        WHERE cbm.mark_event_end_date < TO_DATE('9999-12-31')
        OR cbm.mark_event_end_date IS NULL;
        $$;
      res := (EXECUTE IMMEDIATE stmt USING (tabref, tabref));
      res2 := (EXECUTE IMMEDIATE stmt2 USING (tabref));
      RETURN 'Table updated.';
    END;
    

    SP invocation:

    CALL update_master_entity_risk_markers(tabref => 'test_marker_output');
    --Table updated.
    

    Output: enter image description here


    ADDENDUM:

    In this example Dynamic SQL(EXECUTE IMMEDIATE) is not required as IDENTIFIER allows to parametrize UPDATE/INSERT directly:

    CREATE OR REPLACE PROCEDURE update_master_entity_risk_markers(tabref VARCHAR)
    RETURNS VARCHAR
    LANGUAGE SQL
    AS
    BEGIN
        UPDATE test_entity_risk_markers cbm
            SET cbm.mark_event_end_date = CURRENT_TIMESTAMP()
            FROM (
                WITH
                mo_types AS (
                    SELECT DISTINCT risk_marker_type_id
                    FROM IDENTIFIER(:tabref)
                ),
                to_close AS (
                    SELECT
                        cbm.*
                    FROM test_entity_risk_markers cbm
                    INNER JOIN mo_types
                    ON mo_types.risk_marker_type_id = cbm.risk_marker_type_id
                    LEFT OUTER JOIN IDENTIFIER(:tabref) mo
                      ON mo.entity_id = cbm.entity_id 
                     AND mo.risk_marker_type_id = cbm.risk_marker_type_id
                    WHERE mo.entity_id IS NULL
                    AND cbm.mark_event_end_date = TO_DATE('9999-12-31')
                )
                SELECT * FROM to_close
            ) tc
            WHERE cbm.mark_event_id = tc.mark_event_id;
    
        INSERT INTO test_entity_risk_markers(
            entity_id,
            risk_marker_type_id
        ) SELECT 
            mo.*
        FROM IDENTIFIER(:tabref) mo
        LEFT JOIN test_entity_risk_markers cbm
          ON mo.entity_id = cbm.entity_id 
         AND mo.risk_marker_type_id = cbm.risk_marker_type_id
        WHERE cbm.mark_event_end_date < TO_DATE('9999-12-31')
        OR cbm.mark_event_end_date IS NULL;
      RETURN 'Table updated.';
    END;
    
    
    CALL  update_master_entity_risk_markers(tabref => 'test_marker_output');
    --Table updated.
    

    Output:

    enter image description here