I have a system where I need to identify different types of risk associated with entities. I have many large scripts for detecting different risk behaviours associated with entities, with some scripts producing one or more "risk marker types".
I have a master event table of all active and past risks. A dummy version is created by the following:
CREATE OR REPLACE SEQUENCE test_marker_event_id_seq
START = 1
INCREMENT = 1
ORDER;
CREATE OR REPLACE TABLE test_entity_risk_markers
(
mark_event_id NUMBER(10,0) DEFAULT test_marker_event_id_seq.nextval,
entity_id NUMBER(10,0),
risk_marker_type_id NUMBER(10,0),
mark_event_eff_date TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP,
mark_event_end_date TIMESTAMP_NTZ DEFAULT TO_DATE('9999-12-31')
);
INSERT INTO test_entity_risk_markers (
entity_id,
risk_marker_type_id,
mark_event_eff_date,
mark_event_end_date
) VALUES
(301, 501, TO_DATE('1994-01-08'), TO_DATE('9999-12-31')),
(302, 501, TO_DATE('1994-04-01'), TO_DATE('2003-04-15')),
(303, 501, TO_DATE('1994-01-08'), TO_DATE('9999-12-31')),
(301, 502, TO_DATE('1994-04-01'), TO_DATE('9999-12-31')),
(302, 502, TO_DATE('1994-01-08'), TO_DATE('2003-04-15')),
(303, 502, TO_DATE('1994-04-01'), TO_DATE('2003-04-15')),
(301, 503, TO_DATE('1994-01-08'), TO_DATE('9999-12-31')),
(302, 503, TO_DATE('1994-04-01'), TO_DATE('2003-04-15')),
(303, 503, TO_DATE('1994-01-08'), TO_DATE('9999-12-31')),
(310, 510, TO_DATE('1994-04-01'), TO_DATE('9999-12-31'));
mark_event_id | entity_id | risk_marker_type_id | mark_event_eff_date | mark_event_end_date |
---|---|---|---|---|
1 | 301 | 501 | 1994-01-08 00:00:00.000 | 9999-12-31 00:00:00.000 |
2 | 302 | 501 | 1994-04-01 00:00:00.000 | 2003-04-15 00:00:00.000 |
3 | 303 | 501 | 1994-01-08 00:00:00.000 | 9999-12-31 00:00:00.000 |
4 | 301 | 502 | 1994-04-01 00:00:00.000 | 9999-12-31 00:00:00.000 |
5 | 302 | 502 | 1994-01-08 00:00:00.000 | 2003-04-15 00:00:00.000 |
6 | 303 | 502 | 1994-04-01 00:00:00.000 | 2003-04-15 00:00:00.000 |
7 | 301 | 503 | 1994-01-08 00:00:00.000 | 9999-12-31 00:00:00.000 |
8 | 302 | 503 | 1994-04-01 00:00:00.000 | 2003-04-15 00:00:00.000 |
9 | 303 | 503 | 1994-01-08 00:00:00.000 | 9999-12-31 00:00:00.000 |
10 | 310 | 510 | 1994-04-01 00:00:00.000 | 9999-12-31 00:00:00.000 |
Note in particular ID 1, 3 and 4 which are all active, and ids 2 and 5 which are inactive (as it has a real end date).
Regardless of the logic in the risk identifier scripts, the output has the same schema, and a dummy output can be produced for a script that identifies marker types 501 and 502 here:
CREATE OR REPLACE TABLE test_marker_output(
entity_id NUMBER(10,0),
risk_marker_type_id NUMBER(10,0)
);
INSERT INTO test_marker_output(
entity_id,
risk_marker_type_id
) VALUES
(301, 501),
(302, 501),
(305, 501),
(302, 502);
entity_id | risk_marker_type_id |
---|---|
301 | 501 |
302 | 501 |
305 | 501 |
302 | 502 |
This simulates identifying a risk for entity 301, 302, and 305 for type 501, and 302 for type 502.
The following SQL queries update the master table with the relevant info from the output table. In particular the 301-501 pair is already known about and open so no changes are needed (event id 1), the 305-501 event is a new risk and needs to be added (new event id), the 302-501 and 302-502 events were previously closed and need to be reopened (old events 2 and 5, need new events), and 303-501 and 301-502 are no longer detected and need to be closed (event id 3 and 4):
UPDATE test_entity_risk_markers cbm
SET cbm.mark_event_end_date = CURRENT_TIMESTAMP()
FROM (
WITH
mo_types AS (
SELECT
DISTINCT risk_marker_type_id
FROM test_marker_output
),
to_close AS (
SELECT
cbm.*
FROM test_entity_risk_markers cbm
INNER JOIN mo_types
ON mo_types.risk_marker_type_id = cbm.risk_marker_type_id
LEFT OUTER JOIN test_marker_output mo
ON mo.entity_id = cbm.entity_id AND mo.risk_marker_type_id = cbm.risk_marker_type_id
WHERE mo.entity_id IS NULL
AND cbm.mark_event_end_date = TO_DATE('9999-12-31')
)
SELECT * FROM to_close
) tc
WHERE cbm.mark_event_id = tc.mark_event_id;
INSERT INTO test_entity_risk_markers(
entity_id,
risk_marker_type_id
) SELECT
mo.*
FROM test_marker_output mo
LEFT JOIN test_entity_risk_markers cbm
ON mo.entity_id = cbm.entity_id AND mo.risk_marker_type_id = cbm.risk_marker_type_id
WHERE cbm.mark_event_end_date < TO_DATE('9999-12-31')
OR cbm.mark_event_end_date IS NULL;
SELECT * FROM test_entity_risk_markers;
mark_event_id | entity_id | risk_marker_type_id | mark_event_eff_date | mark_event_end_date |
---|---|---|---|---|
1 | 301 | 501 | 1994-01-08 00:00:00.000 | 9999-12-31 00:00:00.000 |
2 | 302 | 501 | 1994-04-01 00:00:00.000 | 2003-04-15 00:00:00.000 |
3 | 303 | 501 | 1994-01-08 00:00:00.000 | 2025-05-15 14:10:26.873 |
4 | 301 | 502 | 1994-04-01 00:00:00.000 | 2025-05-15 14:10:26.873 |
5 | 302 | 502 | 1994-01-08 00:00:00.000 | 2003-04-15 00:00:00.000 |
6 | 303 | 502 | 1994-04-01 00:00:00.000 | 2003-04-15 00:00:00.000 |
7 | 301 | 503 | 1994-01-08 00:00:00.000 | 9999-12-31 00:00:00.000 |
8 | 302 | 503 | 1994-04-01 00:00:00.000 | 2003-04-15 00:00:00.000 |
9 | 303 | 503 | 1994-01-08 00:00:00.000 | 9999-12-31 00:00:00.000 |
10 | 310 | 510 | 1994-04-01 00:00:00.000 | 9999-12-31 00:00:00.000 |
11 | 302 | 501 | 2025-05-15 14:10:28.015 | 9999-12-31 00:00:00.000 |
12 | 302 | 502 | 2025-05-15 14:10:28.015 | 9999-12-31 00:00:00.000 |
13 | 305 | 501 | 2025-05-15 14:10:28.015 | 9999-12-31 00:00:00.000 |
I do not want to copy-paste these 20 lines of code into the bottom of every risk identifier script, especially because others are creating some of these identifier scripts.
I would like to abstract this code into a procedure or function that could be called and passed a temporary table reference our a query output. Therefore a template risk ID script would be:
CREATE OR REPLACE TEMP TABLE temp_output(
entity_id NUMBER(10,0),
risk_marker_type_id NUMBER(10,0)
);
INSERT INTO temp_output
SELECT *
-- normally some logic to identify risks
FROM test_marker_output;
CALL update_master_entity_risk_markers('temp_output');
This would call the procedure which contains the helper queries. But the queries are too large for a procedure and I dont know where to begin to insert such a large query with variables into a function. AS an example, here is the error for the procedure.
CREATE OR REPLACE PROCEDURE update_master_entity_risk_markers(tabref VARCHAR)
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
res RESULTSET;
BEGIN
LET stmt VARCHAR := """
UPDATE test_entity_risk_markers cbm
SET cbm.mark_event_end_date = CURRENT_TIMESTAMP()
FROM (
WITH
mo_types AS (
SELECT
DISTINCT risk_marker_type_id
FROM ?
),
to_close AS (
SELECT
cbm.*
FROM test_entity_risk_markers cbm
INNER JOIN mo_types
ON mo_types.risk_marker_type_id = cbm.risk_marker_type_id
LEFT OUTER JOIN ? mo
ON mo.entity_id = cbm.entity_id AND mo.risk_marker_type_id = cbm.risk_marker_type_id
WHERE mo.entity_id IS NULL
AND cbm.mark_event_end_date = TO_DATE('9999-12-31')
)
SELECT * FROM to_close
) tc
WHERE cbm.mark_event_id = tc.mark_event_id;
INSERT INTO test_entity_risk_markers(
entity_id,
risk_marker_type_id
) SELECT
mo.*
FROM ? mo
LEFT JOIN test_entity_risk_markers cbm
ON mo.entity_id = cbm.entity_id AND mo.risk_marker_type_id = cbm.risk_marker_type_id
WHERE cbm.mark_event_end_date < TO_DATE('9999-12-31')
OR cbm.mark_event_end_date IS NULL;
""";
res := (EXECUTE IMMEDIATE stmt USING (tabref, tabref, tabref));
RETURN 'Table updated.';
END;
$$;
SQL compilation error:
Object name '"
UPDATE test_entity_risk_markers cbm
SET cbm.mark_event_end_date = CURRENT_TIMESTAMP()
FROM (
WITH
mo_types AS (
SELECT
DISTINCT risk_marker_type_id
FROM ?
),
to_close AS (
SELECT
cbm.*
FROM test_entity_risk_markers cbm
INNER JOIN mo_types
ON mo_types.risk_marker_type_id = cbm.risk_marker_type_id
LEFT OUTER JOIN ? mo
ON mo.entity_id = cbm.entity_id AND mo.risk_marker_type_id = cbm.risk_marker_type_id
WHERE mo.entity_id IS NULL
AND cbm.mark_event_end_date = TO_DATE('9999-12-31')
)
SELECT * FROM to_close
) tc
WHERE cbm.mark_event_id = tc.mark_event_id;
INSERT INTO test_entity_risk_markers(
entity_id,
risk_marker_type_id
) SELECT
mo.*
FROM ? mo
LEFT JOIN test_entity_risk_markers cbm
ON mo.entity_id = cbm.entity_id AND mo.risk_marker_type_id = cbm.risk_marker_type_id
WHERE cbm.mark_event_end_date < TO_DATE('9999-12-31')
OR cbm.mark_event_end_date IS NULL;
"' exceeds maximum length limit of 255 characters.
How would I construct such a helper procedure or function?
First of all Snowflake uses "
to indicate identifiers. Error message:
SQL compilation error: Object name exceeds maximum length limit of 255 characters.
LET stmt VARCHAR := """
UPDATE...
""";
should rather be:
LET stmt VARCHAR := '
UPDATE...
';
Second FROM ? mo
is incorrect. If the idea is to parametrize the query with table name then it should use IDENTIFIER:
FROM ? mo
=>
FROM IDENTIFIER(?) AS mo
Third, EXECUTE IMMEDIATE
when provided multiple statements will error out:
Multiple SQL statements in a single API call are not supported; use one API call per statement instead.
After applying above comments:
CREATE OR REPLACE PROCEDURE update_master_entity_risk_markers(tabref VARCHAR)
RETURNS VARCHAR
LANGUAGE SQL
AS
DECLARE
res RESULTSET;
res2 RESULTSET;
BEGIN
LET stmt VARCHAR := $$
UPDATE test_entity_risk_markers cbm
SET cbm.mark_event_end_date = CURRENT_TIMESTAMP()
FROM (
WITH
mo_types AS (
SELECT
DISTINCT risk_marker_type_id
FROM IDENTIFIER(?)
),
to_close AS (
SELECT
cbm.*
FROM test_entity_risk_markers cbm
INNER JOIN mo_types
ON mo_types.risk_marker_type_id = cbm.risk_marker_type_id
LEFT OUTER JOIN IDENTIFIER(?) mo
ON mo.entity_id = cbm.entity_id AND mo.risk_marker_type_id = cbm.risk_marker_type_id
WHERE mo.entity_id IS NULL
AND cbm.mark_event_end_date = TO_DATE('9999-12-31')
)
SELECT * FROM to_close
) tc
WHERE cbm.mark_event_id = tc.mark_event_id;
$$;
LET stmt2 VARCHAR := $$
INSERT INTO test_entity_risk_markers(
entity_id,
risk_marker_type_id
) SELECT
mo.*
FROM IDENTIFIER(?) mo
LEFT JOIN test_entity_risk_markers cbm
ON mo.entity_id = cbm.entity_id AND mo.risk_marker_type_id = cbm.risk_marker_type_id
WHERE cbm.mark_event_end_date < TO_DATE('9999-12-31')
OR cbm.mark_event_end_date IS NULL;
$$;
res := (EXECUTE IMMEDIATE stmt USING (tabref, tabref));
res2 := (EXECUTE IMMEDIATE stmt2 USING (tabref));
RETURN 'Table updated.';
END;
SP invocation:
CALL update_master_entity_risk_markers(tabref => 'test_marker_output');
--Table updated.
ADDENDUM:
In this example Dynamic SQL(EXECUTE IMMEDIATE
) is not required as IDENTIFIER
allows to parametrize UPDATE/INSERT
directly:
CREATE OR REPLACE PROCEDURE update_master_entity_risk_markers(tabref VARCHAR)
RETURNS VARCHAR
LANGUAGE SQL
AS
BEGIN
UPDATE test_entity_risk_markers cbm
SET cbm.mark_event_end_date = CURRENT_TIMESTAMP()
FROM (
WITH
mo_types AS (
SELECT DISTINCT risk_marker_type_id
FROM IDENTIFIER(:tabref)
),
to_close AS (
SELECT
cbm.*
FROM test_entity_risk_markers cbm
INNER JOIN mo_types
ON mo_types.risk_marker_type_id = cbm.risk_marker_type_id
LEFT OUTER JOIN IDENTIFIER(:tabref) mo
ON mo.entity_id = cbm.entity_id
AND mo.risk_marker_type_id = cbm.risk_marker_type_id
WHERE mo.entity_id IS NULL
AND cbm.mark_event_end_date = TO_DATE('9999-12-31')
)
SELECT * FROM to_close
) tc
WHERE cbm.mark_event_id = tc.mark_event_id;
INSERT INTO test_entity_risk_markers(
entity_id,
risk_marker_type_id
) SELECT
mo.*
FROM IDENTIFIER(:tabref) mo
LEFT JOIN test_entity_risk_markers cbm
ON mo.entity_id = cbm.entity_id
AND mo.risk_marker_type_id = cbm.risk_marker_type_id
WHERE cbm.mark_event_end_date < TO_DATE('9999-12-31')
OR cbm.mark_event_end_date IS NULL;
RETURN 'Table updated.';
END;
CALL update_master_entity_risk_markers(tabref => 'test_marker_output');
--Table updated.
Output: