stored-proceduressnowflake-cloud-data-platformdatabase-cursor

Dynamic SQL and cursor iteration in Snowflake


I've been sitting all weekend, trying to figure out the implementation of cursor iterations in Snowflake procedure blocks.

The goal is to create a dynamic procedure that when called checks all available tables in my Snowflake DB schema for (non)existing account_ids. The procedure will be utilized as a data control test to check if our data deletion procedure has been run correctly.

I also want to use parameters for all queries within my declaration block so we can easily fit this procedure to different DBs that have the same tables but are stored in different schemas.

I.e: SOURCE_DATABASE.LOM_US_DB_PUBLIC vs. SOURCE_DATABASE.LOM_EU_DB_PUBLIC

Current Implementation:

The procedure checks for records associated with a specified ACCOUNT_ID across all tables in a given database schema. The procedure takes three parameters: DB_NAME, SCHEMA_NAME, and ACCOUNT_ID. It switches to the specified database and schema, iterates over all tables in the schema, and constructs a SQL query to count records with the given ACCOUNT_ID in each table. If records are found, it accumulates the results in a summary string, which is returned at the end of the procedure.

Current Problem:

I continuously run into issues when implementing either 'For Loops' or when trying to work with 'Execute Immediately' statements. I've been reading through most of the Snowflake docs but can't seem to find a solution to this problem.

I keep on receiving the same error messages:

Syntax error: unexpected 'IMMEDIATE'. (line 22)

syntax error line 22 at position 35 unexpected 'INTO'. syntax error line 25 at position 8 unexpected 'IF'. (line 22)

OR

Syntax error: unexpected 'FOR'. (line 21)

Desired Outcome:

At the end of the iteration, the procedure returns a summary that contains details of all tables that have records for the specified ACCOUNT_ID.

PS: Recommendations for return summaries are appreciated!

CREATE OR REPLACE PROCEDURE DATA_DELETION_TEST(
    DB_NAME STRING, -- Set Parameters to customize test for different DBs and Schemas
    SCHEMA_NAME STRING,
    ACCOUNT_ID STRING
)
RETURNS STRING -- Procedure returns a string as result 
LANGUAGE SQL -- Procedure is written in SQL language 
EXECUTE AS CALLER -- Procedure has the same access permissions as the user who is executing it
AS
$$ -- Start of the procedure body
DECLARE
    CURSOR_RESULT STRING DEFAULT ''; -- summarizes which tables contain data for the specified account ID.
    TABLE_NAME STRING; -- construct the SQL query dynamically for each table.
    SQL_TEXT STRING; -- SQL query string that checks for data connected to the given account ID in that specific table.
    QUERY_RESULT STRING; -- holds the count of records found in the table for the specified account ID.
BEGIN -- Start of the Procedure Body
    -- Switch to the specified database
    EXECUTE IMMEDIATE 'USE DATABASE ' || DB_NAME || ';';
    -- Switch to the specified schema
    EXECUTE IMMEDIATE 'USE SCHEMA ' || SCHEMA_NAME || ';';

    -- Iterate over the tables in the specified schema
    FOR RECORD IN (SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = SCHEMA_NAME) DO
        -- RECORD.TABLE_NAME refers to the TABLE_NAME column from the cursor's result set
        TABLE_NAME := RECORD.TABLE_NAME;

        -- Construct the SQL query to check for data
        SQL_TEXT := 'SELECT COUNT(*) FROM ' || SCHEMA_NAME || '.' || TABLE_NAME || ' WHERE ACCOUNT_ID = ''' || ACCOUNT_ID || '''';

        -- Execute the query and fetch the result
        EXECUTE IMMEDIATE SQL_TEXT INTO QUERY_RESULT;

        -- If data exists, store the result
        IF QUERY_RESULT > 0 THEN
            CURSOR_RESULT := CURSOR_RESULT || 'Table: ' || TABLE_NAME || ' has ' || QUERY_RESULT || ' records for Account ID: ' || ACCOUNT_ID || '. ';
        END IF;
    END FOR;

    RETURN CURSOR_RESULT;
END;
$$; -- End of the procedure body

CALL DATA_DELETION_TEST('SOURCE_DATABASE', 'LOM_US_DB_PUBLIC', 'your_account_id');

Solution

  • I always find it confusing working with the stored procedures and fetching and assigning variables, so take the below with a pinch of salt.

    I think that is not the way you use "INTO", also I modified the "IF" clause and the way to obtain the list of table names:

    CREATE OR REPLACE PROCEDURE DATA_DELETION_TEST(
        DB_NAME STRING, -- Set Parameters to customize test for different DBs and Schemas
        SCHEMA_NAME STRING,
        ACCOUNT_ID STRING
    )
    RETURNS STRING -- Procedure returns a string as result 
    LANGUAGE SQL -- Procedure is written in SQL language 
    EXECUTE AS CALLER -- Procedure has the same access permissions as the user who is executing it
    AS
    DECLARE
        CURSOR_RESULT STRING DEFAULT ''; -- summarizes which tables contain data for the specified account ID.
        TABLE_NAME STRING; -- construct the SQL query dynamically for each table.
        SQL_TEXT STRING; -- SQL query string that checks for data connected to the given account ID in that specific table.
        QUERY_RESULT STRING; -- holds the count of records found in the table for the specified account ID.
        -- added vars:
    
        GET_TABLES_STATEMENT VARCHAR DEFAULT (
            ' select table_name from ' || :DB_NAME  || '.INFORMATION_SCHEMA.TABLES where table_schema = \'' || :SCHEMA_NAME || '\';'
        );
        TABLES_AVAILABLE RESULTSET DEFAULT (EXECUTE IMMEDIATE GET_TABLES_STATEMENT);
        CUR_TABLE CURSOR for TABLES_AVAILABLE;
        holder RESULTSET;
        
    BEGIN -- Start of the Procedure Body
        -- Switch to the specified database
        EXECUTE IMMEDIATE 'USE DATABASE ' || DB_NAME || ';';
        -- Switch to the specified schema
        EXECUTE IMMEDIATE 'USE SCHEMA ' || SCHEMA_NAME || ';';
    
        -- Iterate over the tables in the specified schema
        FOR RECORD IN CUR_TABLE DO
            -- RECORD.TABLE_NAME refers to the TABLE_NAME column from the cursor's result set
            TABLE_NAME := RECORD.TABLE_NAME;
            -- Construct the SQL query to check for data
            SQL_TEXT := 'SELECT COUNT(*) AS CNT FROM ' || SCHEMA_NAME || '.' || TABLE_NAME || ' WHERE ACCOUNT_ID = ''' || ACCOUNT_ID || '''';
    
            -- Execute the query and fetch the result
            holder := (EXECUTE IMMEDIATE SQL_TEXT);
            let c1 cursor for holder;
            open c1;
            fetch c1 into QUERY_RESULT;
                
            -- If data exists, store the result
            IF (QUERY_RESULT > 0) THEN
                CURSOR_RESULT := CURSOR_RESULT || 'Table: ' || TABLE_NAME || ' has ' || QUERY_RESULT || ' records for Account ID: ' || ACCOUNT_ID || '. ';
            END IF;
        END FOR;
    
        RETURN CURSOR_RESULT;
    END;
    

    Testing it now: Some prep to have data:

    use schema test_db.sample_procedure_dvd;
    create or replace table test_db.sample_procedure_dvd.sample_data as (
    select 'US' as account_id, 1 as id
    union all 
    select 'UK', 2
    );
    create or replace table test_db.sample_procedure_dvd.sample_data_us as 
    (select * from test_db.sample_procedure_dvd.sample_data where account_id = 'US');
    create or replace table test_db.sample_procedure_dvd.sample_data_non_us as 
    (select * from test_db.sample_procedure_dvd.sample_data where account_id != 'US');
    

    I should have now 3 tables, 2 of them have data for account_id = US, one only for UK.

    Calling the procedure:

     CALL DATA_DELETION_TEST('TEST_DB', 'SAMPLE_PROCEDURE_DVD', 'US');
    

    Returns the expected output:

    Table: SAMPLE_DATA has 1 records for Account ID: US. Table: SAMPLE_DATA_US has 1 records for Account ID: US.

    You also asked about alternatives to the "return summary" -- it may be worth it to return a table (e.g. table_name|number_hits|timestamp of the call), although then you'd need to modify your "returns")