sqlregexsnowflake-cloud-data-platformregexp-substr

REGEXP_SUBSTR in Snowflake, regex that will extract the string after the period (which occurs between 0-2 times)


I'm trying to use the REGEXP_SUBSTR method in Snowflake to extract a string after the period. I have the following possibilities:

With the following possibilities, I need to be able to extract TABLE_NAME. So far, I've tried this which is able to get the third possibility:

select regexp_substr(query_text, 'ALTER TABLE ([0-9A-Za-z\\-\\_]+) ADD COLUMN', 1,1, 'e', 1);

I'm not sure how I could capture all three possibilities in one regular expression. Would appreciate any pointers. Thank you.


Solution

  • Using regexp and split_part:

    WITH cte(query_text) AS (
       SELECT 'ALTER TABLE DATABASE_NAME.SCHEMA_NAME.TABLE_NAME ADD COLUMN ....' UNION
       SELECT 'ALTER TABLE SCHEMA_NAME.TABLE_NAME ADD COLUMN ....' UNION
       SELECT 'ALTER TABLE TABLE_NAME ADD COLUMN ....'
    )
    select 
      regexp_substr(query_text, 'ALTER TABLE (.+) ADD COLUMN', 1,1, 'e', 1) as obj_name,
      split_part(obj_name, '.', -1) AS tab_name,
      split_part(obj_name, '.', -2) AS schema_name,
      split_part(obj_name, '.', -3) AS db_name
    FROM cte;
    

    Output:

    OBJ_NAME TAB_NAME SCHEMA_NAME DB_NAME
    DATABASE_NAME.SCHEMA_NAME.TABLE_NAME TABLE_NAME SCHEMA_NAME DATABASE_NAME
    SCHEMA_NAME.TABLE_NAME TABLE_NAME SCHEMA_NAME
    TABLE_NAME TABLE_NAME