sqlsnowflake-cloud-data-platform

How to assign id's to records after using FLATTEN


In my data there is a field that has multiple emails that are comma separated. I am using FLATTEN to separate the emails into their own record keeping all other fields the same. I need to be able to create a new field that will assign an id to each email that was separated by the FLATTEN function.

The FLATTEN portion is working and I'm getting the emails split out but I don't know how to create the new id field. I'm working in Snowflake.

Example

Source Record:

ID EMAIL_TO
1 email1,email2,email3
2 email4,email5

What I need:

ID EMAIL_TO NEW_DERIVED_ID
1 email1 1
1 email2 2
1 email3 3
2 email4 1
2 email5 2
CREATE OR REPLACE VIEW MYVIEW (
ID,
EMAIL_TO,
NEW_DERIVED_ID_FOR_EACH_SPLIT_EMAIL_TO_FIELD
) as (
        SELECT
        ID,
        collate(cast(replace(value, '"')as varchar),'en-ci-trim') as EMAIL_TO,
        NEW_DERIVED_ID_FOR_EACH_SPLIT_EMAIL_TO_FIELD
        FROM MYSOURCE as CM,
        LATERAL FLATTEN(input=>split(CM.EMAIL_TO, ',')) as EMAIL_TO
        where EMAIL_TO is not null
);

Solution

  • Snowflake Flatten returns several fields as output: SEQ, KEY, PATH, INDEX, VALUE & THIS

    INDEX should provide you with what you need to generate an ID field without having to generate one manually with something like row_number() . Assuming you want it sequenced from 1, you just need to add 1 to the INDEX:

    create table emails as 
    select ID,  EMAIL_TO
    from (values (1, 'email1,email2,email3' )) as eml (ID, EMAIL_TO);
    
    
    CREATE OR REPLACE VIEW MYVIEW  as 
      SELECT
        ID,
        collate(cast(replace(value, '"')as varchar),'en-ci-trim') as EMAIL_TO,
        INDEX+1 NEW_DERIVED_ID_FOR_EACH_SPLIT_EMAIL_TO_FIELD
      FROM emails as CM,
      LATERAL FLATTEN(input=>split(CM.EMAIL_TO, ',')) as EMAIL_TO
      where EMAIL_TO is not null
    ;
    select * from myview;