In my data there is a field that has multiple emails that are comma separated. I am using FLATTEN to separate the emails into their own record keeping all other fields the same. I need to be able to create a new field that will assign an id to each email that was separated by the FLATTEN function.
The FLATTEN portion is working and I'm getting the emails split out but I don't know how to create the new id field. I'm working in Snowflake.
Example
Source Record:
ID | EMAIL_TO |
---|---|
1 | email1,email2,email3 |
2 | email4,email5 |
What I need:
ID | EMAIL_TO | NEW_DERIVED_ID |
---|---|---|
1 | email1 | 1 |
1 | email2 | 2 |
1 | email3 | 3 |
2 | email4 | 1 |
2 | email5 | 2 |
CREATE OR REPLACE VIEW MYVIEW (
ID,
EMAIL_TO,
NEW_DERIVED_ID_FOR_EACH_SPLIT_EMAIL_TO_FIELD
) as (
SELECT
ID,
collate(cast(replace(value, '"')as varchar),'en-ci-trim') as EMAIL_TO,
NEW_DERIVED_ID_FOR_EACH_SPLIT_EMAIL_TO_FIELD
FROM MYSOURCE as CM,
LATERAL FLATTEN(input=>split(CM.EMAIL_TO, ',')) as EMAIL_TO
where EMAIL_TO is not null
);
Snowflake Flatten returns several fields as output: SEQ, KEY, PATH, INDEX, VALUE & THIS
INDEX should provide you with what you need to generate an ID field without having to generate one manually with something like row_number()
. Assuming you want it sequenced from 1, you just need to add 1 to the INDEX:
create table emails as
select ID, EMAIL_TO
from (values (1, 'email1,email2,email3' )) as eml (ID, EMAIL_TO);
CREATE OR REPLACE VIEW MYVIEW as
SELECT
ID,
collate(cast(replace(value, '"')as varchar),'en-ci-trim') as EMAIL_TO,
INDEX+1 NEW_DERIVED_ID_FOR_EACH_SPLIT_EMAIL_TO_FIELD
FROM emails as CM,
LATERAL FLATTEN(input=>split(CM.EMAIL_TO, ',')) as EMAIL_TO
where EMAIL_TO is not null
;
select * from myview;