snowflake-cloud-data-platformdata-processingpascalcasing

How to split pascal case address in SnowFlake?


In order to create PlaceKey for addresses to link some of my tables, I need to split an address column in SnowFlake.

I am not familiar with JavaScript, but I tried Javascript UDF in SnowFlake. Then I don't know how to deal with the addresses like '123_45ThSt'.

The output of my function is like '123_45 Th St'. I am stuck here.

The expected output is '123 45Th St'. Hope someone could help me out. Much appreciated!

Below is another example and my SnowFlake SQL code:

 Original address column: 12345NE17ThSt

 The expected column:     12345 NE 17Th St

 My function's output:    12345 NE17 ST

My function:

CREATE OR REPLACE FUNCTION Split_On_Upper_Case(s string)
 RETURNS string
 LANGUAGE JAVASCRIPT 
 AS '
 function Split_On_Upper_Case(str){
 str=str.split(/(?=[A-Z])/).join(" ")
 return str
 }

 // Now call the function
 return Split_On_Upper_Case(S);
 '
 ;

Solution

  • Assuming the format of street address, which includes number + word (ends with lower case or number) + word (start with upper case), I have below solution:

    CREATE OR REPLACE FUNCTION Split_On_Upper_Case(s string)
    RETURNS string
    LANGUAGE JAVASCRIPT 
    AS $$
      regexp = /([0-9]+)(NE|SE|NW|SW)?(.*[0-9a-z]{1})([A-Z][a-zA-Z0-9]+)/g;
      splits = regexp.exec(S.replace(/_/g, " "));
      if (splits && splits.length == 5) {
         return 
            splits[1].trim() + " " + 
            (splits[2] ? splits[2].trim() + " ": "" ) + 
            splits[3].trim() + " " + 
            splits[4].trim();
      }
      
      return "not found" // or whatever you want to do
    $$;
    

    Then try to run the function:

    select Split_On_Upper_Case('12345NE17ThSt');
    -- 12345 NE 17Th St
    
    select Split_On_Upper_Case('123_45ThSt');
    -- 123 45Th St
    
    select Split_On_Upper_Case('35TestSt');
    -- 35 Test St
    

    It returns expected output, but if you have more sample inputs, they can help to validate.