regex alteryx

Regex parse with Alteryx


One of my columns contains the data below, and I only need the suburb name, not the state or postcode.

I'm using Alteryx and tried the regex (\<\w+\>)\s\<\w+\>, but only a few records come through to the new column.

Input:

CABRAMATTA          
CANLEY HEIGHTS      
ST JOHNS PARK       
Parramatta NSW 2150 
Claymore 2559       
CASULA
  

Output:

CABRAMATTA          
CANLEY HEIGHTS      
ST JOHNS PARK       
Parramatta
Claymore
CASULA        

Solution

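  • This regex matches every run of letter-only words up to, but not including, an Australian state abbreviation (the addresses are clearly Australian). Because it matches letters only, a postcode that directly follows the suburb is left out as well: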

    ( ?(?!(VIC|NSW|QLD|TAS|SA|WA|ACT|NT)\b)\b[a-zA-Z]+)+
    

    See demo

    The negative lookahead includes a word boundary (\b) so that suburbs that merely start with a state abbreviation, such as SANDY BAY (which begins with SA), are still matched (see demo).
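
For anyone who wants to check the pattern outside Alteryx, here is a minimal Python sketch that runs it over the sample rows from the question. The helper name extract_suburb is hypothetical, and taking the whole match (group 0) rather than a capture group is an assumption made for this illustration; how the Alteryx RegEx tool exposes the match may differ.

```python
import re

# Pattern copied unchanged from the answer above.
SUBURB_RE = re.compile(r"( ?(?!(VIC|NSW|QLD|TAS|SA|WA|ACT|NT)\b)\b[a-zA-Z]+)+")


def extract_suburb(value: str) -> str:
    """Return the leading suburb words, or "" if nothing matches.

    Hypothetical helper: it uses the whole match (group 0), because the
    repeated outer group would only hold the last word it matched.
    """
    match = SUBURB_RE.search(value)
    return match.group(0).strip() if match else ""


# Sample rows from the question, including their trailing spaces.
rows = [
    "CABRAMATTA          ",
    "CANLEY HEIGHTS      ",
    "ST JOHNS PARK       ",
    "Parramatta NSW 2150 ",
    "Claymore 2559       ",
    "CASULA",
]

for row in rows:
    print(repr(extract_suburb(row)))
```

The state abbreviations in the pattern are upper case, so a lower-case "nsw" would be treated as part of the suburb unless the pattern is adjusted.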