sqlamazon-web-servicesamazon-athena

Trino string split > every character


I want to do some profiling of characters that I seeing in a field, and a precursor of that is splitting a string value into its component characters.

With SQL Server I would have turned to the dark-side with a cursor, but I'm using AWS Athena so don't have that option.

Is there anyway I could approach this using Trino-SQL? I've taken a look at SPLIT and REGEXP_SPLIT(), but I can't see a way to pass them either a) a blank value for the split parameter (to split on all chars) b) to devise a regex pattern (to again split on all chars) ex Tried:

select s.str as original_str, u.str as exploded_value
from
(select 'www.google.com' as str) AS s
cross join unnest(regexp_split(s.str,'\D')) as u(str)

... which splits to 15 rows returned but no value for 'exploded value'. Is that because there is no distance between the splits?

Thanks in advance for any ideas!


Solution

  • select s.str as original_str, u.str as exploded_value
    from
    (select 'www.google.com' as str) AS s
    cross join unnest(regexp_extract_all(s.str,'.')) as u(str)