sqldiacriticsaccent-insensitivesnowflake-cloud-data-platform

Remove Diacritics from string in Snowflake


I would like to remove all diacritics from a string column, using Snowflake SQL. For example: ä, ö, é, č => a, o e, c

I acknowledged that the TRANSLATE function would work but only for single case while there are lots of letters with an accent that need to be translated.

Could you please give me some hints? Thanks so much in advance, Hanh


Solution

  • Perhaps the safest way to make sure it covers all of them is to draft on the work in ES2015/ES6 to cover all characters like this:

    create or replace function REPLACE_DIACRITICS("str" string)
    returns string
    language javascript
    strict immutable
    as
    $$
        return str.normalize("NFD").replace(/\p{Diacritic}/gu, "");
    $$;
    
    select REPLACE_DIACRITICS('ö, é, č => a, o e, c');
    

    JS for the UDF is courtesy of this post: Remove accents/diacritics in a string in JavaScript