postgresql unicode unaccent

Postgres unaccent function and combining characters


I'm using unaccent in Postgres, but it cannot convert a decomposed character such as ù (the letter u followed by the combining grave accent, U+0300).
It works fine for the precomposed ù (U+00F9).
The two characters look the same and mean the same thing, but they have different code points.
How can I solve this problem? Thank you so much.


Solution

  • Your problem is Unicode normalization, which PostgreSQL unfortunately does not do for you (a built-in normalize() function only arrived in PostgreSQL 13). It is also not simple to implement on your own.

    But because you only want to remove diacritical marks, it is enough to strip the code points that are Unicode combining characters, either before or after calling the unaccent() function:

    select regexp_replace(
      'ùù',
      '[\u0300-\u036F\u1AB0-\u1AFF\u1DC0-\u1DFF\u20D0-\u20FF\uFE20-\uFE2F]',
      '',
      'g'
    );
    

    should do the trick.
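
To apply both steps in one call, the regexp_replace() above can be wrapped together with unaccent() in a small SQL function. This is only a sketch: the name strip_accents is made up for illustration, and it assumes a UTF8 database, the unaccent extension, and the default standard_conforming_strings = on, so the backslashes reach the regex engine unchanged.

```sql
-- Assumption: the unaccent extension is available on the server.
CREATE EXTENSION IF NOT EXISTS unaccent;

-- Hypothetical helper (the name is illustrative, not part of PostgreSQL):
-- first strip Unicode combining marks, then let unaccent() handle
-- the precomposed accented letters.
CREATE OR REPLACE FUNCTION strip_accents(val text)
RETURNS text
LANGUAGE sql
STABLE
AS $$
  SELECT unaccent(
    regexp_replace(
      val,
      '[\u0300-\u036F\u1AB0-\u1AFF\u1DC0-\u1DFF\u20D0-\u20FF\uFE20-\uFE2F]',
      '',
      'g'
    )
  );
$$;

-- Test with both spellings written as explicit code points:
-- 'u' followed by U+0300 (decomposed) and U+00F9 (precomposed).
SELECT strip_accents(U&'u\0300 \00F9');

-- On PostgreSQL 13 or later, normalizing to NFC first is an alternative,
-- because it turns the decomposed pair into the precomposed character
-- that unaccent() already knows how to handle.
SELECT unaccent(normalize(U&'u\0300', NFC));
```

The U&'...' literals are used so the test input does not depend on how your editor or client encodes the accented character.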