postgresql diacritics unaccent

Postgres UNACCENT for characters with more than one diacritic


The UNACCENT function strips diacritics from characters. In my case, however, it only works on characters that carry a single diacritic.

For characters with more than one diacritic (Vietnamese letters such as ầ or ồ), UNACCENT does nothing.

Is there a way to let Postgres strip the accents from these characters?

Thanks


Solution

  • PostgreSQL's unaccent module does not use Unicode normalization; it relies on a simple search-and-replace dictionary. The default rules file, unaccent.rules, does not contain these Vietnamese characters, so they are left unchanged.

    You can create your own unaccent dictionary, though, as explained in the documentation:

    1. Create a UTF-8 text file vietnamese.rules with one rule per line (the accented character, whitespace, then its replacement), with content like

      ầ  a
      Ầ  A
      ồ  o
      Ồ  O
      
    2. Move vietnamese.rules into the folder $SHAREDIR/tsearch_data/ (usually /usr/share/postgresql/tsearch_data; running pg_config --sharedir prints the actual $SHAREDIR)

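    3. Create a text search dictionary that uses the new rules file (the first argument of unaccent is a dictionary name, not a file name), then run the function as

      CREATE TEXT SEARCH DICTIONARY vietnamese (TEMPLATE = unaccent, RULES = 'vietnamese');
      SELECT unaccent('vietnamese', 'Hồ ầ phố');
      --              ^^^^^^^^^^^^ name of the dictionary, not of the file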
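
Because the rules file is just a lookup table, it does not have to be typed by hand. The following is a minimal Python sketch (not part of the original answer) that derives rules by decomposing each letter with Unicode NFD normalization and dropping the combining marks. The helper names strip_marks and build_rules and the scanned code point range are assumptions chosen for illustration. Letters without a canonical decomposition, such as đ, are not covered and would still need a manual rule.

```python
import sys
import unicodedata
from typing import List


def strip_marks(ch: str) -> str:
    """Decompose ch with NFD and drop every combining mark."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def build_rules(start: int = 0x00C0, end: int = 0x1EFF) -> List[str]:
    """Return 'accented<TAB>plain' lines for letters in [start, end]
    whose mark-free form is plain ASCII (range is an assumption)."""
    rules = []
    for cp in range(start, end + 1):
        ch = chr(cp)
        if not ch.isalpha():
            continue  # skip combining marks, symbols, unassigned code points
        plain = strip_marks(ch)
        if plain != ch and plain.isascii():
            rules.append(f"{ch}\t{plain}")
    return rules


if __name__ == "__main__":
    # Write UTF-8 bytes directly so the output does not depend on the locale.
    sys.stdout.buffer.write(("\n".join(build_rules()) + "\n").encode("utf-8"))
```

Redirecting the output to vietnamese.rules gives a file in the same format as step 1, which can then be installed as described in step 2.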