[SOLVED] Postgres UNACCENT for character with more than 1 diacritic

The UNACCENT function can strip diacritics off characters. However, in my case, it only handles characters with a single diacritic, e.g.

Thành
Supermän
äää

For characters with more than one diacritic, UNACCENT does nothing, e.g.

Hồ
ầ
phố

Is there a way to let Postgres strip the accents from these characters?

Thanks

Solution

PostgreSQL's unaccent module does not use Unicode normalization, but only a simple search-and-replace dictionary. The default dictionary, unaccent.rules, does not contain these Vietnamese characters, thus nothing is done.
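
For contrast, here is a rough sketch of what a normalization-based approach could look like directly in SQL. It assumes PostgreSQL 13 or later (where normalize() is available) and a UTF8 database, and it is not what unaccent does internally:

```sql
-- Decompose each character to NFD (base letter followed by combining marks),
-- then drop the combining marks in the range U+0300..U+036F.
SELECT regexp_replace(normalize('Hồ ầ phố', NFD), '[\u0300-\u036f]', '', 'g');
```

Letters that have no canonical decomposition, such as đ, pass through this unchanged, so a rules file may still be the more complete option.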

You could create your own unaccent dictionary though. As explained in the documentation:

Create a text file vietnamese.rules with content like
```
ầ  a
Ầ  A
ồ  o
Ồ  O
```
Move vietnamese.rules into the folder $SHAREDIR/tsearch_data/ (usually /usr/share/postgresql/tsearch_data; running pg_config --sharedir prints the actual $SHAREDIR)
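
The first argument of unaccent is a text search dictionary, so the rules file most likely also has to be registered as a dictionary before the call below will work. A minimal sketch, assuming the unaccent extension is available and that the dictionary is named vietnamese to match that call:

```sql
-- Assumes the file was installed as $SHAREDIR/tsearch_data/vietnamese.rules.
CREATE EXTENSION IF NOT EXISTS unaccent;

-- RULES is the base name of the rules file, without the .rules extension.
CREATE TEXT SEARCH DICTIONARY vietnamese (
    TEMPLATE = unaccent,
    RULES = 'vietnamese'
);
```

A dictionary built this way only knows the mappings listed in vietnamese.rules. To keep the default replacements as well, the new lines could instead be appended to a copy of unaccent.rules.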

Run the function as

```
SELECT unaccent('vietnamese', 'Hồ ầ phố');
--              ^~~~~~~~~~~~~
```