The UNACCENT function can strip diacritics from characters. However, in my case it only works for characters with a single diacritic.
For characters with more than one diacritic, UNACCENT does nothing.
Is there a way to make Postgres strip the accents from these characters?
PostgreSQL's unaccent module does not use Unicode normalization. It only performs a simple search-and-replace driven by a dictionary (a rules file). The default dictionary, unaccent.rules, does not contain these Vietnamese characters, so they are left unchanged.
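To illustrate the difference, here is a small Python sketch (not PostgreSQL code). RULES is a hypothetical lookup table standing in for a rules file that only lists a few single-diacritic letters. A lookup leaves ầ untouched because it has no entry for it, whereas Unicode NFD decomposition splits ầ into a plus two combining marks, both of which can be dropped.

```python
import unicodedata

# Hypothetical lookup table standing in for a rules file that only
# covers a few single-diacritic letters: á, à, â, ô.
RULES = {"\u00e1": "a", "\u00e0": "a", "\u00e2": "a", "\u00f4": "o"}


def lookup_unaccent(text: str) -> str:
    """Replace each character found in RULES; leave all others unchanged."""
    return "".join(RULES.get(ch, ch) for ch in text)


def nfd_unaccent(text: str) -> str:
    """Decompose to NFD and drop every combining mark, however many there are."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    word = "\u1ea7"  # ầ: a with circumflex and grave
    print([unicodedata.name(c) for c in unicodedata.normalize("NFD", word)])
    # ['LATIN SMALL LETTER A', 'COMBINING CIRCUMFLEX ACCENT', 'COMBINING GRAVE ACCENT']
    print(lookup_unaccent(word) == word)  # True: the table has no entry for ầ
    print(nfd_unaccent(word))  # a
```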
You could create your own unaccent dictionary, though, as explained in the documentation:
Create a UTF-8 encoded text file vietnamese.rules with one rule per line (the accented character, whitespace, then its replacement), for example:
ầ a
Ầ A
ồ o
Ồ O
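The four lines above only cover four letters; for instance, ố in phố would stay unchanged. Rather than typing every Vietnamese letter by hand, the file could be generated. The following is a minimal Python sketch, assuming the vowel list and the five tone marks below cover the letters you need. It composes each vowel with each tone mark (NFC), then derives the replacement by NFD decomposition and dropping combining marks. đ/Đ have no Unicode decomposition, so they are mapped explicitly.

```python
import sys
import unicodedata

# Vietnamese vowels: a ă â e ê i o ô ơ u ư y (escapes avoid source-encoding surprises).
BASE_VOWELS = "a\u0103\u00e2e\u00eaio\u00f4\u01a1u\u01b0y"
# Combining tone marks: grave, acute, tilde, hook above, dot below.
TONE_MARKS = ["\u0300", "\u0301", "\u0303", "\u0309", "\u0323"]


def strip_marks(text: str) -> str:
    """Drop all combining marks after canonical (NFD) decomposition."""
    return "".join(c for c in unicodedata.normalize("NFD", text)
                   if not unicodedata.combining(c))


def vietnamese_rules() -> list:
    """Return 'source target' lines in the unaccent rules-file format."""
    rules = []
    for base in BASE_VOWELS + BASE_VOWELS.upper():
        # The bare vowel plus each toned variant, composed to one code point.
        for mark in [""] + TONE_MARKS:
            letter = unicodedata.normalize("NFC", base + mark)
            target = strip_marks(letter)
            if letter != target:  # skip plain a, e, i, o, u, y
                rules.append(f"{letter} {target}")
    # đ and Đ are separate letters without a decomposition.
    rules += ["\u0111 d", "\u0110 D"]
    return rules


if __name__ == "__main__":
    # Write UTF-8 bytes explicitly, since the rules file must be UTF-8.
    sys.stdout.buffer.write(("\n".join(vietnamese_rules()) + "\n").encode("utf-8"))
```

Saved under a hypothetical name such as make_vietnamese_rules.py, it could be run as python3 make_vietnamese_rules.py > vietnamese.rules, and the resulting file used in place of the hand-written one.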
Move vietnamese.rules into the folder $SHAREDIR/tsearch_data/ (usually /usr/share/postgresql/tsearch_data; the command pg_config --sharedir prints the actual SHAREDIR of your installation).
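Register the rules file as a text search dictionary (RULES is the file name without the .rules extension), then pass that dictionary as the first argument of unaccent:
CREATE TEXT SEARCH DICTIONARY vietnamese (TEMPLATE = unaccent, RULES = 'vietnamese');
SELECT unaccent('vietnamese', 'Hồ ầ phố');
-- 'vietnamese' selects the custom dictionary instead of the default unaccent dictionary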