I am using tr (GNU coreutils v8.32) to transliterate non-basic-Latin characters into basic Latin, but it replaces them either with characters I didn't ask for or with more than one copy of the desired character.
Example:
% echo é | tr é e
> ee
What's going on?
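I think the issue is that tr transliterates single bytes, but your é is two bytes in UTF-8. GNU tr pads a shorter second set by repeating its last character, so each of the two bytes is mapped to e, which gives ee. You can see the two bytes, plus the linefeed added by echo: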
echo é | xxd
00000000: c3a9 0a ...
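If xxd is not at hand, the same point can be checked by counting bytes. This is only a rough sketch: byte_count is a made-up helper name, not a coreutils command, and it assumes the string reaches the shell encoded as UTF-8.

```shell
# Hypothetical helper: print the number of bytes in its first argument.
# tr -d ' ' removes the padding that some wc implementations add.
byte_count() {
    printf '%s' "$1" | wc -c | tr -d ' '
}

byte_count 'e'                        # 1
byte_count "$(printf '\303\251')"     # 2 (the UTF-8 bytes c3 a9, i.e. é)
```

Anything that reports more than one byte here is a character that a byte-oriented tr will treat as several separate characters.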
I think you need to look to sed, which is oriented towards patterns, however long they may be:
echo éàéà | sed -e 's/é/elephant/g' -e 's/à/antelope/g'
elephantantelopeelephantantelope
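For the original goal of mapping accented letters to plain ones, the same idea might be wrapped up as below. This is a minimal sketch: strip_accents is a hypothetical name, and the list of letters is only illustrative and far from complete.

```shell
# Hypothetical helper: replace a few accented letters with ASCII ones.
# Reads stdin and writes stdout. One s command per letter, so each
# multi-byte sequence is matched as a whole pattern.
strip_accents() {
    sed -e 's/é/e/g' \
        -e 's/è/e/g' \
        -e 's/à/a/g' \
        -e 's/ü/u/g'
}

echo 'éàéà' | strip_accents   # eaea
```

Each letter gets its own substitution rather than a bracket expression such as [éè], because in a non-UTF-8 locale a bracket expression would match the individual bytes and bring back the same problem as tr.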