linux shell tr

Why is `tr` replacing one character with two?


I am using `tr` (GNU coreutils 8.32) to transliterate non-basic-Latin characters into basic Latin, but it either replaces them with characters I didn't ask for or outputs the desired character more than once.

Example:

% echo é | tr é e
> ee

What's going on?


Solution

  • The issue is that GNU tr transliterates single bytes, not characters. In a UTF-8 terminal your é is two bytes (c3 a9), so tr maps each of those bytes to e and you get ee. You can see the two bytes, plus the linefeed (0a) added by echo, with xxd:

    echo é | xxd                                         
    00000000: c3a9 0a                                  ...
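
    The byte-level behaviour can be checked directly. This is a minimal sketch, assuming GNU tr as in the question; the variable names are illustrative, and the character is written as octal escapes so the result does not depend on the script's encoding. With a two-byte SET1 and a one-byte SET2, GNU tr pads SET2 by repeating its last character, so both bytes end up mapped to e.

```shell
# UTF-8 encoding of é (bytes c3 a9), written as octal escapes.
e_acute=$(printf '\303\251')

# Count the bytes in the character; wc -c counts bytes, not characters.
byte_count=$(printf '%s' "$e_acute" | wc -c | tr -d ' ')
echo "bytes in é: $byte_count"

# Byte-wise equivalent of `tr é e`: SET1 is two bytes, SET2 is one.
# Assumption: GNU tr, which pads SET2 with its last character,
# so each of the two input bytes is replaced by "e".
result=$(printf '%s' "$e_acute" | tr '\303\251' 'e')
echo "tr output: $result"
```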
    

    For multi-byte characters, use sed instead. Its s command replaces whole patterns, however many bytes long they are:

    echo éàéà | sed -e 's/é/elephant/g' -e 's/à/antelope/g'
    elephantantelopeelephantantelope
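
    For the original goal of mapping each accented character to a single basic-Latin one, the same approach works with one-character replacements. A minimal sketch, assuming the script is saved as UTF-8; `to_ascii` is a hypothetical helper name and the two mappings are only examples:

```shell
# Hypothetical helper: replace a few accented characters with basic Latin.
# Each s command matches the full multi-byte sequence of the character,
# so every accented character yields exactly one output character.
to_ascii() {
    sed -e 's/é/e/g' -e 's/à/a/g'
}

printf 'éàéà\n' | to_ascii
```

    Add one -e expression per character you need to map.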