Tags: bash, shell, encoding, command-line, utility

Convert text to bytes from Bash shell?


How can a text string be turned into UTF-8 encoded bytes using Bash and/or common Linux command line utilities? For example, in Python one would do:

"Six of one, ½ dozen of the other".encode('utf-8')
b'Six of one, \xc2\xbd dozen of the other'

Is there a way to do this in pure Bash:

STR="Six of one, ½ dozen of the other"
<utility_or_bash_command_here> --encoding='utf-8' "$STR"
'Six of one, \xc2\xbd dozen of the other'

Solution

  • Perl to the rescue!

    echo "$STR" | perl -pe 's/([^\x00-\x7f])/"\\x" . sprintf "%x", ord $1/ge'
    

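    The /e modifier makes Perl evaluate the replacement part of the s/// substitution as code. Here that code takes each matched byte outside the ASCII range (0x00 to 0x7f), gets its numeric value with ord, formats it as hex with sprintf, and prefixes it with a literal \x. Perl reads its input as raw bytes by default, so each byte of a multi-byte UTF-8 character is matched and escaped separately. Assuming the string in $STR is already UTF-8 encoded (as it is in a UTF-8 terminal), ½ becomes \xc2\xbd.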