
Changing the locale (LC_ALL) for sprintf inside awk


I want to convert integer values in the range 129 to 255 into single-byte characters in a string using sprintf("%c"), and I have a problem with the following statement in the "GNU Awk User's Guide":

NOTE: The POSIX standard says the first character of a string is printed. In locales with multibyte characters, gawk attempts to convert the leading bytes of the string into a valid wide character and then to print the multibyte encoding of that character. Similarly, when printing a numeric value, gawk allows the value to be within the numeric range of values that can be held in a wide character. If the conversion to multibyte encoding fails, gawk uses the low eight bits of the value as the character to print.

This leads to the following output:

[:~]$ gawk 'BEGIN {retString = sprintf("%c%c%c", 129, 130, 131); print retString}' | od -x
0000000 81c2 82c2 83c2 000a

An extra byte (0xc2) is added in front of every byte (0x81, 0x82, 0x83). I can avoid this by setting LC_ALL to C:

[:~]$ LC_ALL=C gawk 'BEGIN {retString = sprintf("%c%c%c", 129, 130, 131); print retString}' | od -x
0000000 8281 0a83

My question is: how can I change the locale from within awk, without setting LC_ALL outside the awk script? I want to use this script on multiple systems and don't want the output to depend on each system's default locale settings.

Or is there another way to achieve the same result without the sprintf() call?


Solution

  • You can switch gawk to byte mode by using -bE instead of -E in the shebang line (-b is the short form of --characters-as-bytes):

    #!/usr/bin/gawk -bE
    
    BEGIN { printf "%c",255 }