The hex code value of any printable character can be displayed this way in bash:
printf "%x\n" \'a
61
awk 'BEGIN{printf("%x\n",\\'a)}'
awk 'BEGIN{printf("%x\n",\'a)}'
Neither of these works in awk. Is there no way to do this in awk?
Doesn't awk's printf support this kind of format, as bash's does?
awk -v var="a" 'BEGIN{printf("%x\n", var)}'
0
echo -n a|xxd
0000000: 61
It is simple to get a printable character's hex code value with echo -n a|xxd. My question is whether awk provides this kind of printf format, as bash does, not how to get the hex code value in awk by some other method.
awk -v var="a" 'BEGIN{printf("%x\n", \'var)}'
bash: syntax error near unexpected token `)'
debian8@debian:~$ awk -v var="a" "BEGIN{printf("%x\n", \'var)}"
awk: cmd. line:1: BEGIN{printf(%xn, \'var)}
awk: cmd. line:1: ^ syntax error
awk: cmd. line:1: BEGIN{printf(%xn, \'var)}
awk: cmd. line:1: ^ backslash not last character on line
awk: cmd. line:1: BEGIN{printf(%xn, \'var)}
awk: cmd. line:1: ^ syntax error
Conclusion: awk doesn't support this kind of printf format.
Here's a command that shows that awk's printf function indeed does not support the '-prefixed syntax for getting a character's code point (applies to GNU Awk, Mawk, and BSD/macOS Awk):
$ awk -v char="'a" 'BEGIN { printf "%x\n", char }'
0 # Value 'a is literally interpreted as a number, which defaults to 0
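The 0 comes from Awk's ordinary string-to-number conversion, which reads a leading numeric prefix and yields 0 when there is none. A minimal sketch of that behavior follows; hex_of_string is a hypothetical helper name introduced only for this illustration.

```shell
# hex_of_string is a hypothetical helper, not part of any Awk implementation.
# It passes its argument to awk as a string and formats it with %x.
hex_of_string() {
  awk -v s="$1" 'BEGIN { printf "%x\n", s }'
}

hex_of_string "'a"     # no numeric prefix, so the value is 0: prints 0
hex_of_string "97abc"  # numeric prefix 97 is used: prints 61
```

The second call shows that the leading quote is not special to awk: only digits at the start of the string affect the result.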
Note that Bash v4+'s printf builtin is Unicode-aware:
$ printf '%x\n' \'€
20ac # U+20AC is the Unicode code point of the EURO symbol
A hex-dump utility such as xxd will only give you the byte representation of a character, which is only the same as the code point in the 7-bit ASCII range.
In a UTF-8-based locale (which is typical these days), a hex dump of any character beyond the ASCII range shows the bytes that make up the UTF-8-encoded form of the character:
$ xxd <<<€
00000000: e282 ac0a # 0xe2 0x82 0xac are the UTF-8 encoding of Unicode char. U+20AC
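The same byte-level view can be reproduced without xxd and without relying on the terminal's locale. The sketch below assumes only POSIX printf, od, and tr; utf8_bytes is a hypothetical helper name, and the euro sign is supplied as explicit octal byte escapes so the result does not depend on how the script file is encoded.

```shell
# utf8_bytes is a hypothetical helper: it prints the bytes of its argument
# as one contiguous lowercase hex string.
utf8_bytes() {
  printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
  echo
}

utf8_bytes a                            # prints 61 (same as the code point)
utf8_bytes "$(printf '\342\202\254')"   # prints e282ac (not 20ac)
```

For the ASCII character the byte value and the code point coincide; for U+20AC the three UTF-8 bytes are shown instead of the code point.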
The ord() function used with GNU Awk in Ed Morton's helpful answer is limited to ASCII characters. Any character with a code point beyond 0x7f results in a negative value.
The create-a-map-of-all-characters workaround from James Brown's helpful answer is limited to ASCII characters in Mawk and BSD/macOS Awk.
In GNU Awk it works in principle with all Unicode characters, but the fact that a map of all characters must be built makes this somewhat impractical; here's a version that covers the Unicode BMP (basic multilingual plane), into which the most widely used characters fall:
$ gawk -v char=€ 'BEGIN{ for(n=0xffff;n>=0;n--) ord[sprintf("%c",n)]=n; printf "%x\n", ord[char]}'
20ac
Tip of the hat to RARE Kpop Manifesto, who suggested iterating over the BMP code points in descending order, "otherwise the ASCII-duplicating section in 0xD800-0xDFFF would overwrite ASCII ordinals with these meaningless values in the UTF-16 surrogate exclusion range."
That is, with iteration in ascending order, ASCII-range characters such as char=a as input would mistakenly yield a surrogate code point.
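For the ASCII-only case, which is as far as the map technique reaches in Mawk and BSD/macOS Awk, a much smaller table is enough. The following is a minimal sketch rather than a definitive implementation; ascii_hex is a hypothetical helper name, and the table deliberately skips NUL (code 0).

```shell
# ascii_hex is a hypothetical helper: it prints the hex code of a single
# ASCII character, or exits with status 1 for any other input.
ascii_hex() {
  awk -v char="$1" 'BEGIN {
    # Build a lookup table from each ASCII character to its code (1..127).
    for (n = 1; n < 128; n++) ord[sprintf("%c", n)] = n
    if (char in ord) printf "%x\n", ord[char]
    else exit 1
  }'
}

ascii_hex a     # prints 61
ascii_hex '~'   # prints 7e
```

Because awk processes escape sequences in -v assignments, a backslash argument would need to be doubled; passing the character through the environment and reading ENVIRON would avoid that.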