I want to convert characters to numerical codes, so I tried string.byte("å"). However, string.byte() seems to return 195 for all characters like this.
Is there any way to get the numerical code of non-ASCII characters such as these?
à,á,â,ã,ä,å
I'm using pure Lua.
Lua treats a string as a sequence of bytes, but a single Unicode character may be encoded as several bytes.
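To see this concretely, you can ask Lua for the byte length and the individual bytes of "å". This is a minimal check, assuming the source file is saved as UTF-8. With a different file encoding the literal would contain different bytes.

```lua
-- Assumes this file is saved as UTF-8, so the literal "å" (U+00E5)
-- is stored as the two bytes 195 165, i.e. the same as "\195\165".
local s = "å"
local len = #s                -- 2: the # operator counts bytes, not characters
local first = s:byte()        -- 195: string.byte returns only the first byte by default
local b1, b2 = s:byte(1, -1)  -- 195, 165: every byte of the character
print(len, first, b1, b2)
```

This is why string.byte("å") gives 195. All of à, á, â, ã, ä, å share the same leading byte and differ only in the second byte.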
Assuming the string is valid UTF-8, you can use the pattern "[\0-\x7F\xC2-\xF4][\x80-\xBF]*" to match a single UTF-8 byte sequence. (Lua 5.1 supports neither \x escapes nor embedded zeros in patterns, so there use "[%z\1-\127\194-\244][\128-\191]*" instead.) Then get the numerical codes of each match:
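local str = "à,á,â,ã,ä,å"
-- each match is one complete UTF-8 character (one or more bytes)
for c in str:gmatch("[\0-\x7F\xC2-\xF4][\x80-\xBF]*") do
  -- print every byte of that character
  print(c:byte(1, -1))
end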
Output:
195 160
44
195 161
44
195 162
44
195 163
44
195 164
44
195 165
Note that 44 is the encoding for the comma.
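The loop above prints the raw UTF-8 bytes, not one number per character. If what you need is the Unicode code point (for example 229 for "å"), the bytes can be combined. The leading byte tells you how many bytes the character uses, and each continuation byte (10xxxxxx) contributes its low 6 bits. Below is a minimal sketch in pure Lua that uses only arithmetic, so it should also run on Lua 5.1. It assumes valid UTF-8 input and does no error checking. The name utf8_codepoints is a hypothetical helper, not a library function.

```lua
-- Hypothetical helper: decode a UTF-8 string into a list of code points.
-- Arithmetic only (no bitwise operators), so it also works on Lua 5.1.
-- Assumes valid UTF-8; malformed input is not detected.
local function utf8_codepoints(str)
  local codes = {}
  local i = 1
  while i <= #str do
    local b = str:byte(i)
    local len, cp
    if b < 0x80 then       -- 0xxxxxxx: 1-byte sequence (ASCII)
      len, cp = 1, b
    elseif b < 0xE0 then   -- 110xxxxx: 2-byte sequence, keep the low 5 bits
      len, cp = 2, b - 0xC0
    elseif b < 0xF0 then   -- 1110xxxx: 3-byte sequence, keep the low 4 bits
      len, cp = 3, b - 0xE0
    else                   -- 11110xxx: 4-byte sequence, keep the low 3 bits
      len, cp = 4, b - 0xF0
    end
    -- each continuation byte 10xxxxxx appends 6 more bits
    for j = i + 1, i + len - 1 do
      cp = cp * 64 + (str:byte(j) - 0x80)
    end
    codes[#codes + 1] = cp
    i = i + len
  end
  return codes
end

local str = "à,á,â,ã,ä,å"
print(table.concat(utf8_codepoints(str), " "))
--> 224 44 225 44 226 44 227 44 228 44 229
```

For example, "å" is 195 165. The leading byte gives 195 - 192 = 3, and 3 * 64 + (165 - 128) = 229, which is U+00E5. On Lua 5.3 and later, the standard utf8 library does this directly. utf8.codepoint(str, 1, -1) returns the code points of all characters in str, and utf8.codes(str) iterates over them.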