**This error looks like a bug in Unity: the code seems to work fine outside of Tabletop Simulator (the game I am modding). I'm marking this as solved but leaving it for the mods to remove if needed, as the code might still be useful to other people searching for the same problem.**
I'm trying to process a large string of a few lines and would like to convert every accented character it finds into the corresponding unaccented character. I have some code I got from the net for this, but there is a small bug in it and I do not understand how it works, so I need some help with this issue if you are able.
```lua
function stripChars(str)
  -- Maps each accented character (stored as a UTF-8 string) to its unaccented letter.
  local tableAccents = {}
  tableAccents["à"] = "a"
  tableAccents["á"] = "a"
  tableAccents["â"] = "a"
  tableAccents["ã"] = "a"
  tableAccents["ä"] = "a"
  tableAccents["ç"] = "c"
  tableAccents["è"] = "e"
  tableAccents["é"] = "e"
  tableAccents["ê"] = "e"
  tableAccents["ë"] = "e"
  tableAccents["ì"] = "i"
  tableAccents["í"] = "i"
  tableAccents["î"] = "i"
  tableAccents["ï"] = "i"
  tableAccents["ñ"] = "n"
  tableAccents["ò"] = "o"
  tableAccents["ó"] = "o"
  tableAccents["ô"] = "o"
  tableAccents["õ"] = "o"
  tableAccents["ö"] = "o"
  tableAccents["ù"] = "u"
  tableAccents["ú"] = "u"
  tableAccents["û"] = "u"
  tableAccents["ü"] = "u"
  tableAccents["ý"] = "y"
  tableAccents["ÿ"] = "y"
  tableAccents["À"] = "A"
  tableAccents["Á"] = "A"
  tableAccents["Â"] = "A"
  tableAccents["Ã"] = "A"
  tableAccents["Ä"] = "A"
  tableAccents["Ç"] = "C"
  tableAccents["È"] = "E"
  tableAccents["É"] = "E"
  tableAccents["Ê"] = "E"
  tableAccents["Ë"] = "E"
  tableAccents["Ì"] = "I"
  tableAccents["Í"] = "I"
  tableAccents["Î"] = "I"
  tableAccents["Ï"] = "I"
  tableAccents["Ñ"] = "N"
  tableAccents["Ò"] = "O"
  tableAccents["Ó"] = "O"
  tableAccents["Ô"] = "O"
  tableAccents["Õ"] = "O"
  tableAccents["Ö"] = "O"
  tableAccents["Ù"] = "U"
  tableAccents["Ú"] = "U"
  tableAccents["Û"] = "U"
  tableAccents["Ü"] = "U"
  tableAccents["Ý"] = "Y"

  local normalizedString = ''
  -- Walk the string one UTF-8-encoded character at a time.
  for strChar in string.gmatch(str, "([%z\1-\127\194-\244][\128-\191]*)") do
    if tableAccents[strChar] ~= nil then
      -- Accented character: append its unaccented replacement.
      normalizedString = normalizedString..tableAccents[strChar]
    else
      -- Any other character: append it unchanged.
      normalizedString = normalizedString..strChar
    end
  end
  return normalizedString
end
```
This code seems to work really well, but it doesn't work for the accented u characters:

```lua
local test = "ù, ú, û, ü"
print(stripChars(test)) -- Prints (,,,)
test = "à, á, â, ã, ä"
print(stripChars(test)) -- Prints (a, a, a, a, a)
```
Any ideas? I assume it has something to do with the pattern, but I do not see how it works in the first place (see the `for` loop at the bottom of the code block, under the large table of characters).
I don't know why the function would work on `"à, á, â, ã, ä"` but would delete characters when used on `"ù, ú, û, ü"`. The function assumes that both strings are encoded in UTF-8. Perhaps it is an encoding issue, but then I would expect it to fail in both cases. For me, calling the function on `"ù, ú, û, ü"` gives `"u, u, u, u"`, as expected.
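One way an encoding difference could break the u's but not the a's, under an assumption that is not verified here: suppose the game's Lua interpreter hands each accented letter to the pattern as a single character whose code is its Unicode codepoint, instead of as two UTF-8 bytes. The pattern's first bracket accepts codes 1 to 127 and 194 to 244, so à to ä (224 to 228) would still match as one character each, while ù to ü (249 to 252) fall outside that range and `string.gmatch` silently skips them, leaving only the commas and spaces. The sketch below simulates that scenario in standard Lua by using single Latin-1 bytes as stand-ins; `collectMatches` is a hypothetical helper.

```lua
-- The pattern from the question (with its capture parentheses).
local pattern = "([%z\1-\127\194-\244][\128-\191]*)"

-- Hypothetical helper: collects every substring that string.gmatch yields.
local function collectMatches(s)
  local matches = {}
  for m in string.gmatch(s, pattern) do
    matches[#matches + 1] = m
  end
  return matches
end

-- ASSUMPTION being simulated: each accented letter arrives as ONE character
-- whose code is its codepoint. Single Latin-1 bytes stand in for that here:
-- "\224" = à, "\225" = á, "\249" = ù, "\250" = ú.
local aMatches = collectMatches("\224, \225")  -- à, comma, space, á all match
local uMatches = collectMatches("\249, \250")  -- ù and ú are skipped entirely

print(#aMatches, #uMatches)                          -- prints 4 and 2
print(string.format("%q", table.concat(uMatches)))  -- prints ", "
```

Under the same assumption (and if the table keys are stored the same way, so the a's are still found and replaced), õ, ö, ý and ÿ (245, 246, 253, 255) as well as À and Á (192, 193) would also be dropped, which gives a quick way to test the hypothesis inside the game.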
As Curtis F says, it might help to call `print(string.byte(test, 1, -1))` on the string that is failing to find out how it is being encoded. I have the file encoded in UTF-8, so the values printed are `195 185 44 32 195 186 44 32 195 187 44 32 195 188`.
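To line those bytes up with the characters they encode, Lua 5.3 and later provide a `utf8` library (whether the game's interpreter has it is an assumption worth checking). The sketch below prints, for each character, its codepoint followed by its bytes; correctly UTF-8-encoded text should show `ù` as codepoint 249 stored as `195 185`. If the string instead contains a lone byte 249, `utf8.codes` raises an error, which itself suggests the text is not UTF-8. `describeChars` is a hypothetical helper name.

```lua
-- Requires Lua 5.3 or later for the utf8 library.
-- Hypothetical helper: one entry per character, formatted "codepoint: byte byte ...".
local function describeChars(s)
  local lines = {}
  -- utf8.codes yields each character's starting byte position and codepoint,
  -- and raises an error if s is not valid UTF-8.
  for pos, code in utf8.codes(s) do
    local len = #utf8.char(code)  -- number of bytes this codepoint takes in UTF-8
    local bytes = { string.byte(s, pos, pos + len - 1) }
    lines[#lines + 1] = code .. ": " .. table.concat(bytes, " ")
  end
  return lines
end

-- "ù, a" written with byte escapes so the result does not depend on the file's encoding.
for _, line in ipairs(describeChars("\195\185, a")) do
  print(line)
end
-- 249: 195 185
-- 44: 44
-- 32: 32
-- 97: 97
```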
Here is how the function works. `"[%z\1-\127\194-\244][\128-\191]*"` is a pattern that matches a single character (codepoint) encoded in UTF-8, where each codepoint takes 1 to 4 bytes. The first bracket matches one leading byte: `%z` is the zero byte, `\1-\127` covers the single-byte ASCII characters, and `\194-\244` covers the bytes that start a multi-byte sequence. The second bracket, `[\128-\191]*`, then matches any continuation bytes that follow. For instance, the pattern matches the single byte used to encode the comma character (`","` is `"\44"`) or the two bytes used to encode the accented letters (`"ù"` is `"\195\185"`).

The for-loop looks up each character in the `tableAccents` table, where the keys are accented letters and the values are the corresponding unaccented ones (`tableAccents["ù"]` → `"u"`). If the character is a key in the table, the value for that key is appended to `normalizedString`. If the character is not a key in the table, it is appended unchanged. Thus the accented letters are replaced with unaccented ones, while other characters are left alone.
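A small trace of that loop on `"ù, a"` may make this concrete. It uses a one-entry table standing in for `tableAccents` (`charPattern`, `accents`, `pieces` and `output` are illustrative names), records how many bytes each match spans, and applies the same look-up-or-keep rule. Byte escapes are used so the sketch does not depend on how the file is saved.

```lua
-- The character pattern from the question, without the capture parentheses
-- (gmatch then returns the whole match).
local charPattern = "[%z\1-\127\194-\244][\128-\191]*"

-- One-entry stand-in for tableAccents: "ù" (bytes 195 185) -> "u".
local accents = { ["\195\185"] = "u" }

local input = "\195\185, a"  -- "ù, a" in UTF-8
local pieces, output = {}, {}
for ch in string.gmatch(input, charPattern) do
  pieces[#pieces + 1] = #ch                -- byte length of each matched character
  output[#output + 1] = accents[ch] or ch  -- replacement if it is a key, else unchanged
end

print(table.concat(pieces, " "))  -- 2 1 1 1
print(table.concat(output))       -- u, a
```

`accents[ch] or ch` is shorthand for the question's `if ... ~= nil` branch. Collecting pieces in a table and joining them once with `table.concat` also avoids rebuilding the whole result string with `..` on every iteration, which matters for large inputs.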
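This is just a code cleanup suggestion: the for-loop could be simplified by using `string.gsub`, which accepts a table as the replacement and keeps any match that has no entry in that table:

```lua
local normalizedString = str:gsub("[%z\1-\127\194-\244][\128-\191]*", tableAccents)
```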
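A complete gsub-based version might look like the following sketch. Only a few table entries are shown, written as UTF-8 byte escapes so the example does not depend on the file's encoding (with the file saved as UTF-8, the question's literal keys such as `"à"` are the same strings). One detail: `string.gsub` returns two values, the new string and the number of matches, so the `return` wraps the call in parentheses to pass back only the string.

```lua
-- Abbreviated stand-in for the question's tableAccents.
local tableAccents = {
  ["\195\160"] = "a",  -- à
  ["\195\185"] = "u",  -- ù
  ["\195\135"] = "C",  -- Ç
}

local function stripChars(str)
  -- With a table as the replacement, gsub uses each match as a key;
  -- if the value is nil, the original match is kept.
  -- The outer parentheses drop gsub's second return value (the match count).
  return (str:gsub("[%z\1-\127\194-\244][\128-\191]*", tableAccents))
end

print(stripChars("\195\160 la \195\135a, \195\185"))  -- a la Ca, u
```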