I'm trying to develop an application within Android Studio on Windows 10.
PROBLEM: The following string array of Thai words:
String[] myTHarr = {"มาก","เชี่ยว","แน่","ม่อน","บ้าน","พูด","เลื่อย","เมื่อ","ช่ำ","แร่"};
...when processed by the following for-each loop:
// requires: import java.nio.charset.StandardCharsets;
for (String s : myTHarr) {
    // s = à¸¡à¸²à¸� before executing any of the below code
    byte[] utf8EncodedThaiArr = s.getBytes(StandardCharsets.UTF_8);
    String utf8EncodedThai = new String(utf8EncodedThaiArr, StandardCharsets.UTF_8); // setting breakpoint here
    // s is still à¸¡à¸²à¸� (I want it to be มาก)
    // do stuff
}
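The value in the debugger is the classic result of decoding UTF-8 bytes with the wrong charset. Calling `new String(bytes)` without a charset falls back to the JVM's default charset. On a typical Windows 10 setup with a JDK older than 18, that default is windows-1252 (JDK 18 switched the default to UTF-8 under JEP 400). The following is a minimal sketch that reproduces the garbling. It assumes a JDK that ships the windows-1252 charset, which standard builds do. The class name `MojibakeDemo` and helper `roundTrip` are hypothetical. The Thai text is written as `\u` escapes so the demo file is pure ASCII and compiles the same way under any source encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // The first word of myTHarr, written as Unicode escapes so this file is pure ASCII
    static final String MAAK = "\u0E21\u0E32\u0E01";

    // Encodes s as UTF-8, then decodes those bytes with the given charset
    static String roundTrip(String s, Charset decodeWith) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, decodeWith);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");

        // Wrong charset: each 3-byte Thai letter turns into three separate characters,
        // and byte 0x81 (left undefined by windows-1252) becomes U+FFFD
        String garbled = roundTrip(MAAK, cp1252);
        garbled.codePoints().forEach(cp -> System.out.printf("U+%04X ", cp));
        System.out.println();
        // prints U+00E0 U+00B8 U+00A1 U+00E0 U+00B8 U+00B2 U+00E0 U+00B8 U+FFFD

        // Matching charset: the original text comes back unchanged
        System.out.println(roundTrip(MAAK, StandardCharsets.UTF_8).equals(MAAK)); // true

        // The charset new String(bytes) would use when none is passed
        System.out.println("Default charset: " + Charset.defaultCharset());
    }
}
```

The garbled code points spell out à¸¡à¸²à¸�, which matches what the debugger showed. If `s` already holds this value before any conversion runs, the damage happened at compile time rather than in the loop.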
...results in s = à¸¡à¸²à¸� instead of มาก when processing the first word. None of the other words come out correctly either, which is unsurprising given that the first one fails.
The Thai script displays correctly in the array declaration (the declaration above was copied straight from Android Studio). The file encoding of the .java file is set to UTF-8 (per here), and the project's File Encoding Settings are also set to UTF-8 (per here).
SOLUTION: As several people pointed out in the comments, the problem had to be in my environment. After more searching I found that I needed to rebuild the project after changing the encodings. Merely switching to UTF-8 and clicking 'Apply'/'OK' wasn't enough. For reference, my File Encoding settings were the UTF-8 settings described above.
Once I rebuilt, I started getting the compiler error "unmappable character for encoding cp1252" on the String array containing the Thai. (Side note: some of the Thai characters were fine, while others rendered as � and similar garbage. I expected either all of the Thai to work or none of it, so I was surprised that even a common Thai letter such as ก made the compiler choke.)
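The partial failure in the side note has a mechanical explanation. The compiler was reading the UTF-8 source file as cp1252. Every Thai character is three bytes in UTF-8 (`E0 B8 xx` or `E0 B9 xx`). windows-1252 assigns a character to every byte value except 0x81, 0x8D, 0x8F, 0x90 and 0x9D. A Thai character is therefore only reported as "unmappable" when its last UTF-8 byte is one of those five; the rest silently turn into mojibake. ก is U+0E01, encoded as `E0 B8 81`, which is why even such a common letter fails.

The following sketch checks each character of `myTHarr` with a strict windows-1252 decoder. It assumes javac's complaint corresponds to such a decoder rejecting a byte. The class and method names are hypothetical, and the words are written as `\u` escapes to keep the file ASCII-only.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.TreeSet;

public class Cp1252Check {
    // The ten words of myTHarr, written as Unicode escapes
    static final String[] MY_TH_ARR = {
        "\u0E21\u0E32\u0E01", "\u0E40\u0E0A\u0E35\u0E48\u0E22\u0E27", "\u0E41\u0E19\u0E48",
        "\u0E21\u0E48\u0E2D\u0E19", "\u0E1A\u0E49\u0E32\u0E19", "\u0E1E\u0E39\u0E14",
        "\u0E40\u0E25\u0E37\u0E48\u0E2D\u0E22", "\u0E40\u0E21\u0E37\u0E48\u0E2D",
        "\u0E0A\u0E48\u0E33", "\u0E41\u0E23\u0E48"
    };

    // True if the UTF-8 bytes of this code point contain a byte windows-1252 cannot decode
    static boolean unmappableInCp1252(int codePoint) {
        CharsetDecoder dec = Charset.forName("windows-1252").newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        byte[] utf8 = new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8);
        try {
            dec.decode(ByteBuffer.wrap(utf8));
            return false;
        } catch (CharacterCodingException e) {
            return true;
        }
    }

    // Collects every code point that a cp1252 reading of the source would reject
    static Set<Integer> offendingCodePoints(String[] words) {
        Set<Integer> bad = new TreeSet<>();
        for (String w : words) {
            w.codePoints().filter(Cp1252Check::unmappableInCp1252).forEach(bad::add);
        }
        return bad;
    }

    public static void main(String[] args) {
        for (int cp : offendingCodePoints(MY_TH_ARR)) {
            System.out.printf("U+%04X%n", cp);
        }
    }
}
```

For this array, the check should report only U+0E01 (ก) and U+0E41 (แ, encoded as `E0 B9 81`). Those characters affect the first word and the two words beginning with แ. Every other character decodes to some Latin-1-looking character, so the compiler accepts it without complaint.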
That error led me to this post, where I tried a few ways of setting the compiler encoding to UTF-8. My application is a sort of 'pre-processing' step for an Android app and is therefore separate from the app itself. Because of that, I couldn't use the compileOptions block that the answers in that post recommend (though I have since added it to the Gradle build on the Android app side). Instead, I set the JAVA_TOOL_OPTIONS environment variable from PowerShell:
setx JAVA_TOOL_OPTIONS "-Dfile.encoding=UTF8"
That fixed the issue!
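`setx` stores the variable for future processes only. Android Studio, and any Gradle daemon it already started, must be restarted before the setting takes effect. When a HotSpot JVM picks the variable up, it prints a `Picked up JAVA_TOOL_OPTIONS: ...` line to stderr. The following is a small sketch for confirming what a freshly started JVM actually sees. The class name `EncodingCheck` is hypothetical.

```java
import java.nio.charset.Charset;

public class EncodingCheck {
    // Summarizes the encoding-related settings this JVM started with
    static String describe() {
        return "JAVA_TOOL_OPTIONS=" + System.getenv("JAVA_TOOL_OPTIONS")
                + ", file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // Run from a new terminal after setx; expect UTF-8 rather than windows-1252
        System.out.println(describe());
    }
}
```

If `defaultCharset` still reports windows-1252, the process was started before the variable was set.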