Tags: java, encoding, character-encoding, jvm-arguments, vmargs

What character encoding is used for Eclipse VM arguments?


We read an important parameter, a path to a file, from a VM argument. Users have started passing paths that contain Korean characters (folders named in Korean), and the program breaks because the Korean characters are read as question marks. The experiment below shows the situation.

I debugged a program in Eclipse and, in "Debug Configurations" on the "Arguments" tab under "VM arguments", entered the following:

-Dfilepath=D:\XXXX\카운터

But when I read it from the program like this

String filepath = System.getProperty("filepath");

I get the output with question marks like below.

D:\XXXX\???

I understand that the Eclipse debug GUI uses the right encoding (?) to display the characters, but when the value is read in the program a different encoding is used, one that cannot represent the characters.

What default encoding does Java use to read the VM arguments supplied to it?

How can I change the encoding in Eclipse so that the program reads the characters properly?


Solution

  • My conclusion is that the conversion depends on the default encoding (the Windows setting "Language for non-Unicode programs"). Here is the program I used for testing:

    package test;

    import java.io.FileOutputStream;

    public class Test {
        public static void main(String[] args) throws Exception {
            StringBuilder sb = new StringBuilder();
            // The literal in the first bracket is a reference copy of the expected characters.
            sb.append("[카운터] sysprop=[").append(System.getProperty("cenv"));
            if (args.length > 0) {
                sb.append("], cmd args=[").append(args[0]);
            }
            sb.append("], file.encoding=").append(System.getProperty("file.encoding"));
            // Write the result to a file instead of System.out.
            FileOutputStream fout = new FileOutputStream("/testout");
            try {
                fout.write(sb.toString().getBytes("UTF-8"));
            } finally {
                fout.close(); // close the stream even if the write fails
            }
            // Thread.sleep(10000); // For checking arguments using Process Explorer
        }
    }
    

    Test1: "Language for non-Unicode programs" is Korean(Korea)

    Execute in the command prompt: java -Dcenv=카운터 test.Test 카운터 (the Korean chars are correct when I verify the arguments using Process Explorer)

    Result: [카운터] sysprop=[카운터], cmd args=[카운터], file.encoding=MS949

    Test2: "Language for non-Unicode programs" is Chinese(Traditional, Taiwan)

    Execute in the command prompt (pasted from the clipboard): java -Dcenv=카운터 test.Test 카운터 (I cannot see the Korean chars in the command window. However, the Korean chars are correct when I verify the arguments using Process Explorer)

    Result: [카운터] sysprop=[???], cmd args=[???], file.encoding=MS950

    Test3: "Language for non-Unicode programs" is Chinese(Traditional, Taiwan)

    Launch from Eclipse by setting Program arguments and VM arguments. The command line in Process Explorer, which is the same as the one shown in the Properties dialog of the Eclipse Debug view, is: C:\pg\jdk160\bin\javaw.exe -agentlib:jdwp=transport=dt_socket,suspend=y,address=localhost:50672 -Dcenv=카운터 -Dfile.encoding=UTF-8 -classpath S:\ws\wtest\bin test.Test 카운터

    Result: [카운터] sysprop=[???], cmd args=[bin], file.encoding=UTF-8

    Change the Korean chars to "碁石", which exist in the MS950/MS949 charsets:

    Change the Korean chars to "鈥焢", which exist in the MS950 charset:

    Change the Korean chars to "宽广", which exist in the GBK charset:

    During testing, I always checked the command line in Process Explorer and made sure all the characters were correct. However, the command-line argument characters are converted using the default encoding before main(String[] args) of the Java class is invoked. If a character does not exist in the charset of the default encoding, the program receives an unexpected argument.
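
The question-mark substitution described above can be reproduced without changing the Windows locale. The following is only a minimal sketch: it assumes US-ASCII as a stand-in for a default encoding that lacks the Korean characters, and the class and method names are illustrative, not part of the original test program. It prints booleans rather than the strings themselves so the console encoding does not interfere.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetRoundTrip {
    // Encode the text with the given charset and decode it again.
    // String.getBytes(Charset) replaces characters the charset cannot
    // represent with the charset's replacement byte ('?' for US-ASCII).
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // The three Hangul syllables from the question, written as
        // Unicode escapes so this source file stays pure ASCII.
        String korean = "\uCE74\uC6B4\uD130";

        // A charset without Hangul loses every character.
        System.out.println(roundTrip(korean, StandardCharsets.US_ASCII).equals("???"));

        // A charset that covers Hangul round-trips without loss.
        System.out.println(roundTrip(korean, StandardCharsets.UTF_8).equals(korean));
    }
}
```

Whether the JVM launcher performs exactly this kind of conversion is not confirmed here; the sketch only shows that an encode step through a charset lacking the characters is enough to produce the observed "???".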

    I'm not sure whether the problem is caused by java.exe/javaw.exe or by Windows, but passing non-ASCII parameters via command-line arguments is not a good idea.
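
If the value has to travel through the command line anyway, one possible workaround is to keep the argument pure ASCII and decode it inside the program. The sketch below assumes the caller percent-encodes the Korean part of the path as UTF-8. The property name filepath comes from the question; the class and helper names are hypothetical. Note that URLDecoder also turns "+" into a space, so this only suits paths without a literal "+".

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class AsciiSafeArgument {
    // Percent-encode a value as UTF-8 so it contains only ASCII characters.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // every JVM must support UTF-8
        }
    }

    // Reverse the percent-encoding; characters that are not escaped
    // (such as ':' and '\') pass through, but '+' becomes a space.
    static String decode(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String folder = "\uCE74\uC6B4\uD130"; // the folder name from the question
        String encoded = encode(folder);
        System.out.println(encoded); // ASCII only, safe in any code page

        // Assumed launch style: java -Dfilepath=D:\XXXX\<encoded> ...
        // Falls back to the encoded sample when the property is not set.
        String raw = System.getProperty("filepath", encoded);
        System.out.println(decode(raw));
    }
}
```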

    BTW, I also tried executing the command via a .bat file (file encoding is UTF-8), in case anyone is interested.

    Test5: "Language for non-Unicode programs" is Korean(Korea)

    The command line in Process Explorer is java -Dcenv=移댁슫?? test.Test 移댁슫?? (the Korean chars are garbled)

    Result: [카운터] sysprop=[移댁슫??], cmd args=[移댁슫??], file.encoding=MS949
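
The garbled value in Test5 is consistent with the UTF-8 bytes of the .bat file being decoded with the system code page, although that is an interpretation rather than something the tests prove. A minimal sketch of that kind of mis-decoding follows; the class and method names are illustrative, and it falls back to ISO-8859-1 when the running JVM does not ship the MS949 charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Encode the text as UTF-8, then decode the same bytes with a
    // different charset, as happens when a UTF-8 file is read under
    // another code page.
    static String misdecode(String text, Charset wrong) {
        return new String(text.getBytes(StandardCharsets.UTF_8), wrong);
    }

    public static void main(String[] args) {
        String korean = "\uCE74\uC6B4\uD130"; // the folder name from the question

        // MS949 is an optional charset; use it only if this JVM has it.
        Charset codePage = Charset.isSupported("MS949")
                ? Charset.forName("MS949")
                : StandardCharsets.ISO_8859_1;

        String garbled = misdecode(korean, codePage);
        System.out.println("decoded with " + codePage
                + ", equal to original: " + garbled.equals(korean));
    }
}
```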

    Test6: "Language for non-Unicode programs" is Korean(Korea)

    Add another VM argument. The command line in Process Explorer is java -Dfile.encoding=UTF-8 -Dcenv=移댁슫?? test.Test 移댁슫?? (the Korean chars are garbled)

    Result: [카운터] sysprop=[移댁슫??], cmd args=[移댁슫??], file.encoding=UTF-8

    Test7: "Language for non-Unicode programs" is Chinese(Traditional, Taiwan)

    The command line in Process Explorer is java -cp s:\ws\wtest\bin -Dcenv=儦渥?? test.Test 儦渥?? (the Korean chars are garbled)

    Result: [카운터] sysprop=[儦渥??], cmd args=[儦渥??], file.encoding=MS950