Tags: java, encoding, utf-8, compilation, character-encoding

UTF-8 does not print characters to the console


I have the following code:

import java.util.Arrays;

public class MainDefault {
        public static void main (String[] args) {
                System.out.println("²³");
                // Bytes of the literal in the JVM's default charset
                System.out.println(Arrays.toString("²³".getBytes()));
        }
}

But I can't seem to print the special characters to the console.

When I compile and run it with the default settings, I get the following result:

$ javac MainDefault.java
$ java MainDefault

[Screenshot: output after compiling and running with the default settings]

On the other hand, when I compile and run it like this:

$ javac -encoding UTF8 MainDefault.java
$ java MainDefault

[Screenshot: output after compiling with -encoding UTF8 and running with the defaults]

And when I also run it with the -Dfile.encoding=UTF8 flag, I get the following:

$ java -Dfile.encoding=UTF8 MainDefault

[Screenshot: output after compiling with -encoding UTF8 and running with -Dfile.encoding=UTF8]

It doesn't seem to be a problem with the console (Git Bash on Windows 10), as it prints the characters normally:

[Screenshot: echo of the same characters in Git Bash]

Thanks for your help


Solution

  • Your code is not printing the right characters in the console because your Java program and the console are using different character sets (encodings).
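The console only ever receives bytes, and which bytes it gets for "²³" depends on the charset the program encodes with. The sketch below illustrates this; the class and method names are hypothetical, and ISO-8859-1 stands in for a single-byte code page such as windows-1252, which encodes these two characters identically.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingBytes {
    // Encode a string with the given charset; these are the bytes a console receives.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        // Superscript two and three, written as Unicode escapes so the example
        // does not depend on javac's -encoding setting.
        String text = "\u00B2\u00B3";
        System.out.println("UTF-8:      " + Arrays.toString(encode(text, StandardCharsets.UTF_8)));
        System.out.println("ISO-8859-1: " + Arrays.toString(encode(text, StandardCharsets.ISO_8859_1)));
    }
}
```

This prints [-62, -78, -62, -77] for UTF-8 and [-78, -77] for ISO-8859-1. A console decoding with one charset while the program encodes with the other therefore shows the wrong characters.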

    If you want to obtain the same characters, you first need to determine which character sets are in place.

    How you do this depends on the "console" to which you are writing your output.

    If you are working with Windows and cmd, as @RickJames suggested, you can use the chcp command to determine the active code page.

    Oracle documents the full list of encodings supported by Java, together with their aliases (Windows code pages, in this case), on this page.

    This Stack Overflow answer also provides some guidance about the mapping between Windows code pages and Java charsets.
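Java itself can resolve many of these alias names. Here is a small sketch (class and method names are illustrative) that uses Charset.forName, which accepts aliases such as Cp1252 or UTF8 (the spelling used with -Dfile.encoding=UTF8) and returns the canonical Java charset; it throws UnsupportedCharsetException for a name the JVM does not know.

```java
import java.nio.charset.Charset;

public class CharsetAliases {
    // Resolve an alias (for example a Windows code page name) to Java's canonical charset name.
    // Throws UnsupportedCharsetException if this JVM does not support the charset.
    static String canonicalName(String alias) {
        return Charset.forName(alias).name();
    }

    public static void main(String[] args) {
        for (String alias : new String[] {"Cp1252", "UTF8", "latin1"}) {
            Charset charset = Charset.forName(alias);
            // aliases() is a Set, so the order of the listed aliases is unspecified.
            System.out.println(alias + " -> " + charset.name() + ", aliases: " + charset.aliases());
        }
    }
}
```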

    As you can see in the provided links, the code page for UTF-8 is 65001.

    If you are using Git Bash (MinTTY), you can follow @kriegaex's instructions to verify or configure UTF-8 as the terminal emulator's encoding.

    Linux, UNIX, and UNIX-derived systems such as macOS do not use code page identifiers but locales. The locale information can vary between systems, but you can either run the locale command or inspect the LANG and LC_* environment variables to find the required information.

    This is the output of the locale command in my system:

    LANG="es_ES.UTF-8"
    LC_COLLATE="es_ES.UTF-8"
    LC_CTYPE="es_ES.UTF-8"
    LC_MESSAGES="es_ES.UTF-8"
    LC_MONETARY="es_ES.UTF-8"
    LC_NUMERIC="es_ES.UTF-8"
    LC_TIME="es_ES.UTF-8"
    LC_ALL=
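The character set comes from the LC_CTYPE category, and POSIX resolves it with a fixed precedence: a non-empty LC_ALL wins, then LC_CTYPE, then LANG. Below is a minimal sketch of that lookup, assuming the usual language_TERRITORY.codeset[@modifier] naming; effectiveCtype and codeset are hypothetical helpers, not an existing API.

```java
import java.util.Map;

public class LocaleCharset {
    // POSIX precedence for the character-type category: LC_ALL, then LC_CTYPE, then LANG.
    // Unset or empty variables are skipped (LC_ALL is empty in the output above).
    static String effectiveCtype(Map<String, String> env) {
        for (String name : new String[] {"LC_ALL", "LC_CTYPE", "LANG"}) {
            String value = env.get(name);
            if (value != null && !value.isEmpty()) {
                return value;
            }
        }
        return "C"; // the POSIX default locale when nothing is set
    }

    // Extract the codeset after the '.', e.g. "UTF-8" from "es_ES.UTF-8",
    // dropping any "@modifier" suffix. Returns null when no codeset is given.
    static String codeset(String locale) {
        int dot = locale.indexOf('.');
        if (dot < 0) {
            return null;
        }
        int at = locale.indexOf('@', dot);
        return at < 0 ? locale.substring(dot + 1) : locale.substring(dot + 1, at);
    }

    public static void main(String[] args) {
        String ctype = effectiveCtype(System.getenv());
        System.out.println("LC_CTYPE in effect: " + ctype + ", codeset: " + codeset(ctype));
    }
}
```

With the environment shown above (LANG=es_ES.UTF-8, LC_ALL empty) this resolves to es_ES.UTF-8 with codeset UTF-8.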
    

    Once you know this information, run your Java program with the file.encoding system property set to the corresponding Java charset:

    java -Dfile.encoding=UTF8 MainDefault
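To check which settings the JVM actually picked up, you could run a small diagnostic like this one with and without -Dfile.encoding=UTF8. The class name ShowEncodings is illustrative. Which of these properties exist varies with the JDK version and platform, so any that are missing are reported as null.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

public class ShowEncodings {
    // Collect the encoding-related settings the running JVM sees, in a stable order.
    static Map<String, String> report() {
        Map<String, String> info = new LinkedHashMap<>();
        info.put("defaultCharset", Charset.defaultCharset().name());
        info.put("file.encoding", System.getProperty("file.encoding"));
        // Not defined on every JDK version or platform; absent properties are null.
        info.put("stdout.encoding", System.getProperty("stdout.encoding"));
        info.put("sun.stdout.encoding", System.getProperty("sun.stdout.encoding"));
        return info;
    }

    public static void main(String[] args) {
        report().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```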
    

    Some classes, like PrintStream or PrintWriter, also let you specify the Charset used to encode their output, independently of file.encoding.
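Here is a minimal sketch of that approach, assuming Java 10 or later for the PrintStream(OutputStream, boolean, Charset) constructor. On older versions, the overload taking the charset name as a String works the same way. The names Utf8Console and utf8 are hypothetical. A PrintWriter built on new OutputStreamWriter(System.out, StandardCharsets.UTF_8) is an equivalent alternative.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

public class Utf8Console {
    // Wrap a byte stream in a PrintStream that always encodes text as UTF-8,
    // regardless of the JVM's default charset. autoFlush = true flushes on println.
    static PrintStream utf8(OutputStream out) {
        return new PrintStream(out, true, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Write directly to the standard output file descriptor with an explicit charset.
        PrintStream out = utf8(new FileOutputStream(FileDescriptor.out));
        out.println("\u00B2\u00B3"); // superscript two and three, sent as UTF-8 bytes
    }
}
```

Calling System.setOut with such a stream would make later System.out.println calls use UTF-8 too. This only fixes the Java side, though: the console must still decode UTF-8 (code page 65001 in cmd, or UTF-8 configured in MinTTY).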

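    The -encoding javac option only specifies the character encoding of the source files, that is, how javac decodes literals such as "²³" at compile time; it has no effect on how the running program encodes its output.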
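Without -encoding, javac decodes the source with the platform's default charset (on JDKs before 18, where that default is platform-dependent). Assume that default was windows-1252, which is common on Western European Windows installations but is not stated in the question. In that case, the UTF-8 bytes of "²³" in the file are read as four characters instead of two. The sketch below simulates this with ISO-8859-1, which decodes these particular bytes exactly as windows-1252 does. SourceDecoding and misread are illustrative names.

```java
import java.nio.charset.StandardCharsets;

public class SourceDecoding {
    // Simulate javac reading a UTF-8 encoded literal with a single-byte default charset.
    static String misread(String literal) {
        byte[] onDisk = literal.getBytes(StandardCharsets.UTF_8); // bytes saved in the .java file
        return new String(onDisk, StandardCharsets.ISO_8859_1);   // one char per byte when decoded
    }

    public static void main(String[] args) {
        String intended = "\u00B2\u00B3"; // superscript two and three
        String compiled = misread(intended);
        System.out.println(intended.length() + " chars intended, " + compiled.length() + " chars compiled");
        // Print code points so the result does not depend on the console's charset.
        compiled.codePoints().forEach(cp -> System.out.printf("U+%04X ", cp));
        System.out.println();
    }
}
```

This prints "2 chars intended, 4 chars compiled" followed by U+00C2 U+00B2 U+00C2 U+00B3, that is, "Â²Â³". Writing non-ASCII literals as Unicode escapes, as in the sketch, keeps the source file pure ASCII, so it compiles the same way whatever -encoding is used.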

    If you are using Windows with Git Bash, also consider reading this answer by @rmunge: it describes a possible bug in the tool that may be causing the problem and that prevents the terminal from working correctly out of the box without manual encoding adjustments.