I have the following Java snippet:
System.out.print("What is the first name of the Hungarian poet Petőfi? ");
String correctAnswer = "Sándor";
Scanner sc = new Scanner(System.in);
String answer = sc.next();
sc.close();
if (correctAnswer.equals(answer)) {
    System.out.println("Correct!");
} else {
    System.out.println("The answer (" + answer + ") is incorrect, the correct answer is " + correctAnswer);
}
This works fine in Eclipse, but not in the Windows terminal: even though I enter the correct answer Sándor, the comparison fails. This is how it looks in Eclipse:
What is the first name of the Hungarian poet Petőfi? Sándor
Correct!
The same from the command line:
What is the first name of the Hungarian poet Petőfi? Sándor
The answer (S?ndor) is incorrect, the correct answer is Sándor
What I tried, without success:
- CHCP 65001 (to change the code page to UTF-8): this is needed only if the word Petőfi is displayed incorrectly, but it does not help with the input.
- [Console]::InputEncoding = [Console]::OutputEncoding = New-Object System.Text.UTF8Encoding
- Passing StandardCharsets.UTF_8 or "UTF-8" to Scanner.
- Using InputStreamReader (with and without passing the encoding) instead of Scanner.
- The -Dfile.encoding=UTF-8 command-line parameter.
- System.setProperty("file.encoding", "UTF-8");
- cmd
I double-checked: the encoding of the Java source file is UTF-8.
When converting the input to bytes (Arrays.toString(input.getBytes())), I get the following:
- Windows terminal: [83, -17, -65, -67, 110, 100, 111, 114]
- Windows terminal after CHCP 65001: [83, 0, 110, 100, 111, 114]
- Eclipse: [83, -61, -95, 110, 100, 111, 114]
Sándor is actually encoded in Java like this: [83, -61, -95, 110, 100, 111, 114]
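The byte arrays above are consistent with the terminal sending á as a single byte of an OEM code page while Java decodes the input as UTF-8. A minimal sketch reproducing these numbers, assuming the terminal uses code page 852 (where á is the single byte 0xA0) and that Scanner(System.in) decodes with the default charset, which is UTF-8 since Java 18; the class and method names are illustrative only:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MojibakeDemo {
    // "Sándor" written with a Unicode escape, so the demo does not depend on the source file encoding
    static final String CORRECT = "S\u00e1ndor";

    // Assumption: the terminal sends the typed text in code page 852, where á is the single byte 0xA0
    static final byte[] CP852_INPUT = {83, (byte) 0xA0, 110, 100, 111, 114};

    // Decodes raw console bytes the way a UTF-8 Scanner/InputStreamReader would
    static String decodeAsUtf8(byte[] raw) {
        // A lone 0xA0 is not valid UTF-8, so it is replaced by U+FFFD (the replacement character)
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Correct UTF-8 encoding: á becomes C3 A1, i.e. -61, -95
        System.out.println(Arrays.toString(CORRECT.getBytes(StandardCharsets.UTF_8)));
        // prints [83, -61, -95, 110, 100, 111, 114]

        String misread = decodeAsUtf8(CP852_INPUT);
        // U+FFFD encodes to EF BF BD, i.e. -17, -65, -67
        System.out.println(Arrays.toString(misread.getBytes(StandardCharsets.UTF_8)));
        // prints [83, -17, -65, -67, 110, 100, 111, 114]

        System.out.println(CORRECT.equals(misread)); // prints false
    }
}
```

Under this assumption the comparison fails even though the program prints "Sándor" correctly, because only the input side is decoded with the wrong charset.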
So, narrowing it down to the letter á:
- C3 A1 (-61, -95) is "Latin Small Letter A With Acute" according to https://www.charset.org/utf-8.
- EF BF BD (-17, -65, -67) is the � "Replacement Character" according to https://www.charset.org/utf-8/66.
- After CHCP 65001 it is encoded as 0.

It works in Git Bash, but the letter ő (and all the other accented characters, not just this one) is displayed incorrectly in that terminal:
What is the first name of the Hungarian poet Pet▒fi? Sándor
Correct!
Strangely, even though the comparison works and the accented characters look fine while typing, displaying them again does not work:
What is the first name of the Hungarian poet Pet▒fi? Péter
The answer (P▒ter) is incorrect, the correct answer is S▒ndor
The following helped in the Windows terminal:
Console console = System.console();
String answer = console.readLine();
But this does not work in Eclipse:
What is the first name of the Hungarian poet Petőfi? Sándor
The answer (Sándor) is incorrect, the correct answer is Sándor
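The System.console() approach can also fail outright: System.console() returns null when the JVM has no console attached. A minimal defensive sketch, assuming a Scanner is an acceptable fallback; the helper readLine and the class name are hypothetical:

```java
import java.io.Console;
import java.util.Scanner;

public class ConsoleOrScanner {
    // Hypothetical helper: prefer the console when there is one, otherwise read from the Scanner
    static String readLine(Console console, Scanner fallback) {
        if (console != null) {
            // Console decodes the input with its own charset (Console.charset())
            return console.readLine();
        }
        // No console (System.console() returned null): avoid the NullPointerException
        return fallback.nextLine();
    }

    public static void main(String[] args) {
        System.out.print("What is the first name of the Hungarian poet Pet\u0151fi? ");
        String answer = readLine(System.console(), new Scanner(System.in));
        System.out.println("S\u00e1ndor".equals(answer) ? "Correct!" : "Incorrect: " + answer);
    }
}
```

This only prevents the NullPointerException; whether the fallback Scanner decodes the input correctly still depends on the charset it uses.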
UPDATE: it seems to depend on the system settings. I have two laptops, one with Hungarian and the other with English settings.
System.console() in terminal, and the new Scanner(System.in) works in Eclipse. However, in Eclipse it works incorrectly, even if I change the encoding in Window -> Preferences -> General -> Workspace.
ő) is incorrect or the comparison fails. And when trying to use the System.console() approach, it throws a NullPointerException because the console is null. (That Eclipse version is 2023-12; I did not try the latest one there.)
My Java version is 22.0.2, but the problem does not seem to be version-specific.
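To compare the two laptops (and Eclipse with the terminal), it may help to print which charsets the JVM actually picks in each environment. A small diagnostic sketch, assuming Java 18+ (for PrintStream.charset()); the class name EncodingInfo is hypothetical:

```java
import java.io.Console;
import java.nio.charset.Charset;

public class EncodingInfo {
    // Collects the charset-related settings that affect console input and output
    static String describe() {
        Console console = System.console();
        StringBuilder sb = new StringBuilder();
        // Used by default for Scanner(System.in), InputStreamReader and String.getBytes()
        sb.append("Charset.defaultCharset(): ").append(Charset.defaultCharset().name()).append('\n');
        // Encoding derived from the operating system locale (Java 17+)
        sb.append("native.encoding: ").append(System.getProperty("native.encoding")).append('\n');
        // Encoding chosen for System.out; may be null on older Java versions
        sb.append("stdout.encoding: ").append(System.getProperty("stdout.encoding")).append('\n');
        // Charset System.out actually uses (PrintStream.charset(), Java 18+)
        sb.append("System.out.charset(): ").append(System.out.charset().name()).append('\n');
        // System.console() can be null, e.g. when no console is attached
        sb.append("System.console() charset: ")
          .append(console == null ? "no console" : console.charset().name());
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it once in Eclipse and once in the terminal on each laptop should show whether the default charset and System.out's charset differ there.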
As a cross-check, I tried the same in Python, and it works fine both in the Windows terminal and in the IDE:
answer = input('What is the first name of the Hungarian poet Petőfi? ')
correct_answer = 'Sándor'
if answer == correct_answer:
    print('Correct')
else:
    print('The answer (' + answer + ') is incorrect, the correct answer is ' + correct_answer)
So my question is: how can I make it work? Is there a universal solution that works in both the Windows terminal and Eclipse?
SOLUTION:
Scanner scanner = new Scanner(System.in, System.out.charset());
This solution requires Java 18+ (PrintStream.charset() was added in Java 18). It works both in Eclipse with default settings and in the Windows command prompt with code page 852. To check the code page:
chcp
To change it to 852:
chcp 852
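Putting this fix into the original program gives roughly the following sketch (Java 18+). The helper readAnswer and the class name PoetQuiz are illustrative; the literals use Unicode escapes so the result does not depend on the source file encoding:

```java
import java.io.InputStream;
import java.nio.charset.Charset;
import java.util.Scanner;

public class PoetQuiz {
    // "Sándor" as a Unicode escape, so the source file encoding does not matter
    static final String CORRECT_ANSWER = "S\u00e1ndor";

    // Reads the first whitespace-delimited token, decoding the bytes with the given charset.
    // The Scanner is deliberately not closed here, so the caller keeps control of the stream.
    static String readAnswer(InputStream in, Charset charset) {
        Scanner sc = new Scanner(in, charset);
        return sc.next();
    }

    public static void main(String[] args) {
        System.out.print("What is the first name of the Hungarian poet Pet\u0151fi? ");
        // Decode the input with the charset System.out uses (PrintStream.charset(), Java 18+)
        // instead of the default charset, which is UTF-8 by default since Java 18
        String answer = readAnswer(System.in, System.out.charset());
        if (CORRECT_ANSWER.equals(answer)) {
            System.out.println("Correct!");
        } else {
            System.out.println("The answer (" + answer + ") is incorrect, the correct answer is " + CORRECT_ANSWER);
        }
    }
}
```

Keeping the decoding in a separate method makes it easy to check with an in-memory stream, for example by feeding it ISO-8859-1 bytes together with StandardCharsets.ISO_8859_1.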
Thanks to everyone who helped me reach the solution!