
How does Java perform the lexical translation?


In the Java spec, I read that:

A translation of Unicode escapes (§3.3) in the raw stream of Unicode characters to the corresponding Unicode character. A Unicode escape of the form \uxxxx, where xxxx is a hexadecimal value, represents the UTF-16 code unit whose encoding is xxxx. This translation step allows any program to be expressed using only ASCII characters.

Does this mean that the lexical translation is applied only to ASCII characters? When I write code containing Cyrillic, Hebrew, or Kanji characters, I get no compile-time error, even though these characters are not ASCII.

I don't understand why. Can anyone help me understand?


Solution

  • The quote doesn't say anything about what happens if you write a program containing a Cyrillic/Hebrew letter. In fact, the section just before the one you quoted says:

    3.1 Unicode

    Programs are written using the Unicode character set.
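    This is why your Cyrillic, Hebrew, or Kanji code compiles: an identifier may be built from any "Java letter", not only from ASCII letters. The sketch below is a minimal illustration; the class and variable names are made up, and it assumes the file is saved as UTF-8 and compiled with `javac -encoding UTF-8` (UTF-8 is also the default since JDK 18).

    ```java
    // Sketch: identifiers typed directly in Cyrillic and Kanji (hypothetical names).
    class NonAsciiIdentifiers {
        static int счёт = 1;   // Cyrillic identifier
        static int 漢字 = 2;    // Kanji identifier

        static int sum() {
            return счёт + 漢字;
        }

        public static void main(String[] args) {
            System.out.println(sum()); // 3
            // A "Java letter" is any char for which this method returns true.
            System.out.println(Character.isJavaIdentifierStart('Д')); // true
            System.out.println(Character.isJavaIdentifierStart('א')); // true (Hebrew alef)
            System.out.println(Character.isJavaIdentifierStart('-')); // false
        }
    }
    ```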

    Note that "allows" here means that this translation step adds a new capability to Java. When you are allowed to do something, you can do it, but you are not required to.

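    The quote merely says that the lexical translator will turn anything of the form \uxxxx into the corresponding Unicode character U+xxxx, and that it does so first, before the source is broken into tokens. Characters that are not escapes, ASCII or not, pass through this step unchanged.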
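    Because the translation runs before tokenization, an escape can stand in for a character anywhere in the file, not only inside string literals. The contrived sketch below (made-up class and method names; nobody would write real code like this) spells the keyword `int` and the `+` operator with escapes, which only shows how early the step happens.

    ```java
    // Sketch: escapes are translated before the compiler sees any tokens.
    class EscapesEverywhere {
        static int demo() {
            // After translation, the next line reads: int x = 41 + 1;
            \u0069\u006E\u0074 x = 41 \u002B 1;
            return x;
        }

        static String letter() {
            // Inside a literal too: this is the one-character string "A".
            return "\u0041";
        }

        public static void main(String[] args) {
            System.out.println(demo());   // 42
            System.out.println(letter()); // A
        }
    }
    ```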

    The natural consequence of this is that you can write a program containing any Unicode character (i.e. "any program") using only an ASCII keyboard. How? Whenever you need to write a non-ASCII character, just write its Unicode escape. (A character outside the Basic Multilingual Plane, such as an emoji, needs two escapes, one for each UTF-16 surrogate.)
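    Producing those escapes can be automated. The helper below is a hypothetical sketch (`toAsciiSource` is not a JDK method) that rewrites every non-ASCII `char` in a piece of text as a four-digit escape. Since a Java `String` is a sequence of UTF-16 code units, a supplementary character naturally comes out as two escapes, which is what the spec expects.

    ```java
    // Sketch: turn arbitrary text into ASCII-only Java source text.
    class AsciiEscaper {
        // Hypothetical helper: non-ASCII chars become hex escapes, ASCII is kept.
        static String toAsciiSource(String text) {
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < text.length(); i++) {
                char c = text.charAt(i);   // one UTF-16 code unit
                if (c < 0x80) {
                    out.append(c);         // ASCII: keep as-is
                } else {
                    // Non-ASCII (including each surrogate half): 4 uppercase hex digits.
                    out.append(String.format("\\u%04X", (int) c));
                }
            }
            return out.toString();
        }

        public static void main(String[] args) {
            System.out.println(toAsciiSource("int Д = 0;"));
            // U+1F600 is outside the BMP, so it becomes a surrogate pair of escapes.
            System.out.println(toAsciiSource(new String(Character.toChars(0x1F600))));
        }
    }
    ```

    With these inputs, the first line printed is `int \u0414 = 0;` and the second is `\uD83D\uDE00`.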

    As a concrete example:

    These are valid Java statements:

    int Д = 0;
    System.out.println("Д");
    

    But let's say my text editor can only handle ASCII text, or that I only have a US keyboard, so I can't type "Д". The language spec says that I can still write this in ASCII, like this:

    int \u0414 = 0;
    System.out.println("\u0414");
    

    It will do exactly the same thing.
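    To check this directly, the sketch below (hypothetical class and method names, again assuming a UTF-8 source file) declares `Д` normally, then updates it through the escaped spelling `\u0414`. Both spellings name the same variable, and the two string literals compare equal.

    ```java
    // Sketch: the literal and the escaped spelling are the same after translation.
    class SameProgram {
        static int counter() {
            int Д = 0;
            \u0414 += 5;   // the same variable, spelled with an escape
            return Д;
        }

        static boolean sameString() {
            return "Д".equals("\u0414");
        }

        public static void main(String[] args) {
            System.out.println(counter());    // 5
            System.out.println(sameString()); // true
        }
    }
    ```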