
Why does Windows use ANSI code pages instead of Unicode?


When I run the chcp command in a cmd.exe window, it displays the active code page.

I thought Windows used the Unicode character set.

So, my questions are:

  1. Why does Windows use ANSI code pages instead of Unicode?

  2. Does Windows use UTF-16 or UCS-2? Can I check this (via a command or an MSDN link)?

  3. Are UTF-16 and UCS-2 just encodings, or are they also character sets?

  4. Do UTF-8, UTF-16, UTF-32, etc. have different character set sizes?

I'm confused. Could someone please define these terms?


Solution

    1. Historical reasons and backward compatibility. Windows itself is a Unicode-based OS, and has been since the NT days. But many legacy (and even current) apps are not written for Unicode. Unicode-enabled apps do not use ANSI code pages, unless they need to convert runtime data between ANSI and Unicode.

    2. Microsoft switched to UTF-16 in Windows 2000. Before that, it used UCS-2. See the Wikipedia article "Unicode in Microsoft Windows".

    3. Both UTF-16 and UCS-2 are just encodings of the same Unicode character set. UTF-16 was invented to support encoding code points above U+FFFF (using surrogate pairs), which UCS-2 cannot handle.

    4. All UTFs (including many you haven't named) are just encodings of the same Unicode character set, so they all cover exactly the same characters. The number in the name is the size in bits of the encoded code units (UTF-8 uses 8-bit code units, UTF-16 uses 16-bit code units, etc.).
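To illustrate point 1, here is a minimal Python sketch of converting runtime data between an ANSI code page and UTF-16. It assumes Windows-1252 (Python codec name cp1252), the ANSI code page on US and Western-European systems; other locales use other code pages. Native Windows apps would call the Win32 functions MultiByteToWideChar and WideCharToMultiByte instead. Python's codecs stand in for them here, and the helper names ansi_to_utf16 and utf16_to_ansi are hypothetical.

```python
# Minimal sketch: converting text between a legacy "ANSI" code page and UTF-16.
# Assumption: Windows-1252 ("cp1252") is the ANSI code page; other locales differ.
# Helper names are hypothetical, for illustration only.

def ansi_to_utf16(data: bytes, codepage: str = "cp1252") -> bytes:
    """Decode bytes in a legacy code page, then re-encode them as
    UTF-16 (little-endian, no byte-order mark)."""
    return data.decode(codepage).encode("utf-16-le")


def utf16_to_ansi(data: bytes, codepage: str = "cp1252") -> bytes:
    """Convert UTF-16LE back to a legacy code page. Characters the code page
    cannot represent are replaced with '?' (Python's errors="replace")."""
    return data.decode("utf-16-le").encode(codepage, errors="replace")


if __name__ == "__main__":
    ansi = "caf\u00e9".encode("cp1252")      # one byte per character
    print(ansi_to_utf16(ansi))               # b'c\x00a\x00f\x00\xe9\x00'
    # The Japanese characters have no Windows-1252 equivalent and are lost.
    print(utf16_to_ansi("caf\u00e9 \u65e5\u672c".encode("utf-16-le")))  # b'caf\xe9 ??'
```

The asymmetry is the point: any ANSI text converts to Unicode without loss, but Unicode text generally cannot go back to a single-byte code page, which is one reason Unicode-enabled apps avoid ANSI code pages.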
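For points 2 and 3, the practical difference between UCS-2 and UTF-16 shows up only for code points above U+FFFF. The following minimal Python sketch (Python 3.9+, not Windows-specific) encodes U+1F600 as UTF-16, checks whether text fits in UCS-2, and computes the surrogate pair by hand. The function names are hypothetical.

```python
# Minimal sketch (Python 3.9+): why UCS-2 cannot hold code points above U+FFFF
# and how UTF-16 represents them with surrogate pairs.

def utf16_code_units(text: str) -> list[int]:
    """Return the 16-bit code units of `text` encoded as UTF-16 (little-endian, no BOM)."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def fits_ucs2(text: str) -> bool:
    """UCS-2 stores exactly one 16-bit unit per code point, so it can only
    represent code points U+0000..U+FFFF (the Basic Multilingual Plane)."""
    return all(ord(ch) <= 0xFFFF for ch in text)


def surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into a
    UTF-16 high/low surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000       # 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits
    return high, low


if __name__ == "__main__":
    grin = "\U0001F600"                                # U+1F600, outside the BMP
    print(fits_ucs2("A"), fits_ucs2(grin))             # True False
    print([hex(u) for u in utf16_code_units(grin)])    # ['0xd83d', '0xde00']
    print([hex(u) for u in surrogate_pair(0x1F600)])   # ['0xd83d', '0xde00']
```

A UCS-2 reader sees the two surrogates as two separate (reserved) units, while a UTF-16 reader combines them back into the single code point U+1F600.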
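Point 4 can be checked directly: every UTF encodes the same code points, and only the number of code units differs. The minimal Python sketch below counts code units for a few characters; the "-le" codec variants are an assumption made only so Python does not prepend a byte-order mark, and code_units is a hypothetical helper.

```python
# Minimal sketch: the same Unicode code points encoded as UTF-8, UTF-16 and UTF-32.

def code_units(text: str) -> dict[str, int]:
    """Count the code units each UTF needs for `text`."""
    return {
        "UTF-8": len(text.encode("utf-8")),            # 8-bit units (bytes)
        "UTF-16": len(text.encode("utf-16-le")) // 2,  # 16-bit units
        "UTF-32": len(text.encode("utf-32-le")) // 4,  # 32-bit units
    }


if __name__ == "__main__":
    for ch in ["A", "\u00E9", "\u20AC", "\U0001D11E"]:  # A, e-acute, euro sign, G clef
        # Every encoding round-trips to the same code point.
        for codec in ("utf-8", "utf-16-le", "utf-32-le"):
            assert ch.encode(codec).decode(codec) == ch
        print(f"U+{ord(ch):04X}", code_units(ch))
    # U+0041 {'UTF-8': 1, 'UTF-16': 1, 'UTF-32': 1}
    # U+00E9 {'UTF-8': 2, 'UTF-16': 1, 'UTF-32': 1}
    # U+20AC {'UTF-8': 3, 'UTF-16': 1, 'UTF-32': 1}
    # U+1D11E {'UTF-8': 4, 'UTF-16': 2, 'UTF-32': 1}
```

So the character set is identical; UTF-8 and UTF-16 are variable-width, while UTF-32 always uses one code unit per code point.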