windowsutf-8localecodepagesmbcs

Why isn't UTF-8 allowed as the "ANSI" code page?


The Windows _setmbcp function allows any valid code page...

(except UTF-7 and UTF-8, which are not supported)

OK, not supporting UTF-7 makes sense: Characters have non-unique representations and that introduces complexity and security risks.

But why not UTF-8?

As I understand it, the "ANSI" versions of the Windows API functions convert their arguments to UTF-16, call the equivalent "W" function, and convert any strings in the output to "ANSI". This is what I've been doing manually. So why can't Windows do it for me?


Solution

  • The "ANSI" codepage is basically legacy: Windows 9X era. All modern software should be Unicode (that is, UTF-16) based anyway.

    Basically, when the Ansi code page stuff was originally designed, UTF-8 wasn't even invented and so support for multi-byte encodings was rather haphazard (i.e. most Ansi code pages are single byte, with the exception of some East Asian code pages which are one-or-two byte). Adding support for "proper" multi-byte encodings was probably deemed not worth the effort when all new development should be done in UTF-16 anyway.