I'm working on a C++ program that receives the user's keyboard input on Windows 10. Users can enter English, and East Asian languages such as Chinese, using Input Methods (IMEs) on a US keyboard.
The program gets WM_IME_CHAR messages when a user types Chinese via a Microsoft Input Method. The wParam values it receives are very different on two Windows 10 computers, even with the same keyboard input and the same Input Method selected by the user.
As a test, a user enters the same Chinese character on the keyboards of the two Windows 10 computers. On one computer, wParam holds the character in GBK (or GB2312) encoding, which I know how to handle. On the other computer, wParam always seems to be the small number 63, and I don't know what that is. I can't find any differences between the two computers; as far as I know, all other software works the same on both. For example, Notepad works the same on both computers.
My goal is not to find the differences in computer settings. It is to fix my program so that it receives the same correct input on both computers, even if their settings differ.
What determines the kind of value in the wParam of a WM_IME_CHAR message, and how can it be controlled through Win32 API calls from C/C++?
The program was built in Visual Studio 2019, targeting the 32-bit (x86) platform. The IME used in the test was Chinese Simplified - Microsoft Pinyin.
I investigated and tried various Windows API calls on multiple Windows 10 computers, and finally arrived at a fairly complete answer with a solution. Here is what I found.
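When a user enters East Asian characters from the keyboard via an IME, three factors can affect the values passed to your program in the wParam of the WM_IME_CHAR message:
a) the PC setting “language for non-Unicode programs”;
b) how your program registers its window class, by RegisterClass[Ex]A() or RegisterClass[Ex]W();
c) how your program handles the WM_IME_COMPOSITION message.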
Of these factors, software developers control b) and c). With correct coding, b) and c) give enough control to make your program work regardless of how a) is set.
The PC setting mentioned above can be accessed via the Control Panel. Windows labels it in two ways: “language for non-Unicode programs” in the display pane, and “system locale” in the editing dialog. The setting not only specifies a language but also selects a character encoding, the system's ANSI code page (for example, code page 936 (GBK) for Chinese (Simplified, China), or 1252 for English (United States)). This matters for Chinese, which has several encodings in use. The setting is system-wide, not per-user. Changing it requires an administrator password, and Windows immediately asks for a restart afterwards. To locate the setting, either open Control Panel, then Region / Administrative tab, or open Settings, then Time & Language / Region / Additional date, time & regional settings / Region / Administrative tab.
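A program can check which encoding the “language for non-Unicode programs” setting selects by asking for the system's ANSI code page with GetACP(). The sketch below is an illustration rather than part of my original tests: the helper names DescribeAnsiCodePage and IsEastAsianDbcsCodePage and their short, incomplete code page table are my own, and only the GetACP() call is Windows-specific, so it is guarded by _WIN32.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: maps a few common Windows ANSI code pages to a
// readable description. The table is deliberately incomplete.
std::string DescribeAnsiCodePage(unsigned int codePage) {
    switch (codePage) {
    case 936:  return "Chinese Simplified (GBK)";
    case 950:  return "Chinese Traditional (Big5)";
    case 932:  return "Japanese (Shift-JIS)";
    case 949:  return "Korean (Unified Hangul Code)";
    case 1252: return "Western European (Windows-1252)";
    default:   return "other";
    }
}

// Hypothetical helper: true for the East Asian double-byte code pages above.
bool IsEastAsianDbcsCodePage(unsigned int codePage) {
    return codePage == 936 || codePage == 950 || codePage == 932 || codePage == 949;
}

#ifdef _WIN32
#include <windows.h>

// Prints the ANSI code page implied by the "language for non-Unicode programs" setting.
void PrintSystemAnsiCodePage() {
    const UINT acp = GetACP();
    std::printf("ANSI code page %u: %s\n", acp, DescribeAnsiCodePage(acp).c_str());
}
#endif
```

On the computer that produced GBK values I would expect GetACP() to return 936; on a computer set to English (United States) it returns 1252, whose encoding cannot represent Chinese characters.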
When the “language for non-Unicode programs” is set to an East Asian language whose encoding covers the user's input, and your program registers the window class with RegisterClassW() or RegisterClassExW(), the wParam of WM_IME_CHAR holds the Unicode (UTF-16) value of the input character. If your program registers with RegisterClassA() or RegisterClassExA(), the wParam of WM_IME_CHAR holds the character in the encoding selected by the “language for non-Unicode programs” setting.
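For the case just described, where the system locale's encoding covers the input, the two kinds of wParam can be decoded as follows. This is a minimal sketch with hypothetical helper names, assuming that an ANSI (RegisterClass[Ex]A) window receives a double-byte character in a single WM_IME_CHAR with the lead byte in bits 8 to 15 and the trail byte in bits 0 to 7. The Windows-only part converts those bytes to UTF-16 with MultiByteToWideChar() and CP_ACP, which is only correct when the system ANSI code page matches the character's encoding.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper for ANSI windows: splits wParam into DBCS bytes.
// Assumes lead byte in bits 8..15, trail byte in bits 0..7; a single-byte
// character has a zero high byte and yields one byte.
std::string ImeCharToAnsiBytes(std::uintptr_t wParam) {
    const unsigned char lead = static_cast<unsigned char>((wParam >> 8) & 0xFF);
    const unsigned char trail = static_cast<unsigned char>(wParam & 0xFF);
    std::string bytes;
    if (lead != 0) bytes.push_back(static_cast<char>(lead));
    bytes.push_back(static_cast<char>(trail));
    return bytes;
}

// Hypothetical helper for Unicode windows: wParam is a UTF-16 code unit.
char16_t ImeCharToUtf16(std::uintptr_t wParam) {
    return static_cast<char16_t>(wParam & 0xFFFF);
}

#ifdef _WIN32
#include <windows.h>

// Converts the DBCS bytes to UTF-16 using the system ANSI code page.
std::wstring AnsiImeCharToWide(WPARAM wParam) {
    const std::string bytes = ImeCharToAnsiBytes(wParam);
    wchar_t buf[4] = {};
    const int n = MultiByteToWideChar(CP_ACP, 0, bytes.data(),
                                      static_cast<int>(bytes.size()), buf, 4);
    return std::wstring(buf, static_cast<std::size_t>(n > 0 ? n : 0));
}
#endif
```

For example, the character 中 is 0xD6D0 in GBK and U+4E2D in Unicode, so an ANSI window on a GBK system would see wParam 0xD6D0 and a Unicode window would see 0x4E2D.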
When the “language for non-Unicode programs” is set to “English (United States)”, or to any language whose encoding does not cover the input, and a user enters East Asian characters via an IME, the value in the wParam of WM_IME_CHAR depends on how you register the class and how you handle the WM_IME_COMPOSITION message. If WM_IME_COMPOSITION is properly handled and you register via RegisterClass[Ex]A(), wParam gets the character in the IME's encoding; if you register via RegisterClass[Ex]W(), wParam gets the Unicode value. Without proper handling of the WM_IME_COMPOSITION message, wParam gets “?” (character code 63) to indicate an error condition, regardless of whether you register the window class with RegisterClass[Ex]A() or RegisterClass[Ex]W().
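The observations above can be restated as a small decision function. This is only my summary of the behavior I observed, not a Windows API; the enum and function names are hypothetical, and I assume that when the locale's encoding covers the input, the handling of WM_IME_COMPOSITION does not change the result.

```cpp
// Hypothetical classification of what wParam of WM_IME_CHAR contains.
enum class ImeCharValue {
    Unicode,      // UTF-16 code unit
    SystemAnsi,   // encoding selected by "language for non-Unicode programs"
    ImeEncoding,  // encoding of the IME itself
    QuestionMark  // '?' (63): the character could not be delivered
};

// The '?' placeholder is character code 63.
static_assert('?' == 63, "'?' is character code 63");

// localeCoversInput:  the system locale's encoding can represent the input.
// wideClass:          the class was registered with RegisterClass[Ex]W.
// compositionHandled: WM_IME_COMPOSITION is handled properly
//                     (for example, passed to DefWindowProcW).
ImeCharValue ExpectedImeCharValue(bool localeCoversInput, bool wideClass,
                                  bool compositionHandled) {
    if (localeCoversInput) {
        return wideClass ? ImeCharValue::Unicode : ImeCharValue::SystemAnsi;
    }
    if (!compositionHandled) {
        return ImeCharValue::QuestionMark;
    }
    return wideClass ? ImeCharValue::Unicode : ImeCharValue::ImeEncoding;
}
```

In these terms, the computer that returned 63 corresponds to ExpectedImeCharValue(false, ..., false).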
To make your program work regardless of how the “language for non-Unicode programs” is set, handle the WM_IME_COMPOSITION message properly. The simplest way is to pass the message to the DefWindowProcW() function.
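A minimal sketch of this approach, assuming an ANSI-registered window (RegisterClassExA) in a non-Unicode project: WM_IME_COMPOSITION always goes to DefWindowProcW(), and every other unhandled message goes to DefWindowProcA(). The names MainWndProc, OnImeChar and UseDefWindowProcW are hypothetical placeholders. The two message IDs are copied from <winuser.h> so the routing rule compiles without <windows.h>; on Windows, static_assert checks that they match the real constants.

```cpp
// Message IDs as defined in <winuser.h>, repeated so the routing rule
// compiles on any platform.
constexpr unsigned int kWmImeComposition = 0x010F;  // WM_IME_COMPOSITION
constexpr unsigned int kWmImeChar = 0x0286;         // WM_IME_CHAR

// Routing rule: WM_IME_COMPOSITION always goes to the wide default window
// procedure, whichever RegisterClass variant created the window class.
constexpr bool UseDefWindowProcW(unsigned int msg) {
    return msg == kWmImeComposition;
}

#ifdef _WIN32
#include <windows.h>

static_assert(kWmImeComposition == WM_IME_COMPOSITION, "message ID mismatch");
static_assert(kWmImeChar == WM_IME_CHAR, "message ID mismatch");

// Hypothetical application handler for one IME character.
void OnImeChar(HWND hwnd, WPARAM ch) { (void)hwnd; (void)ch; }

LRESULT CALLBACK MainWndProc(HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam) {
    if (UseDefWindowProcW(msg)) {
        // Let the wide default procedure process the composition; in my tests
        // WM_IME_CHAR then carries the character instead of '?'.
        return DefWindowProcW(hwnd, msg, wParam, lParam);
    }
    switch (msg) {
    case WM_IME_CHAR:
        OnImeChar(hwnd, wParam);
        return 0;
    case WM_DESTROY:
        PostQuitMessage(0);
        return 0;
    }
    return DefWindowProcA(hwnd, msg, wParam, lParam);
}
#endif
```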
Based on my tests, if you process WM_IME_COMPOSITION correctly, several other things commonly mentioned in forum discussions and articles do not matter. It doesn't matter whether you use GetMessageA() or GetMessageW(), nor which versions of DispatchMessage() or DefWindowProc() you use (for all other messages). More importantly, you do not need to set your project to Unicode (there is no need to define the UNICODE or _UNICODE macros). However, if you do set your project to Unicode, DefWindowProc() maps to DefWindowProcW(), which processes the WM_IME_COMPOSITION message properly.