c++ windows winapi unicode

Why is there a blank between each character I write to a file with WriteFile?


Here's my code:

WCHAR msg[] = L"ReplaceFile:";
::WriteFile( hFile, msg, lstrlenW(msg) * sizeof(WCHAR), &nBytes, NULL );  

I created this file with OPEN_ALWAYS mode, and I want to write some constant strings to it. When I open the file, "ReplaceFile" is displayed like this: R e p l a c e F i l e.

Can somebody tell me why this happens and how to make the output look normal? Thanks in advance.


Solution

  • WCHAR is an alias for wchar_t, which is 2 bytes in size on Windows. Wide strings on Windows are encoded in UTF-16LE. In UTF-16, each element (called a code unit) is 2 bytes (16 bits) in size. Unicode code points U+0000 through U+FFFF take up one code unit each, and higher code points take up two code units (a surrogate pair).

    Your wide string consists of only ASCII characters, which are all less than 0x0080. Each one needs no more than 7 bits, so the remaining 9 bits of its 16-bit code unit are 0. As a result, every other byte written to the file has the value 0x00, which is not a displayable character. Your viewer is rendering those 0x00 bytes as blanks, which is the extra spacing you are seeing.

    Your wide string L"ReplaceFile:" consists of the following bytes in UTF-16LE:

    0x52 0x00 // R
    0x65 0x00 // e
    0x70 0x00 // p
    0x6C 0x00 // l
    0x61 0x00 // a
    0x63 0x00 // c
    0x65 0x00 // e
    0x46 0x00 // F
    0x69 0x00 // i
    0x6C 0x00 // l
    0x65 0x00 // e
    0x3A 0x00 // :
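
The byte listing above can be reproduced with a small portable sketch. It is an illustration rather than part of the original code: `char16_t` stands in for `WCHAR` so that it compiles outside Windows, and the helper names `utf16le_bytes` and `count_nul_bytes` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Serialize UTF-16 code units as UTF-16LE bytes: low byte first, then high byte.
// Assumption: char16_t stands in for WCHAR so the sketch builds on any platform.
std::vector<std::uint8_t> utf16le_bytes(const std::u16string& text) {
    std::vector<std::uint8_t> bytes;
    bytes.reserve(text.size() * 2);
    for (char16_t unit : text) {
        bytes.push_back(static_cast<std::uint8_t>(unit & 0xFF)); // low byte
        bytes.push_back(static_cast<std::uint8_t>(unit >> 8));   // high byte
    }
    return bytes;
}

// Count how many of the serialized bytes are 0x00.
std::size_t count_nul_bytes(const std::vector<std::uint8_t>& bytes) {
    std::size_t count = 0;
    for (std::uint8_t b : bytes) {
        if (b == 0x00) {
            ++count;
        }
    }
    return count;
}
```

Calling `utf16le_bytes(u"ReplaceFile:")` yields 24 bytes, 12 of which are 0x00: one padding byte for each of the 12 ASCII characters.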
    

    You should read the following article:

    The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

    With that said, UTF-16 is not the best choice for storing a string in a file. UTF-8 is more compact than UTF-16 for text that is mostly ASCII or Latin-based, and it is backwards compatible with ASCII, so your string would be written as one byte per character. On Windows, you can use the WideCharToMultiByte() function (or a similar function or library) to convert your wide string before writing it to the file:

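    WCHAR msg[] = L"ReplaceFile:";
    int wlen = lstrlenW(msg);
    // First call: ask how many UTF-8 bytes the conversion needs.
    int len = WideCharToMultiByte(CP_UTF8, 0, msg, wlen, NULL, 0, NULL, NULL);
    if (len > 0) {
        CHAR *converted = new CHAR[len];
        // Second call: convert. No null terminator is written because wlen excludes it.
        WideCharToMultiByte(CP_UTF8, 0, msg, wlen, converted, len, NULL, NULL);
        ::WriteFile( hFile, converted, len * sizeof(CHAR), &nBytes, NULL );
        delete [] converted;
    }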
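
To see why the UTF-8 output no longer contains the 0x00 padding, here is a minimal hand-written UTF-16 to UTF-8 encoder. It is only a portable sketch of the encoding rules, not a replacement for WideCharToMultiByte(). The function name `utf16_to_utf8` is hypothetical, `char16_t` stands in for `WCHAR`, and replacing unpaired surrogates with U+FFFD is a choice made for this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Encode a UTF-16 string as UTF-8.
// Assumption for this sketch: unpaired surrogates are replaced with U+FFFD.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the following low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD; // low surrogate without a preceding high surrogate
        }

        if (cp < 0x80) {
            // ASCII: one byte, identical to the ASCII value.
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For `u"ReplaceFile:"` the result is the 12-byte string `"ReplaceFile:"`, with one byte per character and no 0x00 bytes in between.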