c++ dll unicode lptstr lpwstr

LPTSTR contains only one letter


I'm creating a DLL for an application. The application calls the DLL and receives a string that is 8 to 50 characters long.

The problem I'm having is that only the first letter of any message the application receives is shown.

Below is the GetMethodVersion function.

#include "stdafx.h"
STDAPI_(void) GetMethodVersion(LPTSTR out_strMethodVersion, int in_intSize)
{
   if ((int)staticMethodVersion.length() > in_intSize)
       return;
   _tcscpy_s(out_strMethodVersion, 12, _T("Test"));
   // staticMethodVersion should be used instead of _T("Test")
}

The project settings are set to Unicode. After some research, I believe the problem is related to the Unicode format and how it is handled. Thanks for any help you can give.


Solution

  • You wrote in your question that the project settings are Unicode: is this true for both the DLL and the calling EXE? Make sure that they both match.

    In Unicode builds, the ugly TCHAR macros become:

    LPTSTR      --> wchar_t*
    _tcscpy_s   --> wcscpy_s
    _T("Test")  --> L"Test"
    

    So you have:

    STDAPI_(void) GetMethodVersion(wchar_t* out_strMethodVersion, 
                                   int in_intSize)
    {
        ...
        wcscpy_s(out_strMethodVersion, 12, L"Test");
    }
    

    Are you sure the "magic number" 12 is correct? Is the destination string buffer pointed to by out_strMethodVersion of size at least 12 wchar_ts (including the terminating NUL)?
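    One way to avoid the magic number is to size the copy from the value the caller passes in. The following is only a minimal portable sketch, assuming in_intSize is the buffer size in characters (including the terminating NUL); CopyToCallerBuffer is a hypothetical helper name, not something from the question. Note that under that assumption, the original check (length > in_intSize) would let through a string that exactly fills the buffer and leaves no room for the NUL.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical helper (not from the question): copies src into the caller's
// buffer only if the whole string plus its terminating NUL fits.
// Assumes outSizeInChars is the buffer size in wchar_t units.
bool CopyToCallerBuffer(wchar_t* out, int outSizeInChars, const std::wstring& src)
{
    assert(out != nullptr);
    if (outSizeInChars <= 0)
        return false;

    // +1 accounts for the terminating NUL.
    if (src.length() + 1 > static_cast<std::size_t>(outSizeInChars)) {
        out[0] = L'\0';   // leave a valid empty string on failure
        return false;
    }

    // c_str() is NUL-terminated, so copying length + 1 elements
    // also copies the terminator.
    std::wmemcpy(out, src.c_str(), src.length() + 1);
    return true;
}
```

    In an MSVC Unicode build the same idea would probably reduce to passing in_intSize instead of 12 as the second argument of wcscpy_s.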

    Then, have a look at the call site (which you haven't shown).

    How do you print the returned string? If the call site uses an ANSI char function, the returned string is interpreted as a char* ANSI string. The first 0x00 byte of the UTF-16 string is then taken as the NUL terminator, so the string is truncated after the first character when printed:

     Text:             T       e       s       t      NUL
     UTF-16 bytes:   54 00   65 00   73 00   74 00   00 00
         (hex)          **<--+
                             |
                     First 00 byte misinterpreted as 
                   NUL terminator in char* ANSI string,
             so only 'T' (the first character) gets printed.
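
    The effect in the diagram can be reproduced with a few lines of portable C++. This is just an illustrative sketch, assuming a little-endian machine (as on typical Windows PCs); LengthSeenByNarrowCode is a hypothetical name, and char16_t stands in for the 16-bit wchar_t used on Windows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: reports how many characters a narrow (char*) string
// function sees when it is handed a UTF-16 buffer by mistake.
std::size_t LengthSeenByNarrowCode(const char16_t* utf16Text)
{
    assert(utf16Text != nullptr);

    // Reinterpret the UTF-16 code units as raw bytes, which is what an
    // ANSI call site effectively does with a wchar_t* buffer.
    const char* misread = reinterpret_cast<const char*>(utf16Text);

    // strlen stops at the first 0x00 byte. On a little-endian machine,
    // that is the high byte of the first ASCII-range character.
    return std::strlen(misread);
}
```

    On a little-endian machine, LengthSeenByNarrowCode(u"Test") should return 1, matching the single 'T' the application displays.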
    

    EDIT

    The fact that you clarified in the comments that:

    I switched the DLL to ANSI, the EXE apparently was that as well, though the exe was documented as Unicode.

    makes me think that the EXE assumes the UTF-8 Unicode encoding.

    Just as in ANSI strings, a 0x00 byte in UTF-8 is the string's NUL terminator, so the previous analysis still applies: the 0x00 byte inside a UTF-16 wchar_t is misinterpreted as the end of the string.

    Note that pure ASCII is a proper subset of UTF-8: so your code may work if you just use pure ASCII characters (like in "Test") and pass them to the EXE.

    However, if the EXE is documented to be using Unicode UTF-8, you may want to Do The Right Thing and return a UTF-8 string from the DLL.

    The string is returned via char* (as for ANSI strings), but it's important that you make sure that UTF-8 is the encoding used by the DLL to return that string, to avoid subtle bugs in the future.
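
    If the DLL keeps its text as UTF-16 internally, it would need to convert to UTF-8 before copying into the caller's char* buffer. On Windows this is usually done with WideCharToMultiByte and CP_UTF8; the sketch below instead uses a small hand-written encoder so that it stays portable. Both Utf16ToUtf8 and GetMethodVersionUtf8 are hypothetical names, and the export's signature is an assumption about what the EXE might expect, not something the question confirms.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: encodes UTF-16 text as UTF-8 bytes.
// Unpaired surrogates are replaced with U+FFFD.
std::string Utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the following low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // stray low surrogate
        }

        if (cp < 0x80) {                       // 1 byte (pure ASCII)
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical export shape: writes the version as NUL-terminated UTF-8.
// Assumes outSizeInBytes is the buffer size in bytes, terminator included.
bool GetMethodVersionUtf8(char* out, int outSizeInBytes, const std::u16string& version)
{
    assert(out != nullptr);
    const std::string utf8 = Utf16ToUtf8(version);
    if (outSizeInBytes <= 0 ||
        utf8.size() + 1 > static_cast<std::size_t>(outSizeInBytes))
        return false;
    std::memcpy(out, utf8.c_str(), utf8.size() + 1);  // includes the NUL
    return true;
}
```

    For pure ASCII text such as "Test" the UTF-8 bytes are identical to the ASCII bytes, which is why the ANSI build appears to work; the difference only shows up once non-ASCII characters are returned.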

    While the general terminology used in Windows APIs and Visual Studio is "Unicode", it actually means the UTF-16 Unicode encoding in those contexts.

    However, UTF-16 is not the only Unicode encoding available. For example, to exchange text on the Internet, the UTF-8 encoding is widely used. In your case, it sounds like your EXE is expecting a Unicode UTF-8 string.