c++ c++builder unicode-string ansistring

AnsiString as default for type string in Embarcadero C++ Builder?


I have inherited an old Borland C++ Builder application which I now must migrate to a new development tool. The suggested way to go is with Embarcadero C++ Builder, and from my initial tests it seems like a rather smooth transition.

I do however have one problem to which I'm hoping there is a simple solution:

The application parses a large number of text files. These files are all ANSI-based, and that will never change, so it is ANSI in and ANSI out. The main problem I have is that with Embarcadero C++, the type string is now a UnicodeString instead of an AnsiString (as it was in Borland C++ Builder).

Using Unicode in this application is not an option - the files it works with are ANSI formatted. Modifying the code to use AnsiString (and similar) is doable, but I'd rather not, since it uses a lot of TStringList (and similar) constructs.

So my question is: Is there a setting or compiler option or something that I can use to tell Embarcadero to use System.AnsiString as definition for string instead of System.UnicodeString?

This is probably a long-shot, but the RAD Studio XE (which is the older version that I have borrowed to make a few tests) documentation says "by default, the type string is now a Unicode string", which implies that this can be changed. However, that sentence has been rephrased in the documentation for the current version (XE8), so...


Solution

  • I have inherited an old Borland C++ Builder application which I now must migrate to a new development tool. The suggested way to go is with Embarcadero C++ Builder

    Yes. They are actually the same product. Borland created a subsidiary named CodeGear to manage its developer tools (Delphi, C++Builder, etc.), and Embarcadero later bought CodeGear.

    The main problem I have is that with Embarcadero C++, the type string is now a UnicodeString instead of an AnsiString (as it was in Borland C++ Builder).

    string (lowercase s) refers to the STL's std::string class, which is still char-based. You are thinking of C++Builder's System::String alias, which does now map to System::UnicodeString instead of System::AnsiString (that change was made in C++Builder 2009, when UnicodeString was introduced). However, AnsiString still exists and can be used directly.

    Using Unicode in this application is not an option - the files it works with are ANSI formatted.

    Then don't use UnicodeString to process them. Continue using AnsiString instead.

    Modifying the code to use AnsiString (and similar) is doable, but I'd rather not, since it uses a lot of TStringList (and similar) constructs.

    That, on the other hand, would be a problem, yes. Most of the RTL only supports UnicodeString now, so code using TStringList will have to be rewritten, for instance by using TList<AnsiString> or std::vector<AnsiString> instead (and if the code relies on the TStringList::CommaText or TStringList::DelimitedText properties, the rewrite is bigger, since that parsing has to be replaced as well). However, for AnsiString parsing code, many of the older AnsiString-based RTL functions were moved to a separate System.AnsiStrings unit, so you can add #include <System.AnsiStrings.hpp> to your code to reach them.

    So my question is: Is there a setting or compiler option or something that I can use to tell Embarcadero to use System.AnsiString as definition for string instead of System.UnicodeString?

    No. If you think about it, that would be a major undertaking for them to implement: it would require separate Ansi and Unicode copies of the RTL/VCL/FMX frameworks for each supported OS platform, and a lot of internal code would have to be IFDEF'ed to handle the differences between Ansi and Unicode processing logic. So it is not really feasible or cost-effective for them to do (and it is much too late at this point, especially considering that AnsiString is not supported on mobile OS platforms, though there is a 3rd-party patch available to re-enable it).

    This is probably a long-shot, but the RAD Studio XE (which is the older version that I have borrowed to make a few tests) documentation says "by default, the type string is now a Unicode string", which implies that this can be changed.

    No, it cannot be changed. The RTL/VCL/FMX frameworks are Unicode now, but that does not require your code to be Unicode as well; it only has to be Unicode in the spots where it interacts directly with the RTL/VCL/FMX. The rest of your code can continue using AnsiString (or even std::string) as needed.