delphi encoding ascii

Delphi TStringList default encoding


I have noticed recently (maybe a change in a recent Delphi version) that if I load an ASCII text file into a TStringList, edit a line with file.lines[10]:='blah', and then save it again, the file is now encoded as UTF-8. I want it to always use ASCII.

I see you can pass a TEncoding parameter to TStringList.SaveToFile, but I have hundreds of these load/edit/save sequences throughout my program handling various edits.

Is there a way to set a one-time global setting that then makes all TStringList calls use ASCII only?

The main reason for this is that I am creating batch files with these edits, and when they are saved as UTF-8 the Windows command interpreter cmd.exe cannot execute them correctly, i.e.

@echo off

comes back as

'@echo' is not recognized as an internal or external command, operable program or batch file.

Doing some more tests, if I check the "Beta: Use Unicode UTF-8 for worldwide language support" checkbox under the Windows regional advanced settings, the batch file issue is gone (and TStringList saving as UTF-8 is no longer a problem). But I cannot ask all users to enable that checkbox. A simple global "always use ASCII for TStringList" setting would fix my issue, as I never need UTF/Unicode support for TStringLists. At least until that Beta setting becomes standard in Windows.

And to add even more confusion (at least to me), if I explicitly tell the TStringList the encoding with tmp.savetofile('blah.bat', TEncoding.ASCII); the file still shows as UTF-8 encoded in Notepad++ once it is saved.


Solution

  • Is there a way to set a one-time global setting that then makes all TStringList calls use ASCII only?

    Unfortunately, there is no such global setting. Encoding is handled on a per-TStringList, per-stream/file basis.

    When you load a TStringList from a stream/file and no encoding is specified in the Encoding parameter, the encoding is auto-detected from the text data (i.e., a BOM is looked for). If no encoding is detected (i.e., no BOM is present), the TStringList.DefaultEncoding property is used. DefaultEncoding defaults to TEncoding.Default, which on Windows is ANSI (i.e., the user's locale), not UTF-8 (unless you set the locale to UTF-8).

    The actual encoding used to load the data is stored in the TStringList.Encoding property.

    When you save a TStringList to a stream/file, and do not specify an encoding in the Encoding parameter, then the TStringList.Encoding property is used if not nil (so the new stream/file can match the previously loaded stream/file), otherwise the TStringList.DefaultEncoding property is used.
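
    The load/save flow described above can be observed directly. The following is a minimal sketch, assuming a text file named blah.bat (the name is taken from the question, its contents are hypothetical) already exists in the working directory. It prints which encoding was chosen on load, and that same encoding is then reused by the parameterless save.

    ```delphi
    program InspectEncoding;

    {$APPTYPE CONSOLE}

    uses
      System.SysUtils, System.Classes;

    var
      List: TStringList;
    begin
      List := TStringList.Create;
      try
        // No Encoding argument: a BOM is looked for first, and
        // DefaultEncoding is the fallback when none is found.
        List.LoadFromFile('blah.bat');

        // Encoding now holds whatever was actually used for the load.
        Writeln('Loaded as: ', List.Encoding.EncodingName);

        // Edit a line only if the file was not empty.
        if List.Count > 0 then
          List[0] := '@echo off';

        // No Encoding argument: the Encoding property from the load is reused.
        List.SaveToFile('blah.bat');
      finally
        List.Free;
      end;
    end.
    ```

    If this reports a UTF-8 encoding for a file you expected to be ASCII, the file most likely starts with a UTF-8 BOM, and that detected encoding takes precedence over DefaultEncoding.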

    So, to do what you want, you can set the TStringList.DefaultEncoding property to TEncoding.ASCII on each TStringList object, so that loading and saving use ASCII unless a different encoding is specified or detected for a given file.
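
    Since there is no global switch, one way to keep the change small across hundreds of call sites is to route construction through a single helper. This is only a sketch: NewAsciiStringList is a hypothetical function name, not part of the RTL, and the file name is the one from the question.

    ```delphi
    program AsciiDefault;

    {$APPTYPE CONSOLE}

    uses
      System.SysUtils, System.Classes;

    // Hypothetical helper (not an RTL function): creates a TStringList
    // whose fallback encoding is ASCII instead of TEncoding.Default.
    function NewAsciiStringList: TStringList;
    begin
      Result := TStringList.Create;
      Result.DefaultEncoding := TEncoding.ASCII;
    end;

    var
      Batch: TStringList;
    begin
      Batch := NewAsciiStringList;
      try
        Batch.Add('@echo off');
        Batch.Add('echo hello');
        // Nothing was loaded, so Encoding is nil and DefaultEncoding
        // (ASCII) is used for the save.
        Batch.SaveToFile('blah.bat');
      finally
        Batch.Free;
      end;
    end.
    ```

    Call sites would then replace TStringList.Create with NewAsciiStringList. Note that this only changes the fallback: a file that is loaded with a BOM will still be loaded, and later saved, with the detected encoding.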

  • And to add even more confusion (at least to me), if I explicitly tell tstringlist the encoding tmp.savetofile('blah.bat', TEncoding.ASCII); it still shows as UTF-8 encoding in Notepad++ once the tstringlist is saved.

    ASCII is a subset of UTF-8, so a valid ASCII file will also be a valid UTF-8 file. But whether or not Notepad++ treats an ASCII file as UTF-8 depends on whether or not Notepad++ is configured to use UTF-8 as its default encoding.
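
    To check what was actually written, independent of any editor's guess, the raw bytes can be inspected. The sketch below assumes the saved file is named blah.bat as in the question. It reports whether the file starts with the UTF-8 BOM bytes (EF BB BF) and whether every byte is in the 7-bit ASCII range.

    ```delphi
    program CheckBytes;

    {$APPTYPE CONSOLE}

    uses
      System.SysUtils, System.IOUtils;

    var
      Bytes: TBytes;
      I: Integer;
      PureAscii: Boolean;
    begin
      Bytes := TFile.ReadAllBytes('blah.bat');

      // A UTF-8 BOM is the three-byte sequence EF BB BF at the start.
      if (Length(Bytes) >= 3) and (Bytes[0] = $EF) and
         (Bytes[1] = $BB) and (Bytes[2] = $BF) then
        Writeln('UTF-8 BOM present')
      else
        Writeln('No UTF-8 BOM');

      // A file with no byte above $7F is valid ASCII (and valid UTF-8).
      PureAscii := True;
      for I := 0 to High(Bytes) do
        if Bytes[I] > $7F then
        begin
          PureAscii := False;
          Break;
        end;
      Writeln('Only 7-bit ASCII bytes: ', PureAscii);
    end.
    ```

    If there is no BOM and every byte is 7-bit, the file is plain ASCII on disk, and the UTF-8 label in Notepad++ is just how the editor chose to interpret it.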