
How to force codeset cp1252 for output file in perl >=5.18 within Windows 10?


I need to make sure that the output file I create with my Perl script is encoded in CP1252 and not UTF-8, because it will be used within a UNIX SQL*Plus framework that does not handle German umlauts correctly when it inserts those values into the database columns. (I use Strawberry Perl v5.18 on Windows 10, and I cannot set NLS_LANG or chcp within the UNIX SQL environment.)

With this little test script I can reproduce that the output file "testfile1.txt" is always in UTF-8, while "testfile2.txt" is CP1252 as expected. How can I force the output for "testfile1.txt" to be CP1252 as well, even if there are no "special" characters in the text?

#!/usr/bin/env perl -w
use strict;
use Encode;

# the result file under Windows 10 will have UTF-8 codeset
open(OUT,'> testfile1.txt');    
binmode(OUT,"encoding(cp-1252)");
print OUT encode('cp-1252',"this is a test");
close(OUT);

# the result file under Windows 10 will have Windows-cp1252 codeset
open(OUT,'> testfile2.txt');    
binmode(OUT,"encoding(cp-1252)");
print OUT encode('cp-1252',"this is a test with german umlauts <ÄäÜüÖöß>");
close(OUT);

Solution

  • I think your question is based on a misunderstanding. testfile1.txt contains the text "this is a test". These characters have the same encoding in ASCII, Latin-1, UTF-8, and CP-1252, so testfile1.txt is valid in all of these encodings simultaneously. A tool that reports the file as UTF-8 is only guessing from bytes that look identical in every one of them.
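    To see this for yourself, a minimal sketch along these lines encodes the same text several ways and compares the resulting bytes. The helper name hex_bytes is made up for illustration, and the script assumes it is saved as UTF-8:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;                  # the "ä" literal below is read as one character
use Encode qw(encode);

# Hypothetical helper: hex dump of $text after encoding it as $enc.
sub hex_bytes {
    my ($enc, $text) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, encode($enc, $text);
}

my $plain     = "this is a test";
my @all       = ('ascii', 'iso-8859-1', 'UTF-8', 'cp1252');
my @non_ascii = ('iso-8859-1', 'cp1252', 'UTF-8');

# Collect the distinct byte strings the four encodings produce for ASCII-only text.
my %seen;
$seen{ encode($_, $plain) }++ for @all;
printf "distinct encodings of plain text: %d\n", scalar keys %seen;   # 1

# A single umlaut is where the encodings start to differ.
print "$_: ", hex_bytes($_, "ä"), "\n" for @non_ascii;
# iso-8859-1: E4
# cp1252: E4
# UTF-8: C3 A4
```

    Only the second loop shows a difference, which is why a file without umlauts cannot be told apart by its bytes.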


    To include literal Unicode characters in your source code like this:

    print OUT encode('cp-1252',"this is a test with german umlauts <ÄäÜüÖöß>");
    

    you need

    use utf8;
    

    at the top, and the source file itself must be saved as UTF-8.
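    As a rough illustration of what the pragma changes, the sketch below measures the same literal with and without it. It assumes the script is saved as UTF-8; the variable names are made up:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my ($len_without, $len_with);

{
    no utf8;    # the default: the literal is the raw UTF-8 bytes from the source file
    $len_without = length("ä");
}

{
    use utf8;   # the literal is decoded into a single Unicode character
    $len_with = length("ä");
}

print "without use utf8: $len_without\n";   # 2
print "with use utf8:    $len_with\n";      # 1
```

    Without the pragma, encode() would see two Latin-1 characters instead of one "ä" and produce two bytes in the output.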

    Also, don't combine encoding layers on filehandles with explicit encode() calls. Either set an encoding layer and print Unicode text to it, or use binmode(OUT) and print raw bytes (as returned from encode()) to it.
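    The two options can be sketched side by side like this. In-memory filehandles are used here only to keep the example self-contained (an assumption of this sketch, not something your script needs), and the canonical Encode name cp1252 is used:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;
use Encode qw(encode);

my $text = "<ÄäÜüÖöß>";

# Option 1: put an encoding layer on the handle and print decoded text.
my $via_layer = '';
{
    open(my $out, '>:encoding(cp1252)', \$via_layer) or die "$0: in-memory file: $!\n";
    print $out $text;
    close($out) or die "$0: close: $!\n";
}

# Option 2: make the handle binary and print the bytes returned by encode().
my $via_encode = '';
{
    open(my $out, '>', \$via_encode) or die "$0: in-memory file: $!\n";
    binmode($out);
    print $out encode('cp1252', $text);
    close($out) or die "$0: close: $!\n";
}

print $via_layer eq $via_encode ? "identical bytes\n" : "bytes differ\n";
printf "%d bytes for %d characters\n", length($via_encode), length($text);
```

    Both routes give one byte per character here; the point is to pick one of them per filehandle rather than stacking both.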


    By the way, you shouldn't use -w anymore. It's been supplanted by the

    use warnings;
    

    pragma.

    Similarly, bareword filehandles and two-argument open are pre-5.6 style code and shouldn't be used in code written after 2000. (perl 5.005 and earlier didn't support Unicode/encodings anyway.)

    A fixed version of your code looks like this:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use utf8;
    
    {
        open(my $out, '>:encoding(cp-1252)', 'testfile1.txt') or die "$0: testfile1.txt: $!\n";    
        print $out "this is a test\n";
        close($out);
    }
    
    {
        open(my $out, '>:encoding(cp-1252)', 'testfile2.txt') or die "$0: testfile2.txt: $!\n";
        print $out "this is a test with german umlauts <ÄäÜüÖöß>\n";
        close($out);
    }
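
    If you want to check the result without relying on an editor's guess, one possible approach is to read the file back as raw bytes and try a strict UTF-8 decode. This is only a sketch; the file name cp1252_check.txt and the variable names are made up for illustration:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;
use Encode qw(decode);

my $file = 'cp1252_check.txt';   # hypothetical scratch file

# Write text with umlauts through a CP-1252 encoding layer.
open(my $out, '>:encoding(cp1252)', $file) or die "$0: $file: $!\n";
print $out "this is a test with german umlauts <ÄäÜüÖöß>\n";
close($out) or die "$0: $file: $!\n";

# Read the file back as raw bytes, with no decoding at all.
open(my $in, '<:raw', $file) or die "$0: $file: $!\n";
my $bytes = do { local $/; <$in> };
close($in);
unlink $file;

# A strict UTF-8 decode dies on the lone high bytes CP-1252 uses for umlauts.
my $valid_utf8 = eval {
    decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    1;
} ? 1 : 0;

# Decoding as CP-1252 should give the original characters back.
my $decoded = decode('cp1252', $bytes);

print "valid UTF-8: ", ($valid_utf8 ? "yes" : "no"), "\n";
print "umlauts round-trip: ", (index($decoded, "<ÄäÜüÖöß>") >= 0 ? "yes" : "no"), "\n";
```

    For a file that contains only ASCII, the same check would report valid UTF-8, which is expected and harmless.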