
Why does perldoc render 'Münster' as 'Muenster'?


I have a simple POD text file:

$ cat test.pod 
=encoding UTF-8

Münster

It is encoded in UTF-8, as this literal hex dump of the file shows:

00000000  3d 65 6e 63 6f 64 69 6e  67 20 55 54 46 2d 38 0a  |=encoding UTF-8.|
00000010  0a 4d c3 bc 6e 73 74 65  72 0a                    |.M..nster.|
0000001a

The "ü" is being encoded as the two bytes C3 and BC.
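To reproduce the file byte for byte, here is a minimal sketch. It assumes a POSIX shell with printf and od, and writes test.pod using explicit octal escapes. The od output should contain c3 bc, which is the UTF-8 encoding of U+00FC (ü). The column spacing of od differs a little between GNU and BSD systems.

```shell
#!/bin/sh
# Write test.pod byte for byte: \303\274 are the octal escapes for 0xC3 0xBC,
# the UTF-8 encoding of U+00FC (ü).
printf '=encoding UTF-8\n\nM\303\274nster\n' > test.pod

# Dump the bytes in hex (POSIX od; hexdump -C gives the layout shown above).
od -An -tx1 test.pod

# The file should be 26 (0x1a) bytes long, matching the dump above.
wc -c < test.pod
```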

But when I run perldoc on the file, it turns my lovely UTF-8-encoded characters into ASCII.

What's more, it correctly applies the German convention of writing "ü" as "ue".

$ perldoc test.pod | cat
TEST(1)               User Contributed Perl Documentation              TEST(1)

Muenster

perl v5.16.3                      2014-06-10                           TEST(1)

Why is it doing this?

Is there an additional declaration I can put into my file to stop it from happening?


After further investigation with App::perlbrew, I found that the difference depends on which version of Pod::Perldoc is installed:

perl version   Pod::Perldoc  Output
perl-5.10.1    3.14_04       Muenster
perl-5.12.5    3.15_02       Muenster
perl-5.14.4    3.15_04       Muenster
perl-5.16.2    3.17          Münster
perl-5.16.3    3.19          Muenster
perl-5.16.3    3.17          Münster
perl-5.17.3    3.17          Münster
perl-5.18.0    3.19          Muenster
perl-5.18.1    3.23          Münster
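The following sketch shows which row of the table a given perl falls into. It asks perl which Pod::Perldoc it loads, via the module's $VERSION, and compares that against the versions tested. The function classify_perldoc_version is a hypothetical helper. It only knows the versions listed in the table and reports anything else as untested. With App::perlbrew, the same command could be run against every installed perl through perlbrew exec.

```shell
#!/bin/sh
# Hypothetical helper: classify a Pod::Perldoc version string using only the
# results observed above (3.14_04, 3.15_02, 3.15_04 and 3.19 printed
# "Muenster"; 3.17 and 3.23 printed "Münster").
classify_perldoc_version() {
  case "$1" in
    3.14*|3.15*|3.19*) echo transliterates ;;
    3.17|3.23)         echo keeps-utf8 ;;
    *)                 echo untested ;;
  esac
}

# Ask the perl on $PATH which Pod::Perldoc it loads.
if ver=$(perl -MPod::Perldoc -e 'print $Pod::Perldoc::VERSION' 2>/dev/null); then
  echo "Pod::Perldoc $ver: $(classify_perldoc_version "$ver")"
else
  echo "Pod::Perldoc could not be loaded" >&2
fi
```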

However, I would still like a way, if possible, to make Pod::Perldoc 3.14, 3.15, and 3.19 behave "correctly".


Solution

  • Found this RT ticket http://rt.cpan.org/Public/Bug/Display.html?id=39000

    This "bug" appears to have been introduced with Perl 5.10, and it may have been fixed in later versions.

    See also the questions "How can I use Unicode characters in Perl POD-derived man pages?" and "incorrect behaviour of perldoc with UTF-8 texts".

    You should add the latest available version of Pod::Perldoc as a dependency.
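    In a CPAN-style distribution, such a dependency would normally be declared in the build metadata, for example through PREREQ_PM in a Makefile.PL. The sketch below only checks at run time whether the installed module is recent enough. It relies on the fact that "use Module VERSION" fails at compile time when the installed module is older. has_perldoc_at_least is a hypothetical helper. The minimum of 3.23 is an assumption: it is simply the newest version that printed "Münster" in the table above, not a documented fix release.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the perl on $PATH can load Pod::Perldoc at
# version $1 or later. "use Module VERSION" dies at compile time when the
# installed module is older, so perl then exits non-zero.
has_perldoc_at_least() {
  perl -e "use Pod::Perldoc $1; 1" 2>/dev/null
}

# Assumed minimum: the newest version observed to print "Münster" above.
min=3.23
if has_perldoc_at_least "$min"; then
  echo "Pod::Perldoc >= $min is installed"
else
  echo "Pod::Perldoc is older than $min (or missing); consider upgrading it," >&2
  echo "for example with 'cpanm Pod::Perldoc' if App::cpanminus is installed." >&2
fi
```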