perl postgresql encoding utf-8 plperl

Why does this postgres stored procedure want to `use utf8`?


I have come across a peculiarity in a plperl stored procedure on Postgres 9.2 with Perl 5.12.4.

The curious behavior can be reproduced using this "broken" SP:

CREATE FUNCTION foo(VARCHAR) RETURNS VARCHAR AS $$
    my ( $re ) = @_;
    $re = ''.qr/\b($re)\b/i;
    return $re;
$$ LANGUAGE plperl;

When executed:

# select foo('foo');
ERROR:  Unable to load utf8.pm into plperl at line 3.
BEGIN failed--compilation aborted.
CONTEXT:  PL/Perl function "foo"

However, if I move the qr// operation into an eval, it works:

CREATE OR REPLACE FUNCTION bar(VARCHAR) RETURNS VARCHAR AS $$
    my ( $re ) = @_;
    eval "\$re = ''.qr/\\b($re)\\b/i;";
    return $re;
$$ LANGUAGE plperl;

Result:

# select bar('foo');
       bar       
-----------------
 (?^i:\b(foo)\b)
(1 row)

  1. Why does the eval bypass the automatic `use utf8`?

  2. Why is `use utf8` even required in the first place? My source code is not written in UTF-8, which is said to be the only case in which one should `use utf8`.

    If anything, I would expect the eval version to break without `use utf8` when the input to the function contains non-ASCII values. (Further testing shows that passing non-ASCII values to bar() does indeed cause the eval to fail with the same error.)


Note that many Postgres installations automatically load 'utf8' on startup of the perl interpreter. This is the default in Debian at least, as demonstrated by executing DO 'elog(WARNING, join ", ", sort keys %INC)' language plperl;:

WARNING: Carp.pm, Carp/Heavy.pm, Exporter.pm, feature.pm, overload.pm, strict.pm, unicore/Heavy.pl, unicore/To/Fold.pl, unicore/lib/Perl/SpacePer.pl, utf8.pm, utf8_heavy.pl, vars.pm, warnings.pm, warnings/register.pm
CONTEXT: PL/Perl anonymous code block
DO

But not so on the machine demonstrating the odd behavior:

WARNING: Carp.pm, Carp/Heavy.pm, Exporter.pm, feature.pm, overload.pm, overloading.pm, strict.pm, vars.pm, warnings.pm, warnings/register.pm
CONTEXT: PL/Perl anonymous code block
DO

This question is not about how to get my target machine to load utf8 automatically; I know how to do that. I'm curious why it seems to be necessary in the first place.


Solution

  • In the version that's failing, you're executing

    $re = ''.qr/\b($re)\b/i
    

    In the version that's succeeding, you're executing

    $re = ''.qr/\b(foo)\b/i
    

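    It looks like qr// needs utf8.pm when the pattern is compiled as a Unicode pattern, which is presumably what happens when the interpolated $re arrives from Postgres as a UTF8-flagged string. The second version pastes the value into the source text as a plain ASCII literal, so its pattern isn't compiled as a Unicode pattern.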


    The failure to load utf8.pm is due to the limitations imposed by the Safe compartment created by plperl.

    The fix is to load the module outside the Safe compartment.

    A workaround, which is also more efficient, is to build the pattern string directly instead of stringifying a compiled regex:

    $re = '(?^u:\\b(?i:'.$re.')\\b)';
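The difference between the two functions, and the string-built workaround, can be tried outside Postgres with a small standalone script. This is only a sketch under stated assumptions: it assumes PL/Perl hands text arguments to the function as UTF8-flagged strings (simulated here with utf8::upgrade), it assumes Perl 5.14 or later for the `(?^u:...)` syntax, and `word_pattern` is a hypothetical helper name that does not appear in the original post. The script prints each resulting pattern with its UTF8 flag, so the flags can be compared on whichever Perl is installed.

```perl
use strict;
use warnings;

# Assumption: PL/Perl passes text arguments as UTF8-flagged strings.
# utf8::upgrade simulates that for a plain ASCII value.
my $arg = 'foo';
utf8::upgrade($arg);

# Interpolating the flagged variable into qr// (what foo() does).
my $interpolated = '' . qr/\b($arg)\b/i;

# Pasting the value into source text as a literal (what bar() does).
my $pasted;
eval "\$pasted = '' . qr/\\b($arg)\\b/i; 1" or die $@;

# Hypothetical helper: the string-built workaround, no qr// involved.
sub word_pattern {
    my ($re) = @_;
    return '(?^u:\\b(?i:' . $re . ')\\b)';
}
my $built = word_pattern($arg);

# Show each pattern together with the state of its UTF8 flag.
for my $pair ( [ interpolated => $interpolated ],
               [ pasted       => $pasted ],
               [ built        => $built ] ) {
    my ( $name, $value ) = @$pair;
    printf "%-12s %-22s utf8 flag: %s\n",
        $name, $value, utf8::is_utf8($value) ? 'on' : 'off';
}

# The string-built pattern still behaves as a case-insensitive word match.
print "workaround matches\n" if 'a FOO b' =~ /$built/;
```

Note that the workaround keeps the `i` modifier scoped to the user-supplied part only, and that it groups `$re` without capturing, unlike the original `($re)`.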