perlsubroutine-prototypes

Can a Perl subroutine have two different prototypes to allow for an optional block argument?


Caveats associated with prototypes accepted and notwithstanding, can the two below contrived subs exist within the same package, i.e. to provide an optional block parameter like sort does?

sub myprint {
   for (@_) {
       print "$_\n";
   }
}
sub myprint (&@) {
   my $block = shift;
   for (@_) {
       print $block->() . "\n";
   }
}

The intent is provide a similar calling convention as sort, e.g. to allow execution of:

my @x = qw(foo bar baz);
print_list @x;

# foo
# bar
# baz
 

...and:

my @y = ( {a=>'foo'}, {a=>'bar'}, {a=>'baz'} );
print_list { $_->{a} } @y;

# foo
# bar
# baz

I get redefine and/or prototype mismatch warnings if I try (which is reasonable).

I suppose I can do:

sub myprint {
   my $block = undef;
   $block = shift if @_ && ref($_[0]) eq 'CODE';
   for (@_) {
       print (defined($block) ? $block->() : $_) . "\n";
   }
}

...but the &@ prototype provides the syntactic sugar; removing requires:

my @y = ( {a=>'foo'}, {a=>'bar'}, {a=>'baz'} );
print_list sub { $_->{a} }, @y;                  # note the extra sub and comma

(I've tried ;&@, to no avail -- it still yields Type of arg 1 to main::myprint must be block or sub {} (not private array).)


Solution

  • Yes.

    Unfortunately it's a bit of a pain. You need to use the keyword API introduced in Perl 5.14. This means you need to implement it (and the custom parsing for it) in C and link it to Perl with XS.

    Fortunately DOY wrote a great wrapper for the Perl keyword API, allowing you to implement keywords in pure Perl. No C, no XS! It's called Parse::Keyword.

    Unfortunately this has major bugs dealing with closed over variables.

    Fortunately they can be worked around using PadWalker.

    Anyway, here's an example:

    use v5.14;
    
    BEGIN {
      package My::Print;
      use Exporter::Shiny qw( myprint );
      use Parse::Keyword { myprint => \&_parse_myprint };
      use PadWalker;
      
      # Here's the actual implementation of the myprint function.
      # When the caller includes a block, this will be the first
      # parameter. When they don't, we'll pass an explicit undef
      # in as the first parameter, to make sure it's nice and
      # unambiguous. This helps us distinguish between these two
      # cases:
      #
      #    myprint { BLOCK } @list_of_coderefs;
      #    myprint @list_of_coderefs;
      #
      sub myprint {
        my $block = shift;
        say for defined($block) ? map($block->($_), @_) : @_;
      }
      
      # This is a function to handle custom parsing for
      # myprint.
      #
      sub _parse_myprint {
    
        # There might be whitespace after the myprint
        # keyword, so read and discard that.
        #
        lex_read_space;
        
        # This variable will be undef if there is no
        # block, but we'll put a coderef in it if there
        # is a block.
        #
        my $block = undef;
        
        # If the next character is an opening brace...
        #
        if (lex_peek eq '{') {
          
          # ... then ask Parse::Keyword to parse a block.
          # (This includes parsing the opening and closing
          # braces.) parse_block will return a coderef,
          # which we will need to fix up (see later).
          #
          $block = _fixup(parse_block);
          
          # The closing brace may be followed by whitespace.
          #
          lex_read_space;
        }
        
        # After the optional block, there will be a list
        # of things. Parse that. parse_listexpr returns
        # a coderef, which when called will return the
        # actual list. Again, this needs a fix up.
        #
        my $listexpr = _fixup(parse_listexpr);
        
        # This is the stuff that we need to return for
        # Parse::Keyword.
        #
        return (
          
          # All of the above stuff happens at compile-time!
          # The following coderef gets called at run-time,
          # and gets called in list context. Whatever stuff
          # it returns will then get passed to the real
          # `myprint` function as @_.
          #
          sub { $block, $listexpr->() },
          
          # This false value is a signal to Parse::Keyword
          # to say that myprint is an expression, not a
          # full statement. If it was a full statement, then
          # it wouldn't need a semicolon at the end. (Just
          # like you don't need a semicolon after a `foreach`
          # block.)
          #
          !!0,
        );
      }
      
      # This is a workaround for a big bug in Parse::Keyword!
      # The coderefs it returns get bound to lexical
      # variables at compile-time. However, we need access
      # to the variables at run-time.
      #
      sub _fixup {
        
        # This is the coderef generated by Parse::Keyword.
        #
        my $coderef = shift;
        
        # Find out what variables it closed over. If it didn't
        # close over any variables, then it's fine as it is,
        # and we don't need to fix it.
        #
        my $closed_over = PadWalker::closed_over($coderef);
        return $coderef unless keys %$closed_over;
        
        # Otherwise we need to return a new coderef that
        # grabs its caller's lexical variables at run-time,
        # pumps them into the original coderef, and then
        # calls the original coderef.
        #
        return sub {
          my $caller_pad = PadWalker::peek_my(2);
          my %vars = map +($_ => $caller_pad->{$_}), keys %$closed_over;
          PadWalker::set_closed_over($coderef, \%vars);
          goto $coderef;
        };
      }
    };
    
    use My::Print qw( myprint );
    
    my $start = "[";
    my $end   = "]";
    
    myprint "a", "b", "c";
    
    myprint { $start . $_ . $end } "a", "b", "c";
    

    This generates the following output:

    a
    b
    c
    [a]
    [b]
    [c]