perltrepan

Exact Perl location such as with B::Deparse


A long-standing problem in Perl is how to identify a location with finer granularity than a line number. (Follow the link for more information.) This question is about how to get that.

The most promising way to do this is to use the Perl opcode address that is under consideration and deparse the statements around that. And at the level of the a subroutine, B::Deparse will recreate Perl given a code reference. So ideal would be to modify B::Deparse to allow you to give a supplied op to start deparsing. Failing that, it could instead deparse the enclosing subroutine, displaying op-code addresses for each statement encountered. See the code below for an example of this.

B::Concise can show a op-code disassembly for a subroutine. In its disassembly output, it gives addresses, and those addresses it gives match those returned, say, by Devel::Callsite.

The problem is that after instrumenting B::Deparse as done below, the OP addresses it gives does not match those given by B::Concise or Devel::Callsite. Output given below shows this.

I can normalize addresses so that they refer to relative offsets rather than absolute addresses. However this is a lot of work, is gross, and I'm not even totally sure this will work, since Deparse may change code by "pessimizing" or, I guess, undoing optimization.

For concreteness, below is some code that shows the mismatch. Note that none of the addresses given by deparse is shown in the disassembly.

use B::Deparse;
use B::Concise qw(set_style);
sub foo() {
    my $x=1; $x+=1;
}

my $deparse = B::Deparse->new("-p", "-l", "-sC");

$body = $deparse->coderef2text(\&foo);
print($body, "\n");
my $walker = B::Concise::compile('-basic', 'foo', \&foo);
B::Concise::set_style_standard('debug');
B::Concise::walk_output(\my $buf);
$walker->();            # walks and renders into $buf;
print($buf);

package B::Deparse;

# Modified to show OP addresses
sub lineseq {
    my($self, $root, $cx, @ops) = @_;
    my($expr, @exprs);

    my $out_cop = $self->{'curcop'};
    my $out_seq = defined($out_cop) ? $out_cop->cop_seq : undef;
    my $limit_seq;
    if (defined $root) {
    $limit_seq = $out_seq;
    my $nseq;
    $nseq = $self->find_scope_st($root->sibling) if ${$root->sibling};
    $limit_seq = $nseq if !defined($limit_seq)
               or defined($nseq) && $nseq < $limit_seq;
    }
    $limit_seq = $self->{'limit_seq'}
    if defined($self->{'limit_seq'})
    && (!defined($limit_seq) || $self->{'limit_seq'} < $limit_seq);
    local $self->{'limit_seq'} = $limit_seq;

    my $fn = sub {
        my ($text, $i) = @_;
        my $op = $ops[$i];
        push @exprs, sprintf("# op: 0x%x\n%s ", $op, $text);
    };
    $self->walk_lineseq($root, \@ops, $fn);
    # $self->walk_lineseq($root, \@ops,
    #              sub { push @exprs, $_[0]} );

    my $sep = $cx ? '; ' : ";\n";
    my $body = join($sep, grep {length} @exprs);
    my $subs = "";
    if (defined $root && defined $limit_seq && !$self->{'in_format'}) {
    $subs = join "\n", $self->seq_subs($limit_seq);
    }
    return join($sep, grep {length} $body, $subs);
}

The output I get from running this is:

() {
    # op: 0x14a4b30
#line 4 "deparse-so.pl"
    (my $x = 1) ;
    # op: 0x14a4aa0
#line 4 "deparse-so.pl"
    ($x += 1) ;
}
main::foo:
UNOP (0xeb9978)
    op_next     0
    op_sibling  0
    op_ppaddr   PL_ppaddr[OP_LEAVESUB]
    op_type     175
    op_flags    4
    op_private  65  
    op_first    0xeab7a0
LISTOP (0xeab7a0)
    op_next     0xeb9978
    op_sibling  0
    op_ppaddr   PL_ppaddr[OP_LINESEQ]
    op_type     181
    op_flags    12
    op_private  0   
    op_first    0xeab7e8
    op_last     0xeb9a20
COP (0xeab7e8)
    op_next     0xeab890
    op_sibling  0xeab848
    op_ppaddr   PL_ppaddr[OP_NEXTSTATE]
    op_type     182
    op_flags    1
    op_private  0   0
BINOP (0xeab848)
    op_next     0xeb99c0
    op_sibling  0xeb99c0
    op_ppaddr   PL_ppaddr[OP_SASSIGN]
    op_type     37
    op_flags    69
    op_private  2   
    op_first    0xeab890
    op_last     0xeab8d0
SVOP (0xeab890)
    op_next     0xeab8d0
    op_sibling  0xeab8d0
    op_ppaddr   PL_ppaddr[OP_CONST]
    op_type     5
    op_flags    2
    op_private  0   
    op_sv       0xea3c40
OP (0xeab8d0)
    op_next     0xeab848
    op_sibling  0
    op_ppaddr   PL_ppaddr[OP_PADSV]
    op_type     9
    op_flags    178
    op_private  128 
COP (0xeb99c0)
    op_next     0xeab768
    op_sibling  0xeb9a20
    op_ppaddr   PL_ppaddr[OP_NEXTSTATE]
    op_type     182
    op_flags    1
    op_private  0   0
BINOP (0xeb9a20)
    op_next     0xeb9978
    op_sibling  0
    op_ppaddr   PL_ppaddr[OP_ADD]
    op_type     63
    op_flags    70
    op_private  2   
    op_first    0xeab768
    op_last     0xeb9a68
OP (0xeab768)
    op_next     0xeb9a68
    op_sibling  0xeb9a68
    op_ppaddr   PL_ppaddr[OP_PADSV]
    op_type     9
    op_flags    50
    op_private  0   
SVOP (0xeb9a68)
    op_next     0xeb9a20
    op_sibling  0
    op_ppaddr   PL_ppaddr[OP_CONST]
    op_type     5
    op_flags    2
    op_private  0   
    op_sv       0xea3c10
B::Concise::compile(CODE(0xea3c70))
UNOP (0xeb9978)
    op_next     0
    op_sibling  0
    op_ppaddr   PL_ppaddr[OP_LEAVESUB]
    op_type     175
    op_flags    4
    op_private  65  
    op_first    0xeab7a0
LISTOP (0xeab7a0)
    op_next     0xeb9978
    op_sibling  0
    op_ppaddr   PL_ppaddr[OP_LINESEQ]
    op_type     181
    op_flags    12
    op_private  0   
    op_first    0xeab7e8
    op_last     0xeb9a20
COP (0xeab7e8)
    op_next     0xeab890
    op_sibling  0xeab848
    op_ppaddr   PL_ppaddr[OP_NEXTSTATE]
    op_type     182
    op_flags    1
    op_private  0   0
BINOP (0xeab848)
    op_next     0xeb99c0
    op_sibling  0xeb99c0
    op_ppaddr   PL_ppaddr[OP_SASSIGN]
    op_type     37
    op_flags    69
    op_private  2   
    op_first    0xeab890
    op_last     0xeab8d0
SVOP (0xeab890)
    op_next     0xeab8d0
    op_sibling  0xeab8d0
    op_ppaddr   PL_ppaddr[OP_CONST]
    op_type     5
    op_flags    2
    op_private  0   
    op_sv       0xea3c40
OP (0xeab8d0)
    op_next     0xeab848
    op_sibling  0
    op_ppaddr   PL_ppaddr[OP_PADSV]
    op_type     9
    op_flags    178
    op_private  128 
COP (0xeb99c0)
    op_next     0xeab768
    op_sibling  0xeb9a20
    op_ppaddr   PL_ppaddr[OP_NEXTSTATE]
    op_type     182
    op_flags    1
    op_private  0   0
BINOP (0xeb9a20)
    op_next     0xeb9978
    op_sibling  0
    op_ppaddr   PL_ppaddr[OP_ADD]
    op_type     63
    op_flags    70
    op_private  2   
    op_first    0xeab768
    op_last     0xeb9a68
OP (0xeab768)
    op_next     0xeb9a68
    op_sibling  0xeb9a68
    op_ppaddr   PL_ppaddr[OP_PADSV]
    op_type     9
    op_flags    50
    op_private  0   
SVOP (0xeb9a68)
    op_next     0xeb9a20
    op_sibling  0
    op_ppaddr   PL_ppaddr[OP_CONST]
    op_type     5
    op_flags    2
    op_private  0   
    op_sv       0xea3c10

Finally, as way of encouraging people to help here, if this is solved, the solution will probably appear in by Perl debugger Devel::Trepan and allow you to reliably know exactly where you are when stopped inside the debugger.

Note: edited to make the question clearer.


Solution

  • ikegami's answer suggestion buried in the comments lead me to find the conceptual flaw I made in my first-proposed solution: inside B::Deparse a lexical array variable stores OPs and those are implicit pointers to the actual code OP structures. Using the undocumented $$ to get the underlying address that the scalar implicitly points to gives the correct address. So in my monkey-patched code of B::Deparse::lineseq, changing:

    push @exprs, sprintf("# op: 0x%x\n%s ", $op, $text);
    

    to:

    push @exprs, sprintf("# op: 0x%x\n%s ", $$op, $text);
                                            ^^
    

    gives me an address that I can use to match up results.

    Still, there's a bit of work still to get this usable, so if there are any other ways or suggestions, I'd love to hear them.

    Devel::Trepan release 0.70 now makes use in its deparse command of the above code suitably modified to be able to show which of multiple statements is about to be run.