perl arrayref

Can Perl automatically delete a scalar and replace it with an arrayref?


This is a self-contained example of a hard-to-find bug I tracked down. A module, call it 'A', has a function that flattens its data structure, which is mostly strings, so the data can be hidden in an HTML form. Along with the strings, it occasionally and unintentionally flattened an arrayref and hid that in the form too. The value made the round trip and came back as a string: 'array' => 'ARRAY(0x55abf15aa790)'. When the function gen_array that generates the arrayref runs again to recreate it, it cannot re-initialize the scalar as an arrayref with @{ $self->{arrayref} } = () if the scalar already holds a string. Why is that? From perldoc perldata:

Scalars aren’t necessarily one thing or another. There’s no place to declare a scalar variable to be of type "string", type "number", type "reference", or anything else. Because of the automatic conversion of scalars, operations that return scalars don’t need to care (and in fact, cannot care) whether their caller is looking for a string, a number, or a reference. Perl is a contextually polymorphic language whose scalars can be strings, numbers, or references (which includes objects).

So it would seem that @{ $self->{arrayref} } = () should delete whatever $self->{arrayref} was and initialize an empty array ref. Why doesn't it? Did this work in prior versions of Perl? I have two solutions:

1) Delete $self->{arrayref} before every reset with @{ $self->{arrayref} } = ();

2) Use Scalar::Util to skip anything in the hash that is not a plain string when flattening it.

I could easily do both just to be safe, but I am really wondering why this behavior exists, so that I understand Perl better.

    #!/usr/bin/env perl

use strict;
no strict 'refs';
use warnings;
use diagnostics;


BEGIN {

    package A;

    sub new {
        my ( $classname, @arguments ) = @_;
        my $self = {@arguments};
        bless $self, $classname;
        return $self;
    }

    sub gen_array {

        my ( $self, $max ) = @_;
        if ( !$max ) { $max = 10; }


        # Cannot re-initialize the scalar as an arrayref with
        # @{ $self->{arrayref} } = (). Why is that?
        #
        # From perldoc perldata:
        #   "Scalars aren’t necessarily one thing or another. There’s no place
        #   to declare a scalar variable to be of type "string", type "number",
        #   type "reference", or anything else. Because of the automatic
        #   conversion of scalars, operations that return scalars don’t need to
        #   care (and in fact, cannot care) whether their caller is looking for
        #   a string, a number, or a reference. Perl is a contextually
        #   polymorphic language whose scalars can be strings, numbers, or
        #   references (which includes objects)."
        #
        # So it would seem @{ $self->{arrayref} } = () should delete whatever
        # $self->{arrayref} was and initialize an empty array ref.
        # Why is this not working? Did this work in prior versions of Perl?
        # In that case, why can't Perl convert a string to an arrayref?

        ########################################
        # Solution 1
        #delete $self->{arrayref};
        ########################################
        @{ $self->{arrayref} } = ();

        foreach my $ref ( 1 .. $max ) {
            push( @{ $self->{arrayref} }, $ref );
        }
    }

    sub flatten {    # every value gets stringified, including the arrayref (oops)
        my $self = shift;
        my $rv;
        for ( sort keys %$self ) {
            $self->{$_} = qq|$self->{$_}|;
        }
    }
    ##################################################################
    sub flatten_better {    # possible solution #2: only flattens plain strings
        use Scalar::Util qw(reftype);
        my $self = shift;
        my $rv;
        for ( sort keys %$self ) {
            my $type = reftype( $self->{$_} );
            if ( defined $type ) {
                print qq|skipping  '$_' is a '$type' \n|;
                next;
            }
            else {
                print "reftype '$_' is string\n";
                $self->{$_} = qq|$self->{$_}|;
            }
        }
    }
    ######################################################################
    sub dump_self {
        use Data::Dumper;
        my $self = shift;
        my $reftype = ref $self->{arrayref};
        print "ref says object->{arrayref} is a '$reftype'\n";

        $reftype = reftype $self->{arrayref};
        print "reftype says object->{arrayref} is a '$reftype'\n";

        print Dumper $self;
    }

    1;
}

my $reftype;

my $object = A->new();
$object->{name} = "object";
$object->gen_array();



print "object\n";
foreach my $num ( @{ $object->{arrayref} } ) {
    print "num:'$num'\n";
}

$object->dump_self();
print "-" x 40;
print "\n";

#Here arrayref gets flattened:
$object->flatten(); #has bug
#this only flattens strings:
#$object->flatten_better();


print "test 1:\n";
$object->dump_self();
if ( @{ $object->{arrayref} } ) {
    foreach my $num ( @{ $object->{arrayref} } ) {
        print "num:'$num'\n";
    }
}
else { print "fail; expected\n" }
print "-" x 40;
print "\n";

$object->gen_array();    # why does this fail?

print "test 2:\n";
$object->dump_self();
if ( @{ $object->{arrayref} } ) {
    foreach my $num ( @{ $object->{arrayref} } ) {
        print "num:'$num'\n";
    }
}
else { print "fail; why does this fail?\n" }
print "-" x 40;
print "\n";

Solution

  • An array reference is a scalar, just like all references are scalars. A scalar variable holds a scalar and doesn't care what sort of scalar it is:

    use v5.10;
    use strict;
    use warnings;
    
    my $s = 123;
    say $s;
    
    $s = [qw(1 b c)];
    say "@$s";
    
    $s = sub { say "Hello" };
    $s->();
    

    All of these work without warnings because there's nothing to warn about. If you want to put something different in a variable, you can. This is unlike many other languages that want to know exactly what you are going to put there and will enforce that. This is a fundamental design decision for Perl.

    There's also the idea of "autovivification". If the value in the scalar variable is undef (which is really the absence of value) and you use the variable as if it is a reference, then Perl makes the value the reference type that you want:

    use v5.10;
    use strict;
    use warnings;
    
    my $s;  # undef
    $s->[0] = 'abc';
    
    say ref $s;   # ARRAY
    say $s->[0];
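
Autovivification is also why the "delete first" workaround from the question works: once the hash slot is gone, dereferencing it as an array creates a fresh array reference. A minimal sketch, assuming a plain hash `%obj` as a hypothetical stand-in for the object:

```perl
use v5.10;
use strict;
use warnings;

my %obj;    # hypothetical stand-in for the object hash

# The slot does not exist yet, so the dereference autovivifies an array
@{ $obj{list} } = ();
say ref $obj{list};    # ARRAY

# Simulate the flattened value, then delete it before resetting
$obj{list} = 'ARRAY(0x55abf15aa790)';
delete $obj{list};
@{ $obj{list} } = ();
push @{ $obj{list} }, 1 .. 3;
say "@{ $obj{list} }";    # 1 2 3
```

Without the `delete`, the slot would still hold a defined string, and nothing would be autovivified.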
    

    However, once the scalar variable has a defined value, you can't treat it as a different reference type:

    use v5.10;
    use strict;
    use warnings;
    
    my $s;  # undef
    $s->[0] = 'abc';  # autovivified array reference, then assignment
    
    say ref $s;   # ARRAY
    $s->{some_key} = '124'; # ERROR: Not a HASH reference at ...
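
The same rule covers a defined value that is a plain string, which is the situation in the question. Under `strict 'refs'`, dereferencing the string is a fatal error. With `no strict 'refs'`, as in the question's script, Perl treats the string as a symbolic reference, that is, as the name of a package array, and leaves the scalar untouched. A small sketch of both cases, where the string value is a made-up stand-in for the flattened arrayref:

```perl
use v5.10;
use strict;
use warnings;

# Hypothetical stand-in for an arrayref that was stringified on a round trip
my $s = 'ARRAY(0x55abf15aa790)';

# Under strict refs, dereferencing a plain string is a fatal runtime error
my $error = '';
eval { @$s = (); 1 } or $error = $@;
print "strict refs on:  $error" if $error;

my $count;
{
    no strict 'refs';
    # Here the string is used as the *name* of a package array instead
    @$s = ( 1, 2, 3 );
    $count = @$s;
}
say "strict refs off: $count elements stored in a package array";
say "the scalar itself is still a plain string" unless ref $s;
```

So the assignment in the question did not fail as such. It quietly filled a package variable with an odd name, while `$self->{arrayref}` kept the string that Data::Dumper then printed.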
    

    Now, a reference is just that: it connects to data that Perl is managing. It doesn't care about packages or scopes.

    use v5.10;
    use strict;
    use warnings;
    
    my $outer;
    
    {
    my $inner = [ qw(a b c) ];
    say "inner: ", $inner;
    $outer = $inner;
    }
    say "outer: ", $outer;
    
    $main::foo = $outer;
    
    say "package: ", $main::foo;
    

    These are all the same data, and any of these references can change the data:

    inner: ARRAY(0x15000aca8)
    outer: ARRAY(0x15000aca8)
    package: ARRAY(0x15000aca8)
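
To illustrate the point that any of the references can change the data, here is a short sketch with two hypothetical variables, `$first` and `$second`, that hold the same reference:

```perl
use v5.10;
use strict;
use warnings;

my $first  = [qw(a b c)];
my $second = $first;        # copies the reference, not the array

push @$second, 'd';         # modify through one reference
say "@$first";              # a b c d
say $first == $second ? "same array" : "different arrays";    # same array
```

Comparing two references with `==` compares their addresses, so it is a quick way to check whether they point at the same array.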
    

    Now, with all of that, assigning a new list through an array reference does not "delete" the reference. It replaces the contents of the array that the reference points to:

    use v5.10;
    use strict;
    use warnings;
    
    my $s = [qw(a b c)];
    say "$s -> @$s";
    
    @$s = ();
    say "$s -> @$s";
    
    @$s = qw(1 2 3);
    say "$s -> @$s";
    

    The reference is the same each time even though the values it points to change:

    ARRAY(0x12300aca8) -> a b c
    ARRAY(0x12300aca8) ->
    ARRAY(0x12300aca8) -> 1 2 3
    

    If you want a new reference disconnected from anything else, use a new reference:

    $s = [];  # anonymous array constructor
    $s = $some_other_thing;
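
Applied to the question, one option is to have the reset assign a fresh anonymous array instead of dereferencing whatever is currently in the slot. This is only a sketch: `reset_array` is a hypothetical helper, not the question's actual `gen_array`, and the string value stands in for the flattened arrayref:

```perl
use v5.10;
use strict;
use warnings;

# Hypothetical object whose arrayref slot was stringified on a round trip
my $self = { arrayref => 'ARRAY(0x55abf15aa790)' };

# Sketch of a reset that replaces the slot instead of dereferencing it
sub reset_array {
    my ( $obj, $max ) = @_;
    $max ||= 10;
    $obj->{arrayref} = [ 1 .. $max ];    # fresh anonymous array every time
    return $obj->{arrayref};
}

reset_array( $self, 3 );
say ref $self->{arrayref};       # ARRAY
say "@{ $self->{arrayref} }";    # 1 2 3
```

Plain assignment overwrites the scalar no matter what it held before, so it works even when the old value is a string. The trade-off is that anything else holding the old reference will no longer see the updates.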
    

    However, if that $some_other_thing is a reference, you still might be sharing deep references. Consider this:

    my %hash = ( abc => [qw(1 2 3)] );
    my $s = $hash{'abc'};
    my $t = $hash{'abc'}; # $s and $t share the reference
    

    To get around this, you can make a deep copy to get new references at all levels:

    use v5.10;
    use strict;
    use warnings;
    
    use Storable qw(dclone);
    
    my %hash = ( abc => [qw(1 2 3)] );
    my $s = dclone($hash{'abc'}); # not shared with %hash
    my $t = dclone($hash{'abc'}); # not shared with %hash or $s
    
    say "hash -> $hash{abc}";
    say "s -> $s";
    say "t -> $t";
    

    These are all disconnected references that will not affect each other:

    hash -> ARRAY(0x14400aca8)
    s -> ARRAY(0x144023248)
    t -> ARRAY(0x1440233c8)
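
The different addresses show that the copies are separate. A quick way to see the practical effect is to modify a shared reference and a clone side by side. A minimal sketch, with `$shared` and `$copy` as illustrative names:

```perl
use v5.10;
use strict;
use warnings;

use Storable qw(dclone);

my %hash   = ( abc => [qw(1 2 3)] );
my $shared = $hash{abc};              # same array as in %hash
my $copy   = dclone( $hash{abc} );    # independent deep copy

push @$shared, 4;     # visible through %hash
push @$copy,   99;    # not visible through %hash

say "hash: @{ $hash{abc} }";    # hash: 1 2 3 4
say "copy: @$copy";             # copy: 1 2 3 99
```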
    

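    Knowing all that, be mindful of what you are trying to do and choose the right tool for it: assign a new reference ($s = []) when you want a fresh array, and assign through the reference (@$s = ()) when you want to change the array that every holder of that reference shares.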