perl

Could someone please explain this Perl method?


Unfortunately, I have inherited a very large Perl codebase and I need to do some modifications to one of its features.

The program is reading a CSV file and does some cleanup on it to create a protobuf message. There is a key value pair foo:bar in a temp_params{} part of the input string.

I managed to trace where I think the temp_params are getting parsed. For now, please ignore dp, let's assume that dp will always be empty. The Key-value I'm expecting is part of tp, not dp.

sub _add_temp_and_dynamic_params {
    my ($self, $row) = @_;

    state $raw_to_full = {
        tp => 'temp_params',
        dp => 'dynamic_params',
    };

    while ( my ( $raw, $full ) = each %$raw_to_full ) {
        my @params =
            map { { key => $_->[0], value => $_->[1] } }
            grep { @$_ == 2 && $_->[0] ne '' }
            map { [ split ":" ] }
            split /,/, ( delete $row->{ $raw } or next );
        $row->{ $full }->@* = @params if @params;
    }
}

After this, I want to call _my_new_subroutine with the output of the method above. The goal of my new subroutine is to take that temp_params{foo:bar} so that foo:bar ends up as its own independent key-value pair.

The problem is: I really have no idea what is going on in this method or what the output looks like. This looks extremely cryptic to me. I don't know how raw_to_full could contain any meaningful info, since it seems that it's just creating a hash where the key is tp and the value is the string 'temp_params'. I'm not even sure if my method will need to work with a hash or with a string. What does ( delete $row->{ $raw } or next ) even mean? Or $row->{ $full }->@* = @params if @params;? I heard many times that Perl had the ill-fame of being a "write-only language", but I had no idea it would be this bad.

So far, what I understand is:

The method assigns the $self variable because Perl methods always receive their own caller as an argument. Then $row is assigned. The $ indicates a scalar, and this is operating on CSV lines, so I assume that the input is a string.

Then, a local variable $raw_to_full is being declared, this is a hash where tp and dp are keys, and strings 'temp_params' and 'dynamic_params' are their respective values. (This also leads to another question: if this is a hash, why is this being indicated with a $ instead of a %?)

Then, 2 new local variables are created: raw and full, which are the result of "splitting" the key and the value of each entry in the raw_to_full hash. For every iteration, a string is created where the word 'key' is actually declared as the key itself, and the value of key is the 'raw' value whereas 'full' is the value (not so sure about this).

Then a filter is done to only include rows with key-value, where the value is not empty.

Then, a split is created at where ':' is indicated. Lastly, another split is done to separate I assume the different kv pairs inside of each "params" string. After that, I'm completely lost as to what could be happening here.


Solution

  • Use a dumping tool to look at your data structure. You'll find that it takes

    $row = {
       tp => "foo:bar,abc:def",
    };
    

    and changes it into

    $row = {
       temp_params => [
          { key => "foo", value => "bar" },
          { key => "abc", value => "def" },
       ],
    };
    

    This format makes it hard to access params by key, but it allows multiple params with the same key.
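
    As one way to try this yourself, here is a minimal sketch using the core Data::Dumper module. The name add_temp_params is hypothetical: it is a standalone reduction of the method from the question (tp only, no $self), written under the assumption that Perl 5.24 or later is available.

    ```perl
    use v5.24;      # enables strict, `state` and postfix dereference (->@*)
    use warnings;
    use Data::Dumper;

    # Hypothetical standalone reduction of the method in the question:
    # only the tp => temp_params case, and a plain function instead of a method.
    sub add_temp_params {
        my ($row) = @_;

        # Remove the raw string from the row; do nothing if it is missing or false.
        my $raw = delete $row->{tp} or return;

        my @params =
            map  { +{ key => $_->[0], value => $_->[1] } }   # leading + forces an anonymous hash
            grep { @$_ == 2 && $_->[0] ne '' }               # keep only well-formed pairs
            map  { [ split /:/ ] }                           # "foo:bar" becomes [ "foo", "bar" ]
            split /,/, $raw;                                 # "a:b,c:d" becomes ( "a:b", "c:d" )

        $row->{temp_params} = \@params if @params;
        return;
    }

    my $row = { tp => "foo:bar,abc:def" };
    add_temp_params($row);

    local $Data::Dumper::Sortkeys = 1;   # stable key order in the dump
    print Dumper($row);
    ```

    Dumping $row before and after the call shows that the argument is a hash reference rather than a string, and that the sub modifies it in place.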

    To find the values for a key, you will need to iterate over the array.

    my @values_for_foo =
       map { $_->{ value } }
          grep { $_->{ key } eq "foo" }
             $row->{ temp_params }->@*;
    

    That said, you only keep the last value for any given key. (So why use this structure?!?) To get the only value for a key (or undef if not found),

    my ( $value_for_foo ) =
       map { $_->{ value } }
          grep { $_->{ key } eq "foo" }
             $row->{ temp_params }->@*;
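
    If each key really is expected only once, a plain hash may be easier to work with than repeated searches. The following is only a sketch: %value_for is a hypothetical name, and the sample row is the one shown earlier.

    ```perl
    use v5.24;
    use warnings;

    my $row = {
        temp_params => [
            { key => "foo", value => "bar" },
            { key => "abc", value => "def" },
        ],
    };

    # Build a key => value lookup from the array of pairs. If a key occurs
    # more than once, the later entry overwrites the earlier one.
    my %value_for =
        map { ( $_->{key} => $_->{value} ) }
        $row->{temp_params}->@*;

    say $value_for{foo};   # prints "bar"
    ```

    The trade-off is the one already mentioned: a hash gives direct access by key but cannot hold several params with the same key.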
    

    Specific questions asked:

    Please limit your Question to one problem in the future.


    I heard many times that Perl had the ill-fame of being a "write-only language", but I had no idea it would be this bad.

    Except for the "hidden" or next and the improvement I suggested above, this is actually extremely readable. Your problem is that you don't even know the basics. Not knowing the language doesn't mean not readable.
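
    To make the "hidden" or next visible, the loop from the question can be spelled out step by step. This is a sketch rather than a drop-in replacement: $string is a hypothetical intermediate variable, a plain hash replaces the state variable, and Perl 5.24 or later is assumed.

    ```perl
    use v5.24;
    use warnings;

    my $row = { tp => "foo:bar,abc:def" };
    my %raw_to_full = ( tp => 'temp_params', dp => 'dynamic_params' );

    for my $raw ( sort keys %raw_to_full ) {
        my $full = $raw_to_full{$raw};

        # `delete` removes the key from the hash and returns its old value.
        my $string = delete $row->{$raw};

        # This is the `or next`: skip this key when the value is missing or false.
        next if !$string;

        my @params =
            map  { +{ key => $_->[0], value => $_->[1] } }
            grep { @$_ == 2 && $_->[0] ne '' }
            map  { [ split /:/ ] }
            split /,/, $string;

        # Same result as `$row->{ $full }->@* = @params if @params;`
        # when $row->{ $full } does not exist yet.
        $row->{$full} = [@params] if @params;
    }

    say join ",", sort keys %$row;   # prints "temp_params"
    ```

    In the original, `$row->{ $full }->@* = @params` assigns to the array that $row->{ $full } refers to, creating that array first if it does not exist.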

    Since you're comparing to Python,

    Perl:

    my @params =
       map { { key => $_->[0], value => $_->[1] } }
          grep { @$_ == 2 && $_->[0] ne '' }
             map { [ split /:/ ] }
                split /,/,
                   $row->{ $raw };
    

    Python:

    tmp = row[ raw ].split( "," )
    tmp = [ x.split( ":" ) for x in tmp ]
    tmp = [ x for x in tmp if len( x ) == 2 and x[ 0 ] != "" ]
    params = [ { "key": x[ 0 ], "value": x[ 1 ] } for x in tmp ]
    

    yikes! So, even though every language strives to add functional programming features,[1] we're forced to abandon the functional programming approach and use loops.

    params = []
    for x in row[ raw ].split( "," ):
       tmp = x.split( ":" )
       if len( tmp ) == 2 and tmp[ 0 ] != "":
          params.append( { "key": tmp[ 0 ], "value": tmp[ 1 ] } )
    

    ok, despite the inflexibility of Python, we managed to write something readable. It just took a lot more work. Just to get back what looks almost identical to the Perl program (in the opposite order).

    Note that you have the option of using this style in Perl too.

    my @params;
    for my $x ( split( /,/, $row->{ $raw } ) ) {
       my @tmp = split( /:/, $x );
       if ( @tmp == 2 && $tmp[ 0 ] ne "" ) {
          push @params, { key => $tmp[ 0 ], value => $tmp[ 1 ] };
       }
    }
    

    1. To name some, Perl, C#, C++, JS, Lua, Raku and especially Python (function pointers, list comprehensions). This is basically every language I have some knowledge of except C.

      For example, the following is how I'd write it in C#, which is virtually identical to the Perl version:

      row[ raw ]
         .Split( ',' )
            .Select( _ => _.Split( ':' ) )
               .Where( _ => _.Length == 2 && _[ 0 ] != "" )
                  .Select( _ =>
                     new Dictionary<string, string>()
                        { { "key", _[ 0 ] }, { "value", _[ 1 ] } }
                  );