
Perl string concatenation and substitution in a single line?


I need to modify a Perl variable containing a file path; it needs to begin and end with a forward slash (/) and have every run of multiple forward slashes reduced to a single slash.

(This is because an existing process does not enforce a consistent configuration syntax, so there are hundreds of config files scattered everywhere that may or may not have slashes in the right places in file names and path names.)

Something like this:

foreach ( ($config->{'backup_path'},
           $config->{'work_path'},
           $config->{'output_path'}
         ) ) {
     $_ = "/" . $_ . "/";
     $_ =~ s/\/{2,}/\//g;
}

but this does not look optimal or particularly readable to me; I'd rather have a more elegant expression (if it ends up using an unusual regex I'll use a comment to make it clearer.)

Input & output examples

home/datamonster//c2counts becomes /home/datamonster/c2counts/

home/////teledyne/tmp/ becomes /home/teledyne/tmp/

and /var/backup/DOC/all_instruments/ will pass through unchanged


Solution

  • Well, just rewriting what you got:

    my @vars = qw ( backup_path work_path output_path );
    
    for ( @{$config}{@vars} ) {
       s,^/*,/,;  #prefix
       s,/*$,/,; #suffix
       s,/+,/,g; #double slashes anywhere else. 
    }
    

    I'd be cautious: optimising for a clever one-line regex is not always an advantage, because such patterns quickly become unreadable.

    The above uses a hash slice to select several values out of a hash (a hash reference in this case), and relies on the fact that s/// operates on $_ by default. Because the for loop aliases $_ to each element, the substitutions modify the original values in place.
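    As a minimal sketch of the aliasing described above, the script below runs the same three substitutions over a hash slice and prints the result. The sample values come from the question's examples; the log_level key is a hypothetical extra entry, added only to show that values outside the slice are left alone.

    ```perl
    #!/usr/bin/env perl
    use strict;
    use warnings;

    # Sample data: paths taken from the question's examples.
    # log_level is a hypothetical key that is deliberately not in the slice.
    my $config = {
        backup_path => 'home/datamonster//c2counts',
        work_path   => 'home/////teledyne/tmp/',
        output_path => '/var/backup/DOC/all_instruments/',
        log_level   => 'debug//verbose',
    };

    my @vars = qw( backup_path work_path output_path );

    # @{$config}{@vars} is a hash slice. The loop aliases $_ to each value,
    # so s/// (which works on $_ by default) edits the hash in place.
    for ( @{$config}{@vars} ) {
        s,^/*,/,;    # exactly one leading slash
        s,/*$,/,;    # exactly one trailing slash
        s,/+,/,g;    # collapse any remaining runs of slashes
    }

    print "$_ => $config->{$_}\n" for sort keys %$config;
    ```

    Keeping the three steps separate costs two extra lines but lets each comment say exactly what its substitution does.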

    It's also useful to know that when a pattern contains /, you can switch delimiters to avoid the "leaning toothpicks" effect.

    s/\/{2,}/\//g can be written as:

    s,/+,/,g
    

    or

     s|/{2,}|/|g
    

    if you want to keep the numeric quantifier. + means "one or more", which gives the same result here because a run of slashes is collapsed to a single slash either way; technically it also matches a lone / (and replaces it with /), which the original pattern does not. Don't use , as the delimiter if your pattern contains a comma (as {2,} does), for the same reason you avoid / when the pattern contains slashes.
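    A short sketch to check that the three spellings behave the same. The sample path is made up for illustration; each substitution works on its own copy so the results can be compared.

    ```perl
    use strict;
    use warnings;

    # Hypothetical sample path with two runs of repeated slashes.
    my $path = 'home/////teledyne//tmp/';

    # The same substitution written with three different delimiters.
    ( my $toothpicks = $path ) =~ s/\/{2,}/\//g;    # default delimiter, slashes escaped
    ( my $commas     = $path ) =~ s,/+,/,g;         # comma delimiter, + quantifier
    ( my $pipes      = $path ) =~ s|/{2,}|/|g;      # pipe delimiter, {2,} kept

    print "$toothpicks\n$commas\n$pipes\n";
    ```

    The pipe form is the one that keeps {2,}, because the comma inside the quantifier would otherwise be read as the end of the pattern.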

    However, I think this does the trick:

    s,(?:^/*|\b\/*$|/+),/,g for @{$config}{qw ( backup_path work_path output_path )};
    

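    This matches an alternation group, replacing any of the following with a single /:

      • zero or more slashes at the start of the string (^/*)
      • zero or more slashes at the end of the string, after a word boundary (\b/*$)
      • one or more slashes anywhere else (/+)

    It uses the hash slice mechanism as above, but without the intermediate @vars.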

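    (The second alternative needs the word boundary \b zero-width anchor because of how /g treats empty matches, not because of backtracking. Without \b, a path that already ends in / has its trailing slash matched and replaced, and then /*$ matches again as an empty string at the very end, adding a second slash. With \b, that empty match is only possible right after a word character, that is, when the trailing slash is actually missing. One consequence is that a path ending in a non-word character other than /, such as a dot, will not get a trailing slash.)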
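    To see what the \b changes, this sketch runs the combined substitution with and without it on a path that already has both slashes. The variable names are only for illustration.

    ```perl
    use strict;
    use warnings;

    my $with_b    = '/fish/';
    my $without_b = '/fish/';

    # With \b: the end-of-string alternative only matches after a word character.
    $with_b =~ s,(?:^/*|\b/*$|/+),/,g;

    # Without \b: after the trailing slash is replaced, /*$ can still match
    # an empty string at the very end, so a second slash is appended.
    $without_b =~ s,(?:^/*|/*$|/+),/,g;

    print "with \\b:    $with_b\n";
    print "without \\b: $without_b\n";
    ```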

    For bonus points - you could probably select @vars using grep if your source data structure is appropriate:

    my @vars = grep { /_path$/ } keys %$config; 
    #etc. Or inline with:
    s,(?:^/*|\b\/*$|/+),/,g for @{$config}{grep { /_path$/ } keys %$config };
    

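    Edit: Or, following Borodin's suggestion (written here with comma delimiters, because a | delimiter clashes with the | alternation inside the pattern, and with the /g flag and the \b used in the full script):

    s,(?:/|\A|\b\z)/*,/,g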
    

    Giving us:

    #!/usr/bin/env perl
    
    use strict;
    use warnings;
    use Data::Dumper;
    
    my $config = {
       backup_path => "/fish/",
       work_path   => "narf//zoit",
       output_path => "/wibble",
       test_path => 'home/datamonster//c2counts',
       another_path => "/home/teledyne/tmp/",
       again_path => 'home/////teledyne/tmp/',
       this_path => '/var/backup/DOC/all_instruments/',
    };
    
    s,(?:/|\A|\b\z)/*,/,g for @{$config}{grep { /_path$/ } keys %$config };
    
    print Dumper $config;
    

    Results:

    $VAR1 = {
              'output_path' => '/wibble/',
              'this_path' => '/var/backup/DOC/all_instruments/',
              'backup_path' => '/fish/',
              'work_path' => '/narf/zoit/',
              'test_path' => '/home/datamonster/c2counts/',
              'another_path' => '/home/teledyne/tmp/',
              'again_path' => '/home/teledyne/tmp/'
            };