I am trying to write a subroutine in Perl 5.20 that builds a large directory listing stored in an array. The subroutine returns the result as an arrayref. For convenience, I want to have the option to sort the result.
#!/usr/bin/env perl
use v5.20;
use warnings;
use strict;
use File::Slurp qw(read_dir);
use Time::HiRes;
use feature qw(signatures);
no warnings 'once';
no warnings 'experimental';
no warnings 'experimental::signatures';
my $PATH='/net/dbfs/GRM-RS/Flight-Campaigns/2021-08-23.Ram-Head-i-22.SE-01/cam/MM010259/iiq/';
sub fsReadDir($base, $sort, $mode = 1) {
    $base //= '.'; # Base path default is the current path
    $sort //= 0; # Flag for array sorting of the result
    my @res=read_dir($base);
    if ($sort) {
        return [sort(@res)] if $mode == 1;
        if ($mode == 2) {
            @res = sort(@res);
            return \@res;
        }
    } else {
        return \@res;
    }
}
sub testSorting($sort, $mode, $max = 1000) {
    my $start = [Time::HiRes::gettimeofday()];
    my $count = 0;
    for my $ix (0..$max) {
        my $array = fsReadDir($PATH, $sort, $mode );
        $count = @$array;
    }
    my $dif = Time::HiRes::tv_interval($start);
    print "SORT: $sort MODE: $mode COUNT: $count TIME: $dif s\n";
}
testSorting(0, 1);
testSorting(1, 1);
testSorting(1, 2);
/usr/bin/env perl "test-array.pl"
SORT: 0 MODE: 1 COUNT: 14861 TIME: 6.882694 s
SORT: 1 MODE: 1 COUNT: 14861 TIME: 9.131504 s
SORT: 1 MODE: 2 COUNT: 14861 TIME: 8.622628 s
What is an efficient way to sort the array directly in the return statement?
If you insist on handling the sorting in the return statement itself, you can use the ternary operator:
return $sort ? [ sort @res ] : \@res;
This may be all well and clear enough in simple cases.
However, I find it clearer to first deal with the cases and options and then return the result:
@res = sort @res if $sort;
if ($mode == 1) { ... } # modes given in the question do nearly the same,
elsif ($mode == 2) { ... } # but imagine different processing based on value
...
return \@res;
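To make the outline above concrete, here is a minimal, self-contained sketch. It is only an illustration: the name list_dir is hypothetical, and it uses core opendir/readdir instead of File::Slurp's read_dir so that it runs without extra modules. The mode handling from the question is left out, since both modes there return the same data.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the question): list a directory with
# core opendir/readdir, skip the '.' and '..' entries, optionally sort.
sub list_dir {
    my ($base, $sort) = @_;
    $base //= '.';    # default to the current directory

    opendir(my $dh, $base) or die "Cannot open directory '$base': $!";
    my @res = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;

    @res = sort @res if $sort;    # deal with the option first ...
    return \@res;                 # ... then a single return at the end
}

my $names = list_dir('.', 1);
print scalar(@$names), " entries\n";
```

With a single return at the end, adding further options later only means adding statements before it.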
Also, sorting in place (@res = sort @res) should be a little more efficient, since Perl can then sort the array without building a temporary copy.
If this were about efficiency then you'd want to benchmark the different approaches, and under realistic circumstances. For one, the cost of reading a large directory may dwarf everything else, so that you cannot tell any performance difference in how exactly the return is constructed.
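As a rough sketch of such a benchmark, the core Benchmark module can compare the two ways of building the return value on an in-memory list, which keeps the directory read out of the measurement. The file names, the list size, and the sub names below are made up for illustration, and the numbers will differ between machines and Perl versions.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use List::Util qw(shuffle);

# Hypothetical stand-in for a directory listing: 15_000 shuffled names
my @names = shuffle map { sprintf "file-%06d.iiq", $_ } 1 .. 15_000;

# Return a reference to a new, sorted anonymous array
sub sorted_copy {
    my @res = @names;
    return [ sort @res ];
}

# Sort the array in place, then return a reference to it
sub sorted_in_place {
    my @res = @names;
    @res = sort @res;
    return \@res;
}

# Run each variant 100 times and print a comparison table
cmpthese(100, {
    copy     => \&sorted_copy,
    in_place => \&sorted_in_place,
});
```

If the two variants come out close to each other here, the difference will be even harder to see once the directory read over the network is added back in.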
So I'd go for clarity, until it is clearly seen that the choice does affect performance.