perlsortingcpan

Sort strings with a specific letter order with Perl


I'm trying to sort lists of names with Perl with a specific letter order to perform some special features.
The sorting would be working the same way as sort { $a cmp $b } but with a different succession of letters.
For example, ordering with the arbitrary character order "abdrtwsuiopqe987654" ...

I tried to deal with sort { $a myFunction $b } but I'm newbie with Perl and I don't see how to organize correctly myFunction to get what I want.


Solution

  • The following is probably the fastest[1]:

    sub my_compare($$) {
        $_[0] =~ tr{abdrtwsuiopqe987654}{abcdefghijklmnopqrs}r
           cmp
        $_[1] =~ tr{abdrtwsuiopqe987654}{abcdefghijklmnopqrs}r
    }
    
    my @sorted = sort my_compare @unsorted;
    

    Or if you want something more dynamic, the following might be the fastest[2]:

    my @syms = split //, 'abdrtwsuiopqe987654';
    my @map; $map[ord($syms[$_])] = $_ for 0..$#syms;
    
    sub my_compare($$) {
        (pack 'C*', map $map[ord($_)], unpack 'C*', $_[0])
           cmp
        (pack 'C*', map $map[ord($_)], unpack 'C*', $_[1])
    }
    
    my @sorted = sort my_compare @unsorted;
    

    We could compare character by character, but that will be far slower.

    use List::Util qw( min );
    
    my @syms = split //, 'abdrtwsuiopqe987654';
    my @map; $map[ord($syms[$_])] = $_ for 0..$#syms;
    
    sub my_compare($$) {
        my $l0 = length($_[0]);
        my $l1 = length($_[1]);
        for (0..min($l0, $l1)) {
           my $ch0 = $map[ord(substr($_[0], $_, 1))];
           my $ch1 = $map[ord(substr($_[1], $_, 1))];
           return -1 if $ch0 < $ch1;
           return +1 if $ch0 > $ch1;
        }
    
        return -1 if $l0 < $l1;
        return +1 if $l0 > $l1;
        return 0;
    }
    
    my @sorted = sort my_compare @unsorted;
    

    1. Technically, it can be made faster using GRT.

       my @sorted =
          map /\0(.*)/s,
          sort
          map { tr{abdrtwsuiopqe987654}{abcdefghijklmnopqrs}r . "\0" . $_ }
          @unsorted;
      
    2. Technically, it can be made faster using GRT.

       my @sorted =
          map /\0(.*)/s,
          sort
          map { ( pack 'C*', map $map[ord($_)], unpack 'C*', $_ ) . "\0" . $_ }
          @unsorted;
      

    cmp is implemented by the scmp operator.

    $ perl -MO=Concise,-exec -e'$x cmp $y'
    1  <0> enter
    2  <;> nextstate(main 1 -e:1) v:{
    3  <#> gvsv[*x] s
    4  <#> gvsv[*y] s
    5  <2> scmp[t3] vK/2
    6  <@> leave[1 ref] vKP/REFC
    

    The scmp operator is implemented by the pp_scmp function in pp.c, which is really just a wrapper for sv_cmp_flags in sv.c when use locale; isn't in effect. sv_cmp_flags either uses C library function memcmp or a UTF-8 aware version (depending on the type of scalar).