perlmmapslurp

why is Perl File::Map so slow compared to File::Slurp?


I thought I'd try using mmap to search a multi-gigabyte file without running out of memory. I tested on a file that did actually fit and the File::Slurp version took less than a minute but the File::Map version was still running after many minutes so I killed it.

I tested smaller files and found that the File::Map version got progressively slower as file size increased (2x size => 4x time) while the File::Slurp performance remained fairly constant (2x size => 2x time).

Am I not using the module correctly, or does File::Map always get slow on large files?


for n in 1 4 16 32 64 128 256 512 4096; do
    seq $n | xargs -I@ seq 100000  > data
    ls -l data
    time perl -MFile::Slurp -e '
          $s = read_file("data");
          $re = qr/^(99999|12345|4325|11111|50000)$/m;
          while ($s =~ m/$re/g){ ++$matches }
          print $matches;
    '
    time perl -MFile::Map=:all -e '
          map_file $s, "data";
          advise $s, "sequential";
          $re = qr/^(99999|12345|4325|11111|50000)$/m;
          while ($s =~ m/$re/g){ ++$matches }
          print $matches;
    '
done
n size matches usr(slurp) usr(slurp)/n sys(slurp) sys(slurp)/n usr(map) usr(map)/n sys(map) sys(map)/n
1 588895 5 0.033 0.033 0.007 0.007 0.014 0.014 0.001 0.001
4 2355580 20 0.051 0.013 0.007 0.002 0.032 0.008 0.005 0.001
16 9422320 80 0.109 0.007 0.015 0.001 0.138 0.009 0.012 0.001
32 18844640 160 0.184 0.005 0.024 0.001 0.400 0.013 0.021 0.001
64 37689280 320 0.328 0.005 0.049 0.001 2.666 0.042 4.305 0.067
128 75378560 640 0.629 0.005 0.079 0.001 10.014 0.078 17.638 0.138
256 150757120 1280 1.220 0.005 0.162 0.001 40.237 0.157 73.829 0.288
512 301514240 2560 2.423 0.005 0.323 0.001 158.729 0.310 302.041 0.590
4096 2412113920 20480 19.468 0.005 2.424 0.001 ? ? ? ?

Instead of manually calculating the table from ls and time output, following @TLP's suggestion, here's a Perl Benchmark version (warning output elided) that also indicates that File::Slurp's performance is independent of file size but File::Map gets slower:

#!/bin/bash

for n in 1 4 16 32 64 128 256 512; do
    seq $n | xargs -I@ seq 100000 > data$n
done

perl -MBenchmark=cmpthese -MFile::Slurp -MFile::Map=:all -e '
    @n = (1,4,16,32,64,128,256,512);

    sub test_slurp {
          my ($s,$re,$matches);
          $s = read_file($f);
          $re = qr/^(99999|12345|4325|11111|50000)$/m;
          while ($s =~ m/$re/g){ ++$matches }
    }
    sub test_map {
          my ($mm,$re,$matches);
          map_file $mm, $f;
          advise $mm, "sequential";
          $re = qr/^(99999|12345|4325|11111|50000)$/m;
          while ($mm =~ m/$re/g){ ++$matches }
    }

    for $n (@n) {
        $f = "data$n";
        cmpthese(-1, { "map($n)" => \&test_map, "slurp($n)" => \&test_slurp });
    }
'
          Rate   map(1) slurp(1)
map(1)   198/s       --      -1%
slurp(1) 200/s       1%       --
           Rate   map(4) slurp(4)
map(4)   38.3/s       --     -20%
slurp(4) 48.1/s      26%       --
            Rate   map(16) slurp(16)
map(16)   6.60/s        --      -48%
slurp(16) 12.6/s       91%        --
            Rate   map(32) slurp(32)
map(32)   1.98/s        --      -62%
slurp(32) 5.17/s      161%        --
          s/iter   map(64) slurp(64)
map(64)     7.93        --      -96%
slurp(64)  0.350     2166%        --
           s/iter   map(128) slurp(128)
map(128)     31.6         --       -98%
slurp(128)  0.730      4233%         --
           s/iter   map(256) slurp(256)
map(256)      129         --       -99%
slurp(256)   1.55      8244%         --
           s/iter   map(512) slurp(512)
map(512)      521         --       -99%
slurp(512)   2.82     18372%         --

Solution

  • Perl's copy-on-write ("COW") can't be used for the memory-mapped file string since there's no unused space at the end of the string buffer for the COW data. So when it comes time to make a copy of the scalar for $&, the whole string buffer (the actual data) is copied. And this happens for every successful match (and maybe for unsuccessful matches too).


    Take this program:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Devel::Peek qw( Dump );
    use File::Map   qw( map_file advise );
    my $re = qr/a/;
    my $s;
    if ( $ARGV[0] == 0 ) {
        $s = "aaaaa";
    }
    elsif ( $ARGV[0] == 1 ) {
        map_file $s, "a";
        advise $s, "sequential";
    }
    else {
        map_file my $t, "a";
        advise $t, "sequential";
        $s = $t;
    }
    Dump( $s );
    while ( $s =~ /$re/g ) {
       Dump( $s );
       last;
    }
    

    We first run without File::Map.

    $ ./a.pl 0
    SV = PV(0x58d5924eaee0) at 0x58d5925260e8
      REFCNT = 1
      FLAGS = (POK,IsCOW,pPOK)             <--- COW flag is on.
      PV = 0x58d592581d50 "aaaaa"\0
      CUR = 5
      LEN = 16
      COW_REFCNT = 1                       <--- We're sharing with the constant.
    SV = PVMG(0x58d5925625c0) at 0x58d5925260e8
      REFCNT = 1
      FLAGS = (SMG,POK,IsCOW,pPOK)         <--- COW flag still on.
      IV = 0
      NV = 0
      PV = 0x58d592581d50 "aaaaa"\0
      CUR = 5
      LEN = 16
      COW_REFCNT = 2                       <--- An third scalar now sharing too.
      MAGIC = 0x58d592581d70
        MG_VIRTUAL = &PL_vtbl_mglob
        MG_TYPE = PERL_MAGIC_regex_global(g)
        MG_FLAGS = 0x40
          BYTES
        MG_LEN = 1
    

    Since 5.20, Perl uses a copy-on-write ("COW") mechanism for strings, allowing scalars to share a string buffer. The string buffer is only copied when an attempt to change it occurs.

    In the above output, we see that the string buffer becomes shared on a successful match. This is because a copy of the scalar being matched against was made for $&, $1, etc. to access. (A single copy of the scalar is made. $&, $1, etc are all magic variables that access this one copy.)

    Again, because of the COW mechanism, the scalar is copied but not the string buffer. So this is very efficient.


    Now, let's run using File::Map.

    $ ./a.pl 1
    SV = PVMG(0x6006cd285470) at 0x6006cd249178
      REFCNT = 1
      FLAGS = (SMG,RMG,POK,READONLY,pPOK)
      IV = 0
      NV = 0
      PV = 0x7eb64e26e000 "aaaaa\n"
      CUR = 6
      LEN = 0
     MAGIC = 0x6006cd2aa4b0
        MG_VIRTUAL = 0x7eb64e278da0
        MG_TYPE = PERL_MAGIC_ext(~)
        MG_FLAGS = 0x30
          DUP
          LOCAL
        MG_PTR = 0x6006cd2544f0 ""
    SV = PVMG(0x6006cd285470) at 0x6006cd249178
      REFCNT = 1
      FLAGS = (SMG,RMG,POK,READONLY,pPOK)  <--- COW flag isn't set.
      IV = 0
      NV = 0
      PV = 0x7eb64e26e000 "aaaaa\n"
      CUR = 6
      LEN = 0
      MAGIC = 0x6006cd2473b0
        MG_VIRTUAL = &PL_vtbl_mglob
        MG_TYPE = PERL_MAGIC_regex_global(g)
        MG_FLAGS = 0x40
          BYTES
        MG_LEN = 1
      MAGIC = 0x6006cd2aa4b0
        MG_VIRTUAL = 0x7eb64e278da0
        MG_TYPE = PERL_MAGIC_ext(~)
        MG_FLAGS = 0x30
          DUP
          LOCAL
        MG_PTR = 0x6006cd2544f0 ""
    

    The COW mechanism stores information at the end of the string buffer, in the allocated space that's not currently being used by the string. But there's no such space in the memory-mapped file. This prevents the COW mechanism from being used.

    The copy of the scalar that's made for $&, $1, etc. to access must therefore copy the string buffer as well.


    Finally, let use File::Map, but pass a copy of the memory-mapped file to the regex engine.

    $ ./a.pl 2
    SV = PVMG(0x6475db959130) at 0x6475db8f0660
      REFCNT = 1
      FLAGS = (POK,pPOK)
      IV = 0
      NV = 0
      PV = 0x6475db8bd770 "aaaaa\n"\0
      CUR = 6
      LEN = 16
    SV = PVMG(0x6475db959130) at 0x6475db8f0660
      REFCNT = 1
      FLAGS = (SMG,POK,IsCOW,pPOK)         <--- COW flag now on.
      IV = 0
      NV = 0
      PV = 0x6475db8bd770 "aaaaa\n"\0
      CUR = 6
      LEN = 16
      COW_REFCNT = 1                       <--- An additional scalar now shares the string.
      MAGIC = 0x6475db9757d0
        MG_VIRTUAL = &PL_vtbl_mglob
        MG_TYPE = PERL_MAGIC_regex_global(g)
        MG_FLAGS = 0x40
          BYTES
        MG_LEN = 1
    

    Some extra space was allocated when the copy was made, allowing the COW mechanism to work as intended.


    This suggests that making a pre-emptive copy of the memory-mapped file would address the problem. Let's adjust your benchmark code to do that.

    #!/bin/bash
    
    for n in 1 4 16 32 64 128 256 512; do
        seq $n | xargs -I@ seq 100000 > data$n
    done
    
    perl -MBenchmark=cmpthese -MFile::Slurp -MFile::Map=:all -e '
        @n = (1,4,16,32,64,128,256,512);
    
        sub test_slurp {
              my ($s,$re,$matches);
              $s = read_file($f);
              $re = qr/^(99999|12345|4325|11111|50000)$/m;
              while ($s =~ m/$re/g){ ++$matches }
        }
    
        sub test_map {
              my ($mm,$re,$matches);
              map_file $mm, $f;
              advise $mm, "sequential";
              $re = qr/^(99999|12345|4325|11111|50000)$/m;
              while ($mm =~ m/$re/g){ ++$matches }
        }
    
        sub test_modified_map {
              my ($mm,$re,$matches);
              map_file $mm, $f;
              advise $mm, "sequential";
              my $s = $mm;
              $re = qr/^(99999|12345|4325|11111|50000)$/m;
              while ($s =~ m/$re/g){ ++$matches }
        }
    
        for $n (@n) {
            $f = "data$n";
            cmpthese(-1, {
               "slurp($n)" => \&test_slurp,
               "map($n)"   => \&test_map,
               "map($n)*"  => \&test_modified_map,
            });
            print "\n";
        }
    '
    
              Rate   map(1)  map(1)* slurp(1)
    map(1)   146/s       --      -3%      -4%
    map(1)*  150/s       3%       --      -1%
    slurp(1) 153/s       4%       1%       --
    
               Rate   map(4)  map(4)* slurp(4)
    map(4)   29.0/s       --     -10%     -33%
    map(4)*  32.1/s      11%       --     -26%
    slurp(4) 43.6/s      50%      36%       --
    
                Rate   map(16) slurp(16)  map(16)*
    map(16)   8.57/s        --      -22%      -22%
    slurp(16) 11.0/s       28%        --       -0%
    map(16)*  11.0/s       28%        0%        --
    
                Rate   map(32)  map(32)* slurp(32)
    map(32)   3.60/s        --      -35%      -36%
    map(32)*  5.50/s       53%        --       -2%
    slurp(32) 5.61/s       56%        2%        --
    
                (warning: too few iterations for a reliable count)
                    Rate   map(64)  map(64)* slurp(64)
    map(64)   6.71e-02/s        --      -98%      -98%
    map(64)*      3.25/s     4746%        --       -1%
    slurp(64)     3.28/s     4785%        1%        --
    
                (warning: too few iterations for a reliable count)
                (warning: too few iterations for a reliable count)
                (warning: too few iterations for a reliable count)
                     Rate   map(128)  map(128)* slurp(128)
    map(128)   1.56e-02/s         --       -99%       -99%
    map(128)*      1.42/s      8984%         --        -9%
    slurp(128)     1.56/s      9906%        10%         --
    ^C
    

    Indeed, the problem is gone.