Why is Perl's File::Map so slow compared to File::Slurp?

I thought I'd try using mmap to search a multi-gigabyte file without running out of memory. I tested on a file that did fit in memory: the File::Slurp version took less than a minute, but the File::Map version was still running after many minutes, so I killed it.

I tested smaller files and found that the File::Map version got progressively slower as the file size increased (2x size => 4x time), while File::Slurp's time per byte stayed fairly constant (2x size => 2x time).

Am I not using the module correctly, or does File::Map always get slow on large files?


for n in 1 4 16 32 64 128 256 512 4096; do
    seq $n | xargs -I@ seq 100000  > data
    ls -l data
    time perl -MFile::Slurp -e '
          $s = read_file("data");
          $re = qr/^(99999|12345|4325|11111|50000)$/m;
          while ($s =~ m/$re/g){ ++$matches }
          print $matches;
    '
    time perl -MFile::Map=:all -e '
          map_file $s, "data";
          advise $s, "sequential";
          $re = qr/^(99999|12345|4325|11111|50000)$/m;
          while ($s =~ m/$re/g){ ++$matches }
          print $matches;
    '
done

n	size (bytes)	matches	usr(slurp) [s]	usr(slurp)/n	sys(slurp) [s]	sys(slurp)/n	usr(map) [s]	usr(map)/n	sys(map) [s]	sys(map)/n
1	588895	5	0.033	0.033	0.007	0.007	0.014	0.014	0.001	0.001
4	2355580	20	0.051	0.013	0.007	0.002	0.032	0.008	0.005	0.001
16	9422320	80	0.109	0.007	0.015	0.001	0.138	0.009	0.012	0.001
32	18844640	160	0.184	0.005	0.024	0.001	0.400	0.013	0.021	0.001
64	37689280	320	0.328	0.005	0.049	0.001	2.666	0.042	4.305	0.067
128	75378560	640	0.629	0.005	0.079	0.001	10.014	0.078	17.638	0.138
256	150757120	1280	1.220	0.005	0.162	0.001	40.237	0.157	73.829	0.288
512	301514240	2560	2.423	0.005	0.323	0.001	158.729	0.310	302.041	0.590
4096	2412113920	20480	19.468	0.005	2.424	0.001	?	?	?	?

Instead of manually calculating the table from ls and time output, following @TLP's suggestion, here's a Perl Benchmark version (warning output elided). It also shows that File::Slurp's time per byte stays constant as the file grows, while File::Map falls further and further behind:

#!/bin/bash

for n in 1 4 16 32 64 128 256 512; do
    seq $n | xargs -I@ seq 100000 > data$n
done

perl -MBenchmark=cmpthese -MFile::Slurp -MFile::Map=:all -e '
    @n = (1,4,16,32,64,128,256,512);

    sub test_slurp {
          my ($s,$re,$matches);
          $s = read_file($f);
          $re = qr/^(99999|12345|4325|11111|50000)$/m;
          while ($s =~ m/$re/g){ ++$matches }
    }
    sub test_map {
          my ($mm,$re,$matches);
          map_file $mm, $f;
          advise $mm, "sequential";
          $re = qr/^(99999|12345|4325|11111|50000)$/m;
          while ($mm =~ m/$re/g){ ++$matches }
    }

    for $n (@n) {
        $f = "data$n";
        cmpthese(-1, { "map($n)" => \&test_map, "slurp($n)" => \&test_slurp });
    }
'

          Rate   map(1) slurp(1)
map(1)   198/s       --      -1%
slurp(1) 200/s       1%       --
           Rate   map(4) slurp(4)
map(4)   38.3/s       --     -20%
slurp(4) 48.1/s      26%       --
            Rate   map(16) slurp(16)
map(16)   6.60/s        --      -48%
slurp(16) 12.6/s       91%        --
            Rate   map(32) slurp(32)
map(32)   1.98/s        --      -62%
slurp(32) 5.17/s      161%        --
          s/iter   map(64) slurp(64)
map(64)     7.93        --      -96%
slurp(64)  0.350     2166%        --
           s/iter   map(128) slurp(128)
map(128)     31.6         --       -98%
slurp(128)  0.730      4233%         --
           s/iter   map(256) slurp(256)
map(256)      129         --       -99%
slurp(256)   1.55      8244%         --
           s/iter   map(512) slurp(512)
map(512)      521         --       -99%
slurp(512)   2.82     18372%         --

Solution

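Perl's copy-on-write ("COW") mechanism can't be used for the string that wraps the memory-mapped file, because there's no unused space at the end of its string buffer for the COW data. So when the regex engine makes a copy of the scalar for $&, $1, etc., the whole string buffer (the actual file data) is copied. And this happens for every successful match (and maybe for unsuccessful matches too). Both the number of matches and the size of each copy grow linearly with the file size, so the total amount of copying grows quadratically, which is why doubling the file size roughly quadruples the time.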
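To put rough numbers on that, here's a back-of-the-envelope script. It assumes exactly one full-buffer copy per successful match and ignores everything else (page faults, allocation, the regex scan itself); the per-n file size and match count are taken from the table in the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rough estimate of the bytes copied by the File::Map version, ASSUMING one
# full-buffer copy per successful match. Figures come from the question's
# table: each "n" is one copy of the output of `seq 100000`.
my $size_per_n    = 588_895;  # bytes in one `seq 100000`
my $matches_per_n = 5;        # 99999, 12345, 4325, 11111, 50000

sub bytes_copied {
    my ($n) = @_;
    # (number of matches) * (size of each copy): both are linear in n
    return ($matches_per_n * $n) * ($size_per_n * $n);
}

for my $n (1, 4, 16, 32, 64, 128, 256, 512) {
    printf "n=%-4d size=%10d matches=%5d copied=%10.3f GB\n",
        $n, $size_per_n * $n, $matches_per_n * $n, bytes_copied($n) / 1e9;
}
```

For n = 512 this estimates about 772 GB of copying, compared with the roughly 0.3 GB file that File::Slurp reads once.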

Take this program:

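#!/usr/bin/perl
use strict;
use warnings;
use Devel::Peek qw( Dump );
use File::Map   qw( map_file advise );

# Expects a file named "a" containing "aaaaa\n".
# Usage: ./a.pl 0 (plain string), 1 (mapped file), 2 (copy of mapped file)
my $re = qr/a/;
my $s;
if ( $ARGV[0] == 0 ) {
    $s = "aaaaa";                # ordinary string (shares the constant's buffer)
}
elsif ( $ARGV[0] == 1 ) {
    map_file $s, "a";            # $s's buffer is the mapping itself
    advise $s, "sequential";
}
else {
    map_file my $t, "a";
    advise $t, "sequential";
    $s = $t;                     # copy the mapped data into an ordinary buffer
}
Dump( $s );                      # before matching
while ( $s =~ /$re/g ) {
    Dump( $s );                  # after the first successful match
    last;
}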

We first run without File::Map.

$ ./a.pl 0
SV = PV(0x58d5924eaee0) at 0x58d5925260e8
  REFCNT = 1
  FLAGS = (POK,IsCOW,pPOK)             <--- COW flag is on.
  PV = 0x58d592581d50 "aaaaa"\0
  CUR = 5
  LEN = 16
  COW_REFCNT = 1                       <--- We're sharing with the constant.
SV = PVMG(0x58d5925625c0) at 0x58d5925260e8
  REFCNT = 1
  FLAGS = (SMG,POK,IsCOW,pPOK)         <--- COW flag still on.
  IV = 0
  NV = 0
  PV = 0x58d592581d50 "aaaaa"\0
  CUR = 5
  LEN = 16
  COW_REFCNT = 2                       <--- A third scalar is now sharing too.
  MAGIC = 0x58d592581d70
    MG_VIRTUAL = &PL_vtbl_mglob
    MG_TYPE = PERL_MAGIC_regex_global(g)
    MG_FLAGS = 0x40
      BYTES
    MG_LEN = 1

Since 5.20, Perl uses a copy-on-write ("COW") mechanism for strings, allowing scalars to share a string buffer. The string buffer is only copied when an attempt to change it occurs.

In the output above, we see that the string buffer becomes shared on a successful match. This is because a copy of the scalar being matched against was made for $&, $1, etc. to access. (A single copy of the scalar is made; $&, $1, etc. are all magic variables that access this one copy.)

Again, because of the COW mechanism, the scalar is copied but not the string buffer. So this is very efficient.
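A small way to see why that copy is needed at all: capture variables must keep returning the matched text even if the original scalar is modified afterwards. The helper name below is purely illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The regex engine keeps its own copy of the subject string, so $1
# (like $&, $` and $') stays valid even after the matched scalar changes.
sub capture_after_overwrite {
    my $s = "hello world";
    $s =~ /(w\w+)/ or die "no match";
    $s = "changed";    # overwrite the string that was matched against
    return $1;         # still reads from the engine's saved copy
}

print capture_after_overwrite(), "\n";    # prints "world"
```

With COW that saved copy normally shares the original buffer, which is why it costs almost nothing for an ordinary string.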

Now, let's run using File::Map.

$ ./a.pl 1
SV = PVMG(0x6006cd285470) at 0x6006cd249178
  REFCNT = 1
  FLAGS = (SMG,RMG,POK,READONLY,pPOK)
  IV = 0
  NV = 0
  PV = 0x7eb64e26e000 "aaaaa\n"
  CUR = 6
  LEN = 0
  MAGIC = 0x6006cd2aa4b0
    MG_VIRTUAL = 0x7eb64e278da0
    MG_TYPE = PERL_MAGIC_ext(~)
    MG_FLAGS = 0x30
      DUP
      LOCAL
    MG_PTR = 0x6006cd2544f0 ""
SV = PVMG(0x6006cd285470) at 0x6006cd249178
  REFCNT = 1
  FLAGS = (SMG,RMG,POK,READONLY,pPOK)  <--- COW flag isn't set.
  IV = 0
  NV = 0
  PV = 0x7eb64e26e000 "aaaaa\n"
  CUR = 6
  LEN = 0
  MAGIC = 0x6006cd2473b0
    MG_VIRTUAL = &PL_vtbl_mglob
    MG_TYPE = PERL_MAGIC_regex_global(g)
    MG_FLAGS = 0x40
      BYTES
    MG_LEN = 1
  MAGIC = 0x6006cd2aa4b0
    MG_VIRTUAL = 0x7eb64e278da0
    MG_TYPE = PERL_MAGIC_ext(~)
    MG_FLAGS = 0x30
      DUP
      LOCAL
    MG_PTR = 0x6006cd2544f0 ""

The COW mechanism stores its bookkeeping data (a reference count) at the end of the string buffer, in allocated space that isn't currently used by the string. But a memory-mapped file has no such space (note LEN = 0 in the dump above), and this prevents the COW mechanism from being used.

The copy of the scalar that's made for $&, $1, etc. to access must therefore copy the string buffer as well.
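If you'd rather check for that spare space programmatically than read Dump output, here's a rough sketch using the CUR and LEN methods of the core B module. The threshold LEN > CUR + 1 is my reading of perl's SvCANCOW macro in sv.h and may differ between perl versions; real COW eligibility also depends on flags (read-only, for instance), so treat the result as a hint. has_cow_room is a hypothetical helper, not part of any module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use B ();

# Hint only: does this string's buffer have the spare trailing byte that
# COW uses for its reference count? Mirrors just the length test
# (LEN > CUR + 1), not the flag checks perl also performs.
sub has_cow_room {
    my $obj = B::svref_2object(\$_[0]);
    return 0 unless $obj->can('LEN');    # not a string-capable SV
    return $obj->LEN > $obj->CUR + 1 ? 1 : 0;
}

my $plain = "aaaaa";
my $obj   = B::svref_2object(\$plain);
printf "CUR=%d LEN=%d room=%d\n", $obj->CUR, $obj->LEN, has_cow_room($plain);
```

For a scalar mapped with File::Map, LEN is 0 (as in the ./a.pl 1 dump), so has_cow_room would be expected to return 0 for it.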

Finally, let's use File::Map, but pass a copy of the memory-mapped file to the regex engine.

$ ./a.pl 2
SV = PVMG(0x6475db959130) at 0x6475db8f0660
  REFCNT = 1
  FLAGS = (POK,pPOK)
  IV = 0
  NV = 0
  PV = 0x6475db8bd770 "aaaaa\n"\0
  CUR = 6
  LEN = 16
SV = PVMG(0x6475db959130) at 0x6475db8f0660
  REFCNT = 1
  FLAGS = (SMG,POK,IsCOW,pPOK)         <--- COW flag now on.
  IV = 0
  NV = 0
  PV = 0x6475db8bd770 "aaaaa\n"\0
  CUR = 6
  LEN = 16
  COW_REFCNT = 1                       <--- An additional scalar now shares the string.
  MAGIC = 0x6475db9757d0
    MG_VIRTUAL = &PL_vtbl_mglob
    MG_TYPE = PERL_MAGIC_regex_global(g)
    MG_FLAGS = 0x40
      BYTES
    MG_LEN = 1

Extra space was allocated when the copy was made (LEN = 16, while CUR = 6), allowing the COW mechanism to work as intended.

This suggests that making a pre-emptive copy of the memory-mapped file would address the problem. Let's adjust your benchmark code to do that.

#!/bin/bash

for n in 1 4 16 32 64 128 256 512; do
    seq $n | xargs -I@ seq 100000 > data$n
done

perl -MBenchmark=cmpthese -MFile::Slurp -MFile::Map=:all -e '
    @n = (1,4,16,32,64,128,256,512);

    sub test_slurp {
          my ($s,$re,$matches);
          $s = read_file($f);
          $re = qr/^(99999|12345|4325|11111|50000)$/m;
          while ($s =~ m/$re/g){ ++$matches }
    }

    sub test_map {
          my ($mm,$re,$matches);
          map_file $mm, $f;
          advise $mm, "sequential";
          $re = qr/^(99999|12345|4325|11111|50000)$/m;
          while ($mm =~ m/$re/g){ ++$matches }
    }

    sub test_modified_map {
          my ($mm,$re,$matches);
          map_file $mm, $f;
          advise $mm, "sequential";
          my $s = $mm;
          $re = qr/^(99999|12345|4325|11111|50000)$/m;
          while ($s =~ m/$re/g){ ++$matches }
    }

    for $n (@n) {
        $f = "data$n";
        cmpthese(-1, {
           "slurp($n)" => \&test_slurp,
           "map($n)"   => \&test_map,
           "map($n)*"  => \&test_modified_map,
        });
        print "\n";
    }
'

          Rate   map(1)  map(1)* slurp(1)
map(1)   146/s       --      -3%      -4%
map(1)*  150/s       3%       --      -1%
slurp(1) 153/s       4%       1%       --

           Rate   map(4)  map(4)* slurp(4)
map(4)   29.0/s       --     -10%     -33%
map(4)*  32.1/s      11%       --     -26%
slurp(4) 43.6/s      50%      36%       --

            Rate   map(16) slurp(16)  map(16)*
map(16)   8.57/s        --      -22%      -22%
slurp(16) 11.0/s       28%        --       -0%
map(16)*  11.0/s       28%        0%        --

            Rate   map(32)  map(32)* slurp(32)
map(32)   3.60/s        --      -35%      -36%
map(32)*  5.50/s       53%        --       -2%
slurp(32) 5.61/s       56%        2%        --

            (warning: too few iterations for a reliable count)
                Rate   map(64)  map(64)* slurp(64)
map(64)   6.71e-02/s        --      -98%      -98%
map(64)*      3.25/s     4746%        --       -1%
slurp(64)     3.28/s     4785%        1%        --

            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
                 Rate   map(128)  map(128)* slurp(128)
map(128)   1.56e-02/s         --       -99%       -99%
map(128)*      1.42/s      8984%         --        -9%
slurp(128)     1.56/s      9906%        10%         --
^C

Indeed, the problem is gone.
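Note that my $s = $mm; in test_modified_map copies the entire file into an ordinary heap-allocated scalar, so it needs as much memory as File::Slurp does, which defeats your original goal of searching a file bigger than RAM. If that matters, one option (a sketch, not something benchmarked above) is to copy and search the mapped string one newline-terminated chunk at a time, so each copy, including the one made for $& and friends, is at most one chunk plus a line long. count_matches_chunked is a hypothetical helper, and it assumes every match lies within a single line, which holds for the ^...$ pattern used here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count matches of a line-anchored /m regex in a (possibly memory-mapped)
# string without copying the whole string at once. Each chunk is copied
# out with substr and ends just after a newline, so a single-line match
# such as /^(...)$/m can never straddle two chunks. $chunk_size must be >= 1.
sub count_matches_chunked {
    my ($sref, $re, $chunk_size) = @_;
    my $len     = length $$sref;
    my $pos     = 0;
    my $matches = 0;
    while ($pos < $len) {
        my $end = $pos + $chunk_size;
        if ($end >= $len) {
            $end = $len;
        }
        else {
            # Move the boundary to just past the end of the current line.
            my $nl = index($$sref, "\n", $end - 1);
            $end = $nl < 0 ? $len : $nl + 1;
        }
        # Ordinary heap copy of this chunk only; copies made for $& etc.
        # are therefore bounded by the chunk size.
        my $chunk = substr($$sref, $pos, $end - $pos);
        ++$matches while $chunk =~ m/$re/g;
        $pos = $end;
    }
    return $matches;
}

my $data = join("\n", 1 .. 100_000) . "\n";    # same text as `seq 100000`
my $re   = qr/^(99999|12345|4325|11111|50000)$/m;
print count_matches_chunked(\$data, $re, 4096), "\n";    # prints 5
```

With File::Map you'd pass a reference to the mapped scalar, e.g. map_file my $mm, "data"; print count_matches_chunked(\$mm, $re, 64 * 1024 * 1024); the 64 MB chunk size is an arbitrary guess, not a tuned value.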