To try MappedByteBuffer (memory mapped file in Java), I wrote a simple wc -l (text file line count) demo:
int wordCount(String fileName) throws IOException {
    FileChannel fc = new RandomAccessFile(new File(fileName), "r").getChannel();
    MappedByteBuffer mem = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
    int nlines = 0;
    byte newline = '\n';
    for (long i = 0; i < fc.size(); i++) {
        if (mem.get() == newline)
            nlines += 1;
    }
    return nlines;
}
I tried this on a file of about 15 MB (15008641 bytes) with 100k lines. On my laptop, it takes about 13.8 sec. Why is it so slow?
Complete class code is here: http://pastebin.com/t8PLRGMa
For reference, I wrote the same idea in C: http://pastebin.com/hXnDvZm6
It runs in about 28 ms, or about 490 times faster.
Out of curiosity, I also wrote a Scala version using essentially the same algorithm and APIs as in Java. It runs 10 times faster, which suggests there is definitely something odd going on.
Update: The file is cached by the OS, so there is no disk loading time involved.
I wanted to use memory mapping for random access to bigger files which may not fit into RAM. That is why I am not just using a BufferedReader.
The code is very slow because fc.size() is called on every iteration of the loop.
The JVM cannot hoist the fc.size() call out of the loop, since the file size can change at run time. Querying the file size is relatively slow because it requires a system call to the underlying file system.
Change this to:
long size = fc.size();
for (long i = 0; i < size; i++) {
    ...
}
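For completeness, here is a minimal sketch of the whole method with the size read once. The class name MappedLineCount and the method name countLines are my own (hypothetical), not from the pastebin code. I also assumed you want the channel closed when done, so it uses try-with-resources, and it drives the loop with the buffer's own hasRemaining() instead of an index:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical class name, used only for this illustration.
final class MappedLineCount {

    // Counts '\n' bytes in the file through a read-only memory mapping.
    static int countLines(Path file) throws IOException {
        try (FileChannel fc = FileChannel.open(file, StandardOpenOption.READ)) {
            // Query the size once; each fc.size() call goes to the file system.
            long size = fc.size();
            // A single mapping cannot exceed Integer.MAX_VALUE bytes,
            // so larger files would have to be mapped in several windows.
            MappedByteBuffer mem = fc.map(FileChannel.MapMode.READ_ONLY, 0, size);
            int nlines = 0;
            // hasRemaining() only compares the buffer's position and limit.
            while (mem.hasRemaining()) {
                if (mem.get() == '\n') {
                    nlines++;
                }
            }
            return nlines;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countLines(Paths.get(args[0])));
    }
}
```

The mapping stays valid after the channel is closed, so closing fc early is safe. I have not timed this version on your file, so treat it as a starting point rather than a benchmark result.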