java binary bytebuffer filechannel mappedbytebuffer

MappedByteBuffer slow on initial run


Long-time reader, first-time poster.

I'm having a bit of trouble reading data quickly from a set of binary files. ByteBuffers and MappedByteBuffers offer the performance I require, but they seem to need an initial run to warm up. I'm not sure if that makes sense, so here's some code:

int BUFFERSIZE = 864;
int DATASIZE = 33663168;

int pos = 0;
// Open File channel to get data
FileChannel channel = new RandomAccessFile(new File(myFile), "r").getChannel();

// Set MappedByteBuffer to read DATASIZE bytes from channel
MappedByteBuffer mbb = channel.map(FileChannel.MapMode.READ_ONLY, pos, DATASIZE);

// Set Endianness
mbb.order(ByteOrder.nativeOrder());

ArrayList<Double> ndt = new ArrayList<Double>();

// Read one double every BUFFERSIZE bytes, convert it and add it to the list
// (cnst and stt are conversion constants defined elsewhere)
while (pos < DATASIZE) {
    double xf = mbb.getDouble(pos);
    ndt.add(xf * cnst * 1000d + stt);
    pos += BUFFERSIZE;
}

// Return arraylist
return ndt;

This takes about 7 seconds to run, but if I then run it again it finishes in 10 ms. It seems to need some sort of initial run to set things up. I've found that doing something simple like this first works:

channel = new RandomAccessFile(new File(myFile), "r").getChannel();
ByteBuffer buf = ByteBuffer.allocateDirect(DATASIZE);
channel.read(buf);
channel.close();

This takes around 2 seconds, and if I then run through the MappedByteBuffer procedure it returns the data in 10 ms. I just cannot figure out how to get rid of that initialisation step and read the data in 10 ms the first time. I've read all sorts of things about 'warming up', JIT and the JVM, but all to no avail.

So, my question is: is it possible to get the 10 ms performance straight away, or do I need to do some sort of initialisation? If so, what is the fastest way to do this, please?

The code is intended to run through around a thousand quite large files so speed is quite important.

Many thanks.


Solution

  • I just cannot figure out how to get rid of that initialisation step and read the data in 10ms first time

    You can't. The data has to be read from the disk the first time, and that takes longer than 10 ms. The 10 ms figure applies to every later run, when the data is already in memory (cached by the operating system).
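
To see the effect for yourself, here is a minimal sketch that times the same strided read twice in one process. The class name StridedMappedRead, the method readStrided and the command-line file argument are made up for illustration and are not from the question; the stride of 864 bytes mirrors BUFFERSIZE above. If the file is not already in the operating system's cache, the first run would be expected to be the slow one, but the actual timings depend on your disk and OS.

```java
import java.io.IOException;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
class StridedMappedRead {

    // Reads one double every `stride` bytes from the first `size` bytes of `file`.
    static List<Double> readStrided(Path file, int size, int stride) throws IOException {
        // try-with-resources closes the channel; the mapping itself stays
        // valid until the buffer is garbage collected.
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mbb = channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
            mbb.order(ByteOrder.nativeOrder());

            // Optional: mbb.load() asks the OS to bring the whole region into
            // memory up front. It does not avoid the disk read, it only moves it.

            List<Double> out = new ArrayList<>(size / stride + 1);
            // Stop before a read would run past the end of the mapped region.
            for (int pos = 0; pos + Double.BYTES <= size; pos += stride) {
                out.add(mbb.getDouble(pos));
            }
            return out;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]); // path to one of your data files
        int size = (int) Math.min(Files.size(file), Integer.MAX_VALUE);

        // Same work twice: only the state of the OS cache differs between runs.
        for (int run = 1; run <= 2; run++) {
            long start = System.nanoTime();
            List<Double> values = readStrided(file, size, 864);
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println("run " + run + ": " + values.size() + " values in " + millis + " ms");
        }
    }
}
```

Whichever call touches the file first (a mapped read, load(), or the allocateDirect read from the question) pays for the disk access; the others then benefit from the cached pages.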