java inputstream

Construct ZipInputStream from BufferedInputStream or not


I see that ZipInputStream extends InflaterInputStream. The latter has an internal buffer.

Most code examples I see construct a new ZipInputStream the following way:

File f = new File("myfile");
FileInputStream fs = new FileInputStream(f);
BufferedInputStream bs = new BufferedInputStream(fs);
ZipInputStream zs = new ZipInputStream(bs);

I read that subclasses of InflaterInputStream needn't be constructed from a BufferedInputStream, because they are buffered themselves. Is this correct? And what would be the recommended way to construct a ZipInputStream?

For context, the zip files I am reading range from a few KB to a few MB in size.


Solution

  • Both InflaterInputStream and BufferedInputStream refill their internal buffer by reading a byte[] (or part of it) from the underlying stream, and will NOT try to fill that buffer completely if the underlying stream returns fewer bytes than requested.

    So, on that basis, extra buffering looks like a waste.

    Yet buffering is not just a yes/no concept. The buffer size of ZipInputStream (see its call to its super constructor) is 512 bytes. Compared with the BufferedInputStream default of 8192 bytes, that means 16x more calls to refill its buffer.

    Not only that: even if the underlying stream could have delivered many kB had it been asked for that much, ZipInputStream still tries to refill only 512 bytes at a time.

    We're talking nanoseconds here, if not microseconds, so it's not exactly an optimization worth hunting down, unless your underlying stream is terribly poor at filling a byte[], for example if it made a new network request every time, or something similarly ridiculous.

    Personally, I would always add a BufferedInputStream, sometimes even with a 16 kB buffer, just in case I'm getting data from a TLS connection (a TLS record carries at most 16 kB of data).

    The downside of it all is the risk of loading more data than needed, particularly over a network. If you are plowing through an entire zip file, that's not an issue; none of that download is wasted. But if you read just a few zip entries and then close the stream, you might have downloaded a lot more than needed.

    As usual, it's a matter of knowing your data/usage pattern, and how much memory you can afford.
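To make the buffer-size argument concrete, here is a minimal sketch that counts how many read calls actually reach the underlying stream, with and without a 16 kB BufferedInputStream in front of the ZipInputStream. The names ZipBufferDemo, CountingInputStream, sampleZip, countReads and openBuffered are made up for this illustration, and the in-memory zip is only a stand-in for a real file or network source, so take the counts as indicative rather than as a benchmark.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class ZipBufferDemo {

    /** Hypothetical helper: counts the read calls that reach the wrapped stream. */
    static final class CountingInputStream extends FilterInputStream {
        long reads;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            reads++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            reads++;
            return super.read(b, off, len);
        }
    }

    /** Builds an in-memory zip with one entry of random (poorly compressible) bytes. */
    static byte[] sampleZip(int payloadSize) throws IOException {
        byte[] payload = new byte[payloadSize];
        new Random(42).nextBytes(payload);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            zos.putNextEntry(new ZipEntry("data.bin"));
            zos.write(payload);
            zos.closeEntry();
        }
        return bos.toByteArray();
    }

    /** Reads every entry to the end; bufferSize <= 0 means no BufferedInputStream. */
    static long countReads(byte[] zip, int bufferSize) throws IOException {
        CountingInputStream counter = new CountingInputStream(new ByteArrayInputStream(zip));
        InputStream source = bufferSize > 0
                ? new BufferedInputStream(counter, bufferSize)
                : counter;
        byte[] sink = new byte[4096];
        try (ZipInputStream zis = new ZipInputStream(source)) {
            while (zis.getNextEntry() != null) {
                while (zis.read(sink) != -1) {
                    // discard the decompressed bytes; only the read count matters here
                }
            }
        }
        return counter.reads;
    }

    /** One possible way to open a zip file with an explicit 16 kB buffer. */
    static ZipInputStream openBuffered(Path file) throws IOException {
        return new ZipInputStream(
                new BufferedInputStream(Files.newInputStream(file), 16 * 1024));
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = sampleZip(1 << 20); // roughly 1 MB of zip data
        System.out.println("reads without buffer:   " + countReads(zip, 0));
        System.out.println("reads with 16 kB buffer: " + countReads(zip, 16 * 1024));
    }
}
```

The unbuffered count should come out noticeably higher, since ZipInputStream asks for 512 bytes at a time. Whether that difference matters depends entirely on how expensive each read is on your actual source, which is the point made above.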