Can anyone recommend whether I should do something like:
os = new GzipOutputStream(new BufferedOutputStream(...));
or
os = new BufferedOutputStream(new GzipOutputStream(...));
Which is more efficient? Should I use BufferedOutputStream at all?
What order should I use
GzipOutputStream
andBufferedOutputStream
For object streams, I found that wrapping the buffered stream around the gzip stream for both input and output was almost always significantly faster. The smaller the objects, the better this did. Better or the same in all cases then no buffered stream.
ois = new ObjectInputStream(new BufferedInputStream(new GZIPInputStream(fis)));
oos = new ObjectOutputStream(new BufferedOutputStream(new GZIPOutputStream(fos)));
However, for text and straight byte streams, I found that it was a toss up -- with the gzip stream around the buffered stream being only slightly better. But better in all cases then no buffered stream.
reader = new InputStreamReader(new GZIPInputStream(new BufferedInputStream(fis)));
writer = new OutputStreamWriter(new GZIPOutputStream(new BufferedOutputStream(fos)));
I ran each version 20 times and cut off the first run and averaged the rest. I also tried buffered-gzip-buffered which was slightly better for objects and worse for text. I did not play with buffer sizes at all.
For the object streams, I tested 2 serialized object files in the 10s of megabytes. For the larger file (38mb), it was 85% faster on reading (0.7 versus 5.6 seconds) but actually slightly slower for writing (5.9 versus 5.7 seconds). These objects had some large arrays in them which may have meant larger writes.
method crc date time compressed uncompressed ratio
defla eb338650 May 19 16:59 14027543 38366001 63.4%
For the smaller file (18mb), it was 75% faster for reading (1.6 versus 6.1 seconds) and 40% faster for writing (2.8 versus 4.7 seconds). It contained a large number of small objects.
method crc date time compressed uncompressed ratio
defla 92c9d529 May 19 16:56 6676006 17890857 62.7%
For the text reader/writer I used a 64mb csv text file. The gzip stream around the buffered stream was 11% faster for reading (950 versus 1070 milliseconds) and slightly faster when writing (7.9 versus 8.1 seconds).
method crc date time compressed uncompressed ratio
defla c6b72e34 May 20 09:16 22560860 63465800 64.5%