
How to create a multipart zip file and read it back?


How would I properly zip bytes to a ByteArrayOutputStream and then read that using a ByteArrayInputStream? I have the following method:

private byte[] getZippedBytes(final String fileName, final byte[] input) throws Exception {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    ZipOutputStream zipOut = new ZipOutputStream(bos);
    ZipEntry entry = new ZipEntry(fileName);
    entry.setSize(input.length);
    zipOut.putNextEntry(entry);
    zipOut.write(input, 0, input.length);
    zipOut.closeEntry();
    zipOut.close();

    //Turn right around and unzip what we just zipped
    ZipInputStream zipIn = new ZipInputStream(new ByteArrayInputStream(bos.toByteArray()));

    while((entry = zipIn.getNextEntry()) != null) {
        assert entry.getSize() >= 0;
    }

    return bos.toByteArray();
}

When I execute this code, the assertion at the bottom fails because entry.getSize() returns -1. I don't understand why the extracted entry doesn't match the entry that was zipped.


Solution

  • Why is the size -1?

    Calling getNextEntry on a ZipInputStream only positions the read cursor at the start of the next entry.

    For an entry written this way (compressed on the fly by ZipOutputStream), the size (along with other metadata) is stored after the actual data, so it is not yet available when the cursor is positioned at the start.

    This information becomes available only after you read the whole entry data or move on to the next entry.

    For example, going to the next entry:

    // position at the start of the first entry
    entry = zipIn.getNextEntry();
    ZipEntry firstEntry = entry;    
    // size is not yet available
    System.out.println("before " + firstEntry.getSize()); // prints -1
    
    // position at the start of the second entry
    entry = zipIn.getNextEntry();
    // size is now available
    System.out.println("after " + firstEntry.getSize()); // prints the size
    

    or reading the whole entry data:

    // position at the start of the first entry
    entry = zipIn.getNextEntry();
    // size is not yet available
    System.out.println("before " + entry.getSize()); // prints -1
    
    // read the whole entry data
    while(zipIn.read() != -1);
    
    // size is now available
    System.out.println("after " + entry.getSize()); // prints the size
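
    For anyone who wants to try this in isolation, here is a minimal self-contained sketch of the same behavior. The class and method names (EntrySizeDemo, zip, sizeBeforeAndAfter) are made up for illustration, and it assumes a single entry written with the default DEFLATED method, as in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, not part of the original question.
class EntrySizeDemo {

    // Zips one entry into memory using the default DEFLATED method.
    static byte[] zip(String name, byte[] input) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zipOut = new ZipOutputStream(bos)) {
            zipOut.putNextEntry(new ZipEntry(name));
            zipOut.write(input);
            zipOut.closeEntry();
        }
        return bos.toByteArray();
    }

    // Returns { size right after getNextEntry(), size after the data was consumed }.
    static long[] sizeBeforeAndAfter(byte[] zipped) throws IOException {
        try (ZipInputStream zipIn = new ZipInputStream(new ByteArrayInputStream(zipped))) {
            ZipEntry entry = zipIn.getNextEntry();
            long before = entry.getSize();

            // Drain the entry so the stream reaches the metadata stored after the data.
            byte[] buffer = new byte[4096];
            while (zipIn.read(buffer) != -1) {
                // discard the bytes, only the side effect on the entry matters here
            }
            long after = entry.getSize();
            return new long[] { before, after };
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] input = "hello zip".getBytes(StandardCharsets.UTF_8);
        long[] sizes = sizeBeforeAndAfter(zip("a.txt", input));
        System.out.println("before " + sizes[0]); // -1
        System.out.println("after " + sizes[1]);  // 9, the uncompressed length
    }
}
```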
    

    This misunderstanding is quite common, and there are a number of bug reports about it (all closed as "Not an Issue"), such as JDK-4079029, JDK-4113731 and JDK-6491622.

    As the bug reports also mention, you could use ZipFile instead of ZipInputStream, which lets you get the size information before accessing the entry data; but to create a ZipFile you need a File (see the constructors) rather than a byte array.

    For example:

    File file = new File("test.zip");
    // try-with-resources closes the ZipFile when done
    try (ZipFile zipFile = new ZipFile(file)) {
        Enumeration<? extends ZipEntry> enumeration = zipFile.entries();
        while (enumeration.hasMoreElements()) {
            ZipEntry zipEntry = enumeration.nextElement();
            System.out.println(zipEntry.getSize()); // prints the size
        }
    }
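
    Since the question starts from a byte array rather than a file, one possible workaround is to write the bytes to a temporary file and open that with ZipFile. The following is only a sketch: the class name ZipFileFromBytes, its helper methods and the temp-file prefix are invented for illustration, and the extra disk I/O may not be acceptable in every situation.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, not part of the original question.
class ZipFileFromBytes {

    // Builds a one-entry archive in memory, only so the example is runnable.
    static byte[] zip(String name, byte[] input) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zipOut = new ZipOutputStream(bos)) {
            zipOut.putNextEntry(new ZipEntry(name));
            zipOut.write(input);
            zipOut.closeEntry();
        }
        return bos.toByteArray();
    }

    // Writes the archive to a temp file and reads the first entry's size via ZipFile.
    // Returns -1 if the archive has no entries.
    static long firstEntrySize(byte[] zipped) throws IOException {
        Path temp = Files.createTempFile("zipped-", ".zip");
        try {
            Files.write(temp, zipped);
            try (ZipFile zipFile = new ZipFile(temp.toFile())) {
                Enumeration<? extends ZipEntry> entries = zipFile.entries();
                return entries.hasMoreElements() ? entries.nextElement().getSize() : -1;
            }
        } finally {
            // Clean up the temporary file once the ZipFile has been closed.
            Files.deleteIfExists(temp);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] input = "hello zip".getBytes(StandardCharsets.UTF_8);
        // The size is available without reading any entry data.
        System.out.println(firstEntrySize(zip("a.txt", input)));
    }
}
```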
    

  • How to get the data from the input stream?

    If you want to check if the unzipped data is equal to the original input data, you could read from the input stream like so:

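    byte[] output = new byte[input.length];
    entry = zipIn.getNextEntry();
    // a single read() may return fewer bytes than requested, so loop until output is full
    int offset = 0, n;
    while (offset < output.length
            && (n = zipIn.read(output, offset, output.length - offset)) != -1) {
        offset += n;
    }
    
    System.out.println("Are they equal? " + Arrays.equals(input, output));
    
    // and if we want the size
    zipIn.getNextEntry(); // or zipIn.read();
    System.out.println("and the size is " + entry.getSize());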
    

    Now output should have the same content as input.
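
    If the original length is not known when unzipping (so the output array cannot be sized up front), a common pattern is to copy the entry through a small buffer into a ByteArrayOutputStream. Below is a rough round-trip sketch along those lines; ZipRoundTrip, zip and unzipFirstEntry are hypothetical names, and it assumes the archive holds a single entry that fits in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical round-trip helper, not part of the original question.
class ZipRoundTrip {

    // Zips the input bytes as a single entry and returns the archive bytes.
    static byte[] zip(String name, byte[] input) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zipOut = new ZipOutputStream(bos)) {
            zipOut.putNextEntry(new ZipEntry(name));
            zipOut.write(input);
            zipOut.closeEntry();
        }
        return bos.toByteArray();
    }

    // Reads the first entry's data without needing to know its size in advance.
    static byte[] unzipFirstEntry(byte[] zipped) throws IOException {
        try (ZipInputStream zipIn = new ZipInputStream(new ByteArrayInputStream(zipped))) {
            if (zipIn.getNextEntry() == null) {
                throw new IOException("archive has no entries");
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            // read() returns -1 once the current entry is exhausted
            while ((n = zipIn.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] input = "some bytes to compress".getBytes(StandardCharsets.UTF_8);
        byte[] output = unzipFirstEntry(zip("a.txt", input));
        System.out.println("Are they equal? " + Arrays.equals(input, output));
    }
}
```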