Tags: java, unzip, zip, zipoutputstream

ZipFile: wrong values when reading


I am creating a zip file with one directory and a single compressed text file inside it.

Code to create the zip file:

    import java.io.FileOutputStream;
    import java.nio.file.attribute.FileTime;
    import java.util.zip.CRC32;
    import java.util.zip.Deflater;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipOutputStream;

    // class wrapper added so the snippet compiles; the name is arbitrary
    public class CreateZip
    {
      public static void main(String[] args) throws Exception
      {
        try(ZipOutputStream zos=new ZipOutputStream(new FileOutputStream("E:/TestFile.zip")))
        {
          //comment,level,method for all entries
          zos.setComment("Test Zip File");
          zos.setLevel(Deflater.BEST_COMPRESSION);
          zos.setMethod(Deflater.DEFLATED);

          //Creating Directories[ends with a forward slash]
          {
            ZipEntry dir1=new ZipEntry("Directory/");

            //Give it a comment
            dir1.setComment("Directory");
            //Some extra data
            dir1.setExtra("Hello".getBytes());
            //Set Creation,Access,Modification Time
            FileTime time=FileTime.fromMillis(System.currentTimeMillis());
            dir1.setCreationTime(time);
            dir1.setLastAccessTime(time);
            dir1.setLastModifiedTime(time);

            //put the entry & close it
            zos.putNextEntry(dir1);
            zos.closeEntry();
          }

          //Creating a fully compressed file inside the directory with all information
          {
            ZipEntry file=new ZipEntry("Directory/Test.txt");

            //Meta Data
            {
              //Give it a comment
              file.setComment("A File");
              //Some extra data
              file.setExtra("World".getBytes());
              //Set Creation,Access,Modification Time
              FileTime time=FileTime.fromMillis(System.currentTimeMillis());
              file.setCreationTime(time);
              file.setLastAccessTime(time);
              file.setLastModifiedTime(time);
            }

            //Byte Data
            {
              //put entry for writing
              zos.putNextEntry(file);
              byte[] data="Hello World Hello World".getBytes();

              //Compress Data
              Deflater deflater=new Deflater(9);
              deflater.setDictionary("Hello World ".getBytes());
              deflater.setInput(data);
              deflater.finish();
              byte[] output=new byte[100];
              int compressed=deflater.deflate(output);

              //Write Data
              CRC32 check=new CRC32();
              check.update(data);
              file.setSize(deflater.getBytesRead());
              file.setCrc(check.getValue());
              file.setCompressedSize(compressed);
              zos.write(output,0,compressed);

              //end data
              System.out.println(deflater.getBytesRead()+"/"+compressed);
              deflater.end();
            }

            //close the entry
            zos.closeEntry();
          }
        }
      }
    }

When writing the file, the uncompressed data is 23 bytes and the compressed data is 15 bytes. I am calling every setter on ZipEntry just to test whether I can retrieve all the values correctly when reading the file back.

I read it back with the ZipFile class (not ZipInputStream, because its getSize() always returns -1) using this code:

    import java.io.InputStream;
    import java.util.Enumeration;
    import java.util.zip.Inflater;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipFile;

    //reading zip file using ZipFile
    // class wrapper added so the snippet compiles; the name is arbitrary
    public class ReadZip
    {
      public static void main(String[] args)throws Exception
      {
        try(ZipFile zis=new ZipFile("E:/TestFile.zip"))
        {
          Enumeration<? extends ZipEntry> entries=zis.entries();
          while(entries.hasMoreElements())
          {
            ZipEntry entry=entries.nextElement();

            System.out.println("Name="+entry.getName());
            System.out.println("Is Directory="+entry.isDirectory());
            System.out.println("Comment="+entry.getComment());
            System.out.println("Creation Time="+entry.getCreationTime());
            System.out.println("Access Time="+entry.getLastAccessTime());
            System.out.println("Modification Time="+entry.getLastModifiedTime());
            System.out.println("CRC="+entry.getCrc());
            System.out.println("Real Size="+entry.getSize());
            System.out.println("Compressed Size="+entry.getCompressedSize());
            System.out.println("Optional Data="+new String(entry.getExtra()));
            System.out.println("Method="+entry.getMethod());
            if(!entry.isDirectory())
            {
              Inflater inflater=new Inflater();
              try(InputStream is=zis.getInputStream(entry))
              {
                byte[] originalData=new byte[(int)entry.getSize()];
                inflater.setInput(is.readAllBytes());
                int realLength=inflater.inflate(originalData);
                if(inflater.needsDictionary())
                {
                  inflater.setDictionary("Hello World ".getBytes());
                  realLength=inflater.inflate(originalData);
                }
                inflater.end();

                System.out.println("Data="+new String(originalData,0,realLength));
              }
            }
            System.out.println("=====================================================");
          }
        }
      }
    }

I get this output

Name=Directory/
Is Directory=true
Comment=Directory
Creation Time=null
Access Time=null
Modification Time=2022-01-24T17:00:25Z
CRC=0
Real Size=0
Compressed Size=2
Optional Data=UTaHello
Method=8
=====================================================
Name=Directory/Test.txt
Is Directory=false
Comment=A File
Creation Time=null
Access Time=null
Modification Time=2022-01-24T17:00:25Z
CRC=2483042136
Real Size=15
Compressed Size=17
Optional Data=UT��aWorld
Method=8
Data=Hello World Hel
==================================================

Much of this output is wrong.

For the directory:

1) Creation time and access time are null, even though I set them when writing.

2) The extra data (Optional Data) has the wrong encoding.

For the file:

1) Creation time and access time are null, even though I set them when writing.

2) getSize() and getCompressedSize() return the wrong values. I set these values manually with setSize() and setCompressedSize() when writing the file; they were 23 and 15, but reading returns 15 and 17.

3) The extra data (Optional Data) has the wrong encoding.

4) Because getSize() returns the wrong size, not all of the data is displayed (Hello World Hel).

With so many things going wrong, I decided to post this as one question rather than several small ones, since they all seem related. I am a complete beginner at writing zip files, so any direction on where to go from here would be greatly appreciated.

I can read an entry's data into a buffer with a while loop when the size is unknown or incorrect, so that is not a problem. But why provide setSize() and getSize() methods at all if we end up doing this most of the time anyway? What's the point?


Solution

  • After much research I was able to solve about 70% of the problems. The rest can't be solved, given how ZipFile and ZipInputStream read the data.

    Problem 1: Incorrect values returned by getSize() & getCompressedSize()

    1) During Writing

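    I should have noticed this earlier: ZipOutputStream already compresses the data for us, and I was compressing it twice by running my own Deflater first, so I removed that code. This also explains the 15 and 17: because the sizes were not set before putNextEntry(), ZipOutputStream ignored the values I set later and recorded the 15 already-compressed bytes I wrote as the "real" size, and compressing them again produced 17 bytes. I also realized that you only have to set the size, compressed size and CRC yourself when the entry method is STORED; with DEFLATED they are computed from the data written. After refactoring, my zip-writing code looks like this: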

        import java.io.FileOutputStream;
        import java.nio.file.attribute.FileTime;
        import java.util.zip.ZipEntry;
        import java.util.zip.ZipOutputStream;

        // class wrapper added so the snippet compiles; the name is arbitrary
        public class CreateZip2
        {
          public static void main(String[] args) throws Exception
          {
            try(ZipOutputStream zos=new ZipOutputStream(new FileOutputStream("E:/TestFile2.zip")))
            {
              //comment,level,method for all entries
              zos.setComment("Test Zip File");
              //Auto Compression
              zos.setMethod(ZipOutputStream.DEFLATED);
              zos.setLevel(9);

              //Creating Directories[ends with a forward slash]
              {
                ZipEntry dir1=new ZipEntry("Directory/");

                //Give it a comment
                dir1.setComment("Directory");
                //Some extra data
                dir1.setExtra("Hello".getBytes());
                //Set Creation,Access,Modification Time
                FileTime time=FileTime.fromMillis(System.currentTimeMillis());
                dir1.setCreationTime(time);
                dir1.setLastAccessTime(time);
                dir1.setLastModifiedTime(time);

                //put the entry & close it
                zos.putNextEntry(dir1);
                zos.closeEntry();
              }

              //Creating a fully compressed file inside the directory with all information
              {
                ZipEntry file=new ZipEntry("Directory/Test.txt");

                //Meta Data
                {
                  //Give it a comment
                  file.setComment("A File");
                  //Some extra data
                  file.setExtra("World".getBytes());
                  //Set Creation,Access,Modification Time
                  FileTime time=FileTime.fromMillis(System.currentTimeMillis());
                  file.setCreationTime(time);
                  file.setLastAccessTime(time);
                  file.setLastModifiedTime(time);
                }

                //Byte Data: written uncompressed, ZipOutputStream deflates it
                {
                  byte[] data="Hello World Hello World".getBytes();
                  zos.putNextEntry(file);
                  zos.write(data);
                }

                //close the entry: size, compressed size and CRC are computed here
                zos.closeEntry();
              }

              //finish writing the zip file without closing the stream
              zos.finish();
            }
          }
        }
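    For comparison, here is a minimal sketch of the STORED case, where ZipOutputStream cannot compute the values itself, so the size, compressed size and CRC-32 must be set on the entry before putNextEntry(). It writes to an in-memory buffer instead of E:/TestFile2.zip (an assumption, to keep it self-contained), and the class and method names StoredEntryDemo / writeStoredZip are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class: one STORED (uncompressed) entry in an in-memory zip.
public class StoredEntryDemo {
    static byte[] writeStoredZip(byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            ZipEntry entry = new ZipEntry("Directory/Test.txt");
            entry.setMethod(ZipEntry.STORED);

            // For STORED entries these must be known before putNextEntry(),
            // otherwise putNextEntry() throws a ZipException.
            CRC32 crc = new CRC32();
            crc.update(data);
            entry.setSize(data.length);
            entry.setCompressedSize(data.length); // no compression: same size
            entry.setCrc(crc.getValue());

            zos.putNextEntry(entry);
            zos.write(data);
            zos.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "Hello World Hello World".getBytes(StandardCharsets.UTF_8);
        byte[] zip = writeStoredZip(data);

        // A STORED entry has no data descriptor: the sizes are in the local
        // header, so ZipInputStream knows them as soon as the entry is opened.
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            ZipEntry entry = zis.getNextEntry();
            System.out.println("Method=" + entry.getMethod());                  // Method=0
            System.out.println("Real Size=" + entry.getSize());                 // Real Size=23
            System.out.println("Compressed Size=" + entry.getCompressedSize()); // Compressed Size=23
        }
    }
}
```

    This is where the setters matter: with DEFLATED you let the stream fill them in, with STORED you must provide them up front.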
    

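    2) During Reading

    There are two ways to get correct size and compressed size values:

    -> If you read the file with the ZipFile class, the values come out correctly, because it reads them from the central directory at the end of the archive.

    -> If you use ZipInputStream, these values are only known after you have read all the bytes of the entry. Until then getSize() and getCompressedSize() return -1, because ZipOutputStream writes them in a data descriptor after the compressed data instead of in the entry's local header. More info here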

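     // zipInputStream is a ZipInputStream; entry was returned by zipInputStream.getNextEntry()
     // Read every entry to the end, directories included (their data is empty),
     // because the sizes and CRC are only filled in once the end is reached
     try(ByteArrayOutputStream baos=new ByteArrayOutputStream())
     {
      int read;
      byte[] data=new byte[10];
      while((read=zipInputStream.read(data))!=-1){baos.write(data,0,read);}
      if(!entry.isDirectory()){System.out.println("Data="+new String(baos.toByteArray()));}
     }
     //Now these values are correct
     System.out.println("CRC="+entry.getCrc());
     System.out.println("Real Size="+entry.getSize());
     System.out.println("Compressed Size="+entry.getCompressedSize());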
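    A small self-contained sketch of this behaviour, writing to an in-memory buffer (an assumption, instead of E:/TestFile2.zip): because the sizes of the DEFLATED entry are not set up front, they end up in a trailing data descriptor, so ZipInputStream reports -1 until the entry's data has been read to the end. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class: entry sizes before and after reading with ZipInputStream.
public class SizeAfterReadDemo {
    static byte[] writeDeflatedZip(byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.setLevel(9);
            // Size and CRC are not set, so ZipOutputStream writes them in a
            // data descriptor after the compressed data.
            zos.putNextEntry(new ZipEntry("Directory/Test.txt"));
            zos.write(data);
            zos.closeEntry();
        }
        return bytes.toByteArray();
    }

    // Returns {size before reading, size after reading, bytes actually read}.
    static long[] sizesBeforeAndAfterRead(byte[] zip) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            ZipEntry entry = zis.getNextEntry();
            long before = entry.getSize();

            ByteArrayOutputStream content = new ByteArrayOutputStream();
            byte[] buffer = new byte[10];
            int read;
            while ((read = zis.read(buffer)) != -1) {
                content.write(buffer, 0, read);
            }
            // Reaching the end made ZipInputStream read the data descriptor.
            long after = entry.getSize();
            return new long[] {before, after, content.size()};
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "Hello World Hello World".getBytes(StandardCharsets.UTF_8);
        long[] sizes = sizesBeforeAndAfterRead(writeDeflatedZip(data));
        System.out.println("Size before reading=" + sizes[0]); // -1
        System.out.println("Size after reading=" + sizes[1]);  // 23
        System.out.println("Bytes read=" + sizes[2]);          // 23
    }
}
```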
    

    Problem 2: Incorrect Extra data

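    This post explains the layout in detail. In short, the extra field is a sequence of little-endian records, each made of a 2-byte header ID, a 2-byte data size and the data itself. ZipOutputStream puts its own extended-timestamp record (header ID 0x5455, whose bytes print as "UT") in front of the bytes passed to setExtra(), which is why the output started with "UT".

    Here is the code: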

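         // Extra field = records of header ID (2 bytes), data size (2 bytes), data
         byte[] extra = entry.getExtra();
         ByteBuffer extraData = ByteBuffer.wrap(extra == null ? new byte[0] : extra).order(ByteOrder.LITTLE_ENDIAN);
         while(extraData.remaining() >= 4) 
         {
           int id = extraData.getShort() & 0xffff;
           int length = extraData.getShort() & 0xffff;
    
           if(id == 0x756e) 
           {
             // ASi Unix record
             int crc32 = extraData.getInt();
             short permissions = extraData.getShort();
             int 
             linkLengthOrDeviceNumbers = extraData.getInt(),
             userID = extraData.getChar(),
             groupID = extraData.getChar();

             // the remaining (length - 14) bytes are the symbolic link target
             ByteBuffer linkDestBuffer = extraData.slice();
             linkDestBuffer.limit(length - 14);
             String linkDestination=StandardCharsets.UTF_8.decode(linkDestBuffer).toString();
             // slice() does not move extraData, so skip the link bytes explicitly
             extraData.position(extraData.position() + length - 14);
           } 
           else
           {
            // skip this record (e.g. the "UT" extended timestamp, id 0x5455)
            extraData.position(extraData.position() + length);
            // Assumes the raw bytes given to setExtra() follow it; they are not
            // a well-formed record, so read everything that is left
            byte[] ourData=new byte[extraData.remaining()];
            extraData.get(ourData);
    
            //do stuff
           }
         } 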
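    The else branch above works only because "Hello" happens to be the last thing in the field. A minimal sketch of a sturdier approach, assuming you are free to choose your own header ID: wrap the payload in a proper record when writing and look it up by ID when reading. The ID 0x6666 and all class and method names here are hypothetical, picked only for illustration; check the ZIP specification for IDs that are already registered before choosing one.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class: storing custom data as a well-formed extra-field record.
public class ExtraFieldDemo {
    // Hypothetical header ID for our own record.
    static final int MY_HEADER_ID = 0x6666;

    // Wraps the payload as one record: ID (2 bytes), size (2 bytes), data, little-endian.
    static byte[] toExtraRecord(int headerId, byte[] payload) {
        return ByteBuffer.allocate(4 + payload.length)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putShort((short) headerId)
                .putShort((short) payload.length)
                .put(payload)
                .array();
    }

    // Returns the data of the first record with the given ID, or null if absent.
    static byte[] findExtraRecord(byte[] extra, int headerId) {
        if (extra == null) {
            return null;
        }
        ByteBuffer buf = ByteBuffer.wrap(extra).order(ByteOrder.LITTLE_ENDIAN);
        while (buf.remaining() >= 4) {
            int id = buf.getShort() & 0xffff;
            int length = buf.getShort() & 0xffff;
            if (length > buf.remaining()) {
                return null; // malformed record
            }
            byte[] data = new byte[length];
            buf.get(data);
            if (id == headerId) {
                return data;
            }
        }
        return null;
    }

    // Writes an entry carrying the text in our record, then reads it back.
    static String roundTrip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            ZipEntry entry = new ZipEntry("Directory/");
            entry.setExtra(toExtraRecord(MY_HEADER_ID, text.getBytes(StandardCharsets.UTF_8)));
            zos.putNextEntry(entry);
            zos.closeEntry();
        }
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            byte[] data = findExtraRecord(zis.getNextEntry().getExtra(), MY_HEADER_ID);
            return data == null ? null : new String(data, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Optional Data=" + roundTrip("Hello")); // Optional Data=Hello
    }
}
```

    Because the lookup is by ID, it keeps working regardless of which other records (such as "UT") the library adds around yours.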
    

    Unsolved Problems

    Three values still come back differently depending on which class is used to read the file. Here are my observations for each entry:

                                ZipFile            ZipInputStream
     getCreationTime()          null               <correct value>
     getLastAccessTime()        null               <correct value>
     getComment()               <correct value>    null
    

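    According to the bug report, this is expected behavior. ZipFile uses random access and builds its entries from the central directory at the end of the archive, while ZipInputStream reads sequentially and builds them from the local file headers. The two headers do not hold the same fields: the comment exists only in the central directory, and the creation and access times are written only in the local header.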
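    The table can be reproduced with a short sketch. It writes one entry with a comment and all three timestamps, then reads the same entry through ZipFile (central directory) and through ZipInputStream (local header). It uses a temporary file because ZipFile needs one; that and the class and method names are assumptions made for the demo.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class: the same entry seen through the two headers.
public class HeaderSourceDemo {
    static void writeZip(Path path) throws IOException {
        try (ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(path.toFile()))) {
            ZipEntry entry = new ZipEntry("Directory/Test.txt");
            entry.setComment("A File");
            FileTime time = FileTime.fromMillis(System.currentTimeMillis());
            entry.setCreationTime(time);
            entry.setLastAccessTime(time);
            entry.setLastModifiedTime(time);
            zos.putNextEntry(entry);
            zos.write("Hello World Hello World".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
    }

    // Entry as described by the central directory (random access).
    static ZipEntry readWithZipFile(Path path) throws IOException {
        try (ZipFile zip = new ZipFile(path.toFile())) {
            return zip.getEntry("Directory/Test.txt");
        }
    }

    // Entry as described by the local file header (sequential).
    static ZipEntry readWithZipInputStream(Path path) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(new FileInputStream(path.toFile()))) {
            return zis.getNextEntry();
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("header-demo", ".zip");
        try {
            writeZip(path);
            ZipEntry fromFile = readWithZipFile(path);
            ZipEntry fromStream = readWithZipInputStream(path);
            // comment=A File, creation=null
            System.out.println("ZipFile:        comment=" + fromFile.getComment()
                    + ", creation=" + fromFile.getCreationTime());
            // comment=null, creation=<the time that was set>
            System.out.println("ZipInputStream: comment=" + fromStream.getComment()
                    + ", creation=" + fromStream.getCreationTime());
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

    If you need both the comment and the creation time, one option under these constraints is to read the file twice, once with each class, and merge the values by entry name.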

    In my tests ZipInputStream gives the best results, so I will continue to use it.