I am creating a zip file with one directory and a single compressed text file inside of it.
Code to create the zip file:
try(ZipOutputStream zos=new ZipOutputStream(new FileOutputStream("E:/TestFile.zip")))
{
//comment,level,method for all entries
zos.setComment("Test Zip File");
zos.setLevel(Deflater.BEST_COMPRESSION);
zos.setMethod(Deflater.DEFLATED);
//Creating Directories[ends with a forward slash]
{
ZipEntry dir1=new ZipEntry("Directory/");
//Give it a comment
dir1.setComment("Directory");
//Some extra data
dir1.setExtra("Hello".getBytes());
//Set Creation,Access,Modification Time
FileTime time=FileTime.fromMillis(System.currentTimeMillis());
dir1.setCreationTime(time);
dir1.setLastAccessTime(time);
dir1.setLastModifiedTime(time);
//put the entry & close it
zos.putNextEntry(dir1);
zos.closeEntry();
}
//Creating a fully compressed file inside the directory with all information
{
ZipEntry file=new ZipEntry("Directory/Test.txt");
//Meta Data
{
//Give it a comment
file.setComment("A File");
//Some extra data
file.setExtra("World".getBytes());
//Set Creation,Access,Modification Time
FileTime time=FileTime.fromMillis(System.currentTimeMillis());
file.setCreationTime(time);
file.setLastAccessTime(time);
file.setLastModifiedTime(time);
}
//Byte Data
{
//put entry for writing
zos.putNextEntry(file);
byte[] data="Hello World Hello World".getBytes();
//Compress Data
Deflater deflater=new Deflater(9);
deflater.setDictionary("Hello World ".getBytes());
deflater.setInput(data);
deflater.finish();
byte[] output=new byte[100];
int compressed=deflater.deflate(output);
//Write Data
CRC32 check=new CRC32();
check.update(data);
file.setSize(deflater.getBytesRead());
file.setCrc(check.getValue());
file.setCompressedSize(compressed);
zos.write(output,0,compressed);
//end data
System.out.println(deflater.getBytesRead()+"/"+compressed);
deflater.end();
}
//close the entry
zos.closeEntry();
}
}
}
Upon writing the file, the size of the uncompressed byte data is 23 bytes and the size of the compressed data is 15. I am using every method in ZipEntry just to test whether I can retrieve all the values correctly upon reading it.
I read it back using the ZipFile class rather than ZipInputStream (with which getSize() always returns -1), using this code:
//reading zip file using ZipFile
public static void main(String[] args)throws Exception
{
try(ZipFile zis=new ZipFile("E:/TestFile.zip"))
{
Enumeration<? extends ZipEntry> entries=zis.entries();
while(entries.hasMoreElements())
{
ZipEntry entry=entries.nextElement();
System.out.println("Name="+entry.getName());
System.out.println("Is Directory="+entry.isDirectory());
System.out.println("Comment="+entry.getComment());
System.out.println("Creation Time="+entry.getCreationTime());
System.out.println("Access Time="+entry.getLastAccessTime());
System.out.println("Modification Time="+entry.getLastModifiedTime());
System.out.println("CRC="+entry.getCrc());
System.out.println("Real Size="+entry.getSize());
System.out.println("Compressed Size="+entry.getCompressedSize());
System.out.println("Optional Data="+new String(entry.getExtra()));
System.out.println("Method="+entry.getMethod());
if(!entry.isDirectory())
{
Inflater inflater=new Inflater();
try(InputStream is=zis.getInputStream(entry))
{
byte[] originalData=new byte[(int)entry.getSize()];
inflater.setInput(is.readAllBytes());
int realLength=inflater.inflate(originalData);
if(inflater.needsDictionary())
{
inflater.setDictionary("Hello World ".getBytes());
realLength=inflater.inflate(originalData);
}
inflater.end();
System.out.println("Data="+new String(originalData,0,realLength));
}
}
System.out.println("=====================================================");
}
}
}
I get this output:
Name=Directory/
Is Directory=true
Comment=Directory
Creation Time=null
Access Time=null
Modification Time=2022-01-24T17:00:25Z
CRC=0
Real Size=0
Compressed Size=2
Optional Data=UTaHello
Method=8
=====================================================
Name=Directory/Test.txt
Is Directory=false
Comment=A File
Creation Time=null
Access Time=null
Modification Time=2022-01-24T17:00:25Z
CRC=2483042136
Real Size=15
Compressed Size=17
Optional Data=UT��aWorld
Method=8
Data=Hello World Hel
==================================================
There is a lot of wrong output from this code.
For the directory:
1) Creation Time & Access Time are null [even though I specified them when writing]
2) Extra Data [Optional Data] has the wrong encoding
For the file:
1) Creation Time & Access Time are null [even though I specified them when writing]
2) getSize() & getCompressedSize() return the wrong values. I set these values manually with setSize() & setCompressedSize() when creating the file; the values were 23 and 15, but it returns 15 and 17
3) Extra Data [Optional Data] has the wrong encoding
4) Since getSize() returns an incorrect size, it doesn't display the whole data [Hello World Hel]
With so many things going wrong, I thought I would post this as one question rather than multiple small ones, as they all seem related. I am a complete beginner at writing zip files, so any direction on where to go from here would be greatly appreciated.
I can read the data of a zip entry into a buffer using a while loop if the size is unknown or incorrect, which is not a problem, but why would they even create set and get size methods if they knew we would be doing this most of the time anyway? What's the point?
After much research I was able to solve 70% of the problems. The others can't be solved, given the nature of how ZipOutputStream writes the data and how ZipFile reads it.
Problem 1: Incorrect values returned by getSize() & getCompressedSize()
1) During Writing
I was blind not to have seen this earlier, but ZipOutputStream already does the compression for us, and I was double-compressing the data by using my own Deflater. So I removed that code, and I realized that you must specify these values only when you are using the STORED method; otherwise they are computed for you from the data. After refactoring my zip writing code, this is how it looks:
try(ZipOutputStream zos=new ZipOutputStream(new FileOutputStream("E:/TestFile2.zip")))
{
//comment,level,method for all entries
zos.setComment("Test Zip File");
//Auto Compression
zos.setMethod(ZipOutputStream.DEFLATED);
zos.setLevel(9);
//Creating Directories[ends with a forward slash]
{
ZipEntry dir1=new ZipEntry("Directory/");
//Give it a comment
dir1.setComment("Directory");
//Some extra data
dir1.setExtra("Hello".getBytes());
//Set Creation,Access,Modification Time
FileTime time=FileTime.fromMillis(System.currentTimeMillis());
dir1.setCreationTime(time);
dir1.setLastAccessTime(time);
dir1.setLastModifiedTime(time);
//put the entry & close it
zos.putNextEntry(dir1);
zos.closeEntry();
}
//Creating a fully compressed file inside the directory with all information
{
ZipEntry file=new ZipEntry("Directory/Test.txt");
//Meta Data
{
//Give it a comment
file.setComment("A File");
//Some extra data
file.setExtra("World".getBytes());
//Set Creation,Access,Modification Time
FileTime time=FileTime.fromMillis(System.currentTimeMillis());
file.setCreationTime(time);
file.setLastAccessTime(time);
file.setLastModifiedTime(time);
}
//Byte Data
{
byte[] data="Hello World Hello World".getBytes();
//Data
zos.putNextEntry(file);
zos.write(data);
zos.flush();
}
//close the entry
zos.closeEntry();
}
//finish writing the zip file without closing stream
zos.finish();
}
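For the STORED case mentioned above, here is a minimal sketch of what "specifying the values yourself" looks like. It works on an in-memory zip so it can be run anywhere; the class and method names (StoredEntrySketch, writeStored, readFirstHeader) are made up for illustration and are not part of my original code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class StoredEntrySketch {

    /** Builds an in-memory zip holding one uncompressed (STORED) entry. */
    public static byte[] writeStored(String name, byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            ZipEntry entry = new ZipEntry(name);
            entry.setMethod(ZipEntry.STORED);

            // With STORED nothing is computed for us: putNextEntry() throws a
            // ZipException if the size or the CRC-32 is still unset.
            CRC32 crc = new CRC32();
            crc.update(data);
            entry.setSize(data.length);
            entry.setCompressedSize(data.length); // stored data is not shrunk
            entry.setCrc(crc.getValue());

            zos.putNextEntry(entry);
            zos.write(data); // raw bytes, written exactly as given
            zos.closeEntry();
        }
        return bytes.toByteArray();
    }

    /** Returns the first entry as described by its local header. */
    public static ZipEntry readFirstHeader(byte[] zip) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            return zis.getNextEntry();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "Hello World Hello World".getBytes(StandardCharsets.UTF_8);
        ZipEntry header = readFirstHeader(writeStored("Directory/Test.txt", data));
        System.out.println("Real Size=" + header.getSize());
        System.out.println("Compressed Size=" + header.getCompressedSize());
        System.out.println("CRC=" + header.getCrc());
    }
}
```

Because a STORED entry carries its sizes in the local header, even ZipInputStream can report them before any data is read, which is not the case for the DEFLATED entries written by the code above.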
2) During Reading
To get the correct size & compressed size values there are 2 approaches:
-> If you read the file using the ZipFile class, the values come out correctly
-> If you use ZipInputStream, these values are computed only after you have read all the bytes from the entry. More info here
if(!entry.isDirectory())
{
try(ByteArrayOutputStream baos=new ByteArrayOutputStream())
{
int read;
byte[] data=new byte[10];
while((read=zipInputStream.read(data))>0){baos.write(data,0,read);}
System.out.println("Data="+new String(baos.toByteArray()));
}
}
//Now these values are correct
System.out.println("CRC="+entry.getCrc());
System.out.println("Real Size="+entry.getSize());
System.out.println("Compressed Size="+entry.getCompressedSize());
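The ZipInputStream behavior can be checked end to end with a small round trip. This is only a sketch: it zips the same 23-byte string in memory with the default DEFLATED method and records getSize() before and after draining the entry. The names StreamSizeSketch, writeDeflated and sizeBeforeAndAfter are hypothetical helpers, not JDK API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class StreamSizeSketch {

    /** Zips one entry in memory; size, compressed size and CRC are left to ZipOutputStream. */
    public static byte[] writeDeflated(String name, byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.putNextEntry(new ZipEntry(name)); // default method is DEFLATED
            zos.write(data);
            zos.closeEntry();
        }
        return bytes.toByteArray();
    }

    /** Returns { getSize() before reading, getSize() after reading } for the first entry. */
    public static long[] sizeBeforeAndAfter(byte[] zip) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            ZipEntry entry = zis.getNextEntry();
            long before = entry.getSize();

            // Drain the entry; the sizes are filled in once the end of the
            // entry's data has been reached.
            byte[] buffer = new byte[1024];
            while (zis.read(buffer) != -1) {
                // data is discarded here, only the metadata matters
            }
            long after = entry.getSize();
            return new long[] { before, after };
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "Hello World Hello World".getBytes(StandardCharsets.UTF_8);
        long[] sizes = sizeBeforeAndAfter(writeDeflated("Directory/Test.txt", data));
        System.out.println("Real Size before reading=" + sizes[0]);
        System.out.println("Real Size after reading=" + sizes[1]);
    }
}
```

The "before" value is the -1 I originally mistook for a bug; the "after" value is the real size of 23.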
Problem 2: Incorrect Extra data
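This post pretty much explains everything: the extra field is not plain text but a sequence of blocks (2-byte header ID, 2-byte length, then the data), and Java writes its own "UT" timestamp block in front of the bytes passed to setExtra().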
Here is the code
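ByteBuffer extraData = ByteBuffer.wrap(entry.getExtra()).order(ByteOrder.LITTLE_ENDIAN);
while(extraData.hasRemaining())
{
int id = extraData.getShort() & 0xffff;
int length = extraData.getShort() & 0xffff;
//0x756e is the ASi Unix block
if(id == 0x756e)
{
int crc32 = extraData.getInt();
short permissions = extraData.getShort();
int
linkLengthOrDeviceNumbers = extraData.getInt(),
userID = extraData.getChar(),
groupID = extraData.getChar();
//the first 14 bytes of the block are fixed, the rest is the link destination
ByteBuffer linkDestBuffer = extraData.slice().limit(length - 14);
String linkDestination=StandardCharsets.UTF_8.decode(linkDestBuffer).toString();
//slice() does not move the original buffer, so skip the link destination manually
extraData.position(extraData.position() + length - 14);
}
else
{
//skip the block we don't care about [e.g. the "UT" timestamp block]
extraData.position(extraData.position() + length);
//whatever is left is treated as our own raw data [assumes it comes last]
byte[] ourData=new byte[extraData.remaining()];
extraData.get(ourData);
//do stuff
}
}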
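The code above depends on the custom bytes being the last thing in the extra field. One way around that, sketched below, is to wrap the payload in its own (ID, length, data) block when writing, so it can be located by ID when reading. The header ID 0x4D59 is an arbitrary value I picked for illustration; it is an assumption, and a real application should choose one that does not collide with the IDs listed in the ZIP specification. ExtraBlockSketch, wrap, find and roundTrip are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ExtraBlockSketch {

    // Arbitrary header ID used only for this example (assumption, not a registered value).
    public static final int CUSTOM_ID = 0x4D59;

    /** Wraps a payload as one extra-field block: id (2 bytes), length (2 bytes), data. */
    public static byte[] wrap(int id, byte[] payload) {
        return ByteBuffer.allocate(4 + payload.length)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putShort((short) id)
                .putShort((short) payload.length)
                .put(payload)
                .array();
    }

    /** Walks the blocks of an extra field; returns the payload stored under id, or null. */
    public static byte[] find(byte[] extra, int id) {
        if (extra == null) {
            return null;
        }
        ByteBuffer buf = ByteBuffer.wrap(extra).order(ByteOrder.LITTLE_ENDIAN);
        while (buf.remaining() >= 4) {
            int blockId = buf.getShort() & 0xffff;
            int length = buf.getShort() & 0xffff;
            if (length > buf.remaining()) {
                break; // malformed block or raw trailing bytes: stop parsing
            }
            if (blockId == id) {
                byte[] payload = new byte[length];
                buf.get(payload);
                return payload;
            }
            buf.position(buf.position() + length); // skip a block we do not know
        }
        return null;
    }

    /** Writes an entry carrying the block, then reads the extra field back. */
    public static byte[] roundTrip(byte[] payload) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            ZipEntry entry = new ZipEntry("Directory/Test.txt");
            entry.setExtra(wrap(CUSTOM_ID, payload));
            zos.putNextEntry(entry);
            zos.write("Hello World Hello World".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return find(zis.getNextEntry().getExtra(), CUSTOM_ID);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] back = roundTrip("World".getBytes(StandardCharsets.UTF_8));
        System.out.println("Optional Data=" + new String(back, StandardCharsets.UTF_8));
    }
}
```

Since find() skips every block it does not recognize, it should not matter whether Java has added its timestamp block before or after the custom one.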
Unsolved Problems
There are still 3 values which return different results depending on which class you use to read the file. I made a table of my observations per entry:
                       ZipFile            ZipInputStream
getCreationTime()      null               <correct value>
getLastAccessTime()    null               <correct value>
getComment()           <correct value>    null
Apparently, according to the bug report, this is expected behavior: ZipFile is random access and ZipInputStream is sequential, so they access the data differently.
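The getComment() row is the easiest one to reproduce. The sketch below writes a temporary zip with one commented entry and reads the comment back both ways; it also reads the content through ZipFile, whose input stream is already decompressed, so no Inflater is needed. ReaderComparisonSketch and its methods are hypothetical names, and readAllBytes() assumes Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ReaderComparisonSketch {

    static final String NAME = "Directory/Test.txt";

    /** Writes a temporary zip with one commented entry and returns its path. */
    public static Path writeSample() throws IOException {
        Path zip = Files.createTempFile("reader-comparison", ".zip");
        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(zip))) {
            ZipEntry entry = new ZipEntry(NAME);
            entry.setComment("A File");
            zos.putNextEntry(entry);
            zos.write("Hello World Hello World".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return zip;
    }

    /** Comment as seen through ZipFile (random access). */
    public static String commentViaZipFile(Path zip) throws IOException {
        try (ZipFile file = new ZipFile(zip.toFile())) {
            return file.getEntry(NAME).getComment();
        }
    }

    /** Comment as seen through ZipInputStream (sequential). */
    public static String commentViaStream(Path zip) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(Files.newInputStream(zip))) {
            return zis.getNextEntry().getComment();
        }
    }

    /** Entry content through ZipFile; the returned stream yields uncompressed bytes. */
    public static String contentViaZipFile(Path zip) throws IOException {
        try (ZipFile file = new ZipFile(zip.toFile());
             InputStream in = file.getInputStream(file.getEntry(NAME))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path zip = writeSample();
        try {
            System.out.println("ZipFile comment=" + commentViaZipFile(zip));
            System.out.println("ZipInputStream comment=" + commentViaStream(zip));
            System.out.println("Data=" + contentViaZipFile(zip));
        } finally {
            Files.deleteIfExists(zip);
        }
    }
}
```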
From my observations, ZipInputStream returns the best results, so I will continue to use that.