I am using Java's ZipOutputStream class to write a large zip file.
It works fine when the zip file has about 1,000 files and folders, and it still works with about 10,000.
But when I increase it to more than 100,000 files and folders, something goes wrong.
It still writes a large share of the files, but then it stops: I end up with a zip file that contains only about half of the directory tree I expected.
The zip file is nowhere near the 4 GB size limit. When it finishes, the whole archive is only about 30 MB, and each file inside it is smaller than 1 KB.
The layout of the zip file is simple:

- The root folder should hold ten subdirectories.
- Each of those subdirectories should hold ten subdirectories.
- This continues down to the fourth level, where each subdirectory holds ten files.

But after execution, the first level holds only 5 directories, and the last of those holds only two subdirectories.
Here is the relevant code:
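```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.ZipOutputStream;

public class MyZipFile {
    private MyFolder folder;

    public void create(String[] haystack) {
        // MyFolder's only constructor takes a catalog number; an empty one
        // is assumed here to stand for the root folder.
        folder = new MyFolder(new int[0]);
        folder.create(haystack);
    }

    public void writeToDisk(String rootPath) throws IOException {
        // try-with-resources closes the ZipOutputStream (and the streams it wraps)
        // even if writing fails; close() is also what writes the central directory.
        try (ZipOutputStream rootZip = new ZipOutputStream(
                new BufferedOutputStream(new FileOutputStream(rootPath)))) {
            folder.writeToDisk(rootZip);
        }
    }
}
```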
And
```java
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class MyFolder extends MyFileSystemElement {
    private static final int NUM_SUBFILES = 10;
    private final MyFileSystemElement[] subfiles;

    public MyFolder(int[] catalogNumber) {
        super(catalogNumber);
        subfiles = new MyFileSystemElement[NUM_SUBFILES];
    }

    @Override
    public void create(String[] haystack) {
        for (int i = 0; i < NUM_SUBFILES; i++) {
            int[] subCatalogNumber = createSubCatalogNumber(i);
            // finalFolderLevel determines how many files are in the ZIP:
            // 2 -> about 1,000 files, 3 -> about 10,000,
            // 4 -> about 100,000 (this is where I run into trouble).
            // thisFolderLevel and finalFolderLevel are defined in code not shown here.
            MyFileSystemElement element;
            if (thisFolderLevel <= finalFolderLevel) {
                element = new MyFolder(subCatalogNumber);
            } else {
                element = new MyFile(subCatalogNumber);
            }
            subfiles[i] = element;
            element.create(haystack);
        }
    }

    @Override
    public void writeToDisk(ZipOutputStream zipStream) throws IOException {
        // Write this folder's own entry, then recurse depth-first into its children.
        // ZipEntry.isDirectory() only treats names ending in "/" as directories.
        String path = createPathFromCatalogNumber();
        zipStream.putNextEntry(new ZipEntry(path));
        zipStream.closeEntry();
        for (MyFileSystemElement child : subfiles) {
            child.writeToDisk(zipStream);
        }
    }
}
```
And
```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class MyFile extends MyFileSystemElement {
    private String fileContents;

    public MyFile(int[] catalogNumber) {
        super(catalogNumber);
    }

    @Override
    public void create(String[] haystack) {
        // Ten lines, each picked from haystack via deriveIndex (defined elsewhere).
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10; i++) {
            sb.append(haystack[deriveIndex(i)]).append('\n');
        }
        fileContents = sb.toString();
    }

    @Override
    public void writeToDisk(ZipOutputStream zipStream) throws IOException {
        String path = createPathFromCatalogNumber();
        // Encode explicitly rather than relying on the platform default charset.
        byte[] contents = fileContents.getBytes(StandardCharsets.UTF_8);
        zipStream.putNextEntry(new ZipEntry(path));
        zipStream.write(contents);
        zipStream.closeEntry();
    }
}
```
I'm wondering whether ZipOutputStream (or maybe FileOutputStream) has a maximum number of entries it can write before it just quits. I'm not sure.
Your help is appreciated. Thank you.
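The original ZIP format (without the ZIP64 extensions) is limited to 65,535 entries, because the end-of-central-directory record stores the entry count in a 16-bit field, and older versions of Java's java.util.zip write only that format. The Info-ZIP FAQ covers the format's limits: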
http://www.info-zip.org/FAQ.html#limits
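The numbers in the question fit this limit closely. Below is a small arithmetic sketch, assuming the tree is written depth-first exactly as in `MyFolder.writeToDisk` (root folder entry included), that the writer keeps only the low 16 bits of the entry count, and that the tool reading the archive trusts that count and lists central-directory entries in write order. The class `ZipCountWrap` and its methods are made up for illustration.

```java
public class ZipCountWrap {
    static final int FANOUT = 10;     // every folder in the question has ten children
    static final int FILE_LEVEL = 5;  // root is level 0, folders are levels 1-4, files are level 5

    // Entries in the subtree rooted at a node on `level`, counting that node itself.
    static long subtreeSize(int level) {
        if (level == FILE_LEVEL) {
            return 1; // a single file entry
        }
        long size = 1; // the folder's own entry
        for (int i = 0; i < FANOUT; i++) {
            size += subtreeSize(level + 1);
        }
        return size;
    }

    // Assumption: a writer without ZIP64 keeps only the low 16 bits of the count.
    static long wrappedCount(long total) {
        return total & 0xFFFF;
    }

    // Given how many entries a reader lists (in depth-first write order), returns
    // {first-level folders seen, subfolders seen inside the last of them}.
    // Assumes the cut falls strictly inside a first-level and a second-level folder.
    static int[] visibleShape(long visible) {
        long remaining = visible - 1;                  // the root folder's entry
        long fullFirst = remaining / subtreeSize(1);   // complete first-level subtrees
        remaining -= fullFirst * subtreeSize(1);
        remaining -= 1;                                // entry of the partially listed folder
        long fullSecond = remaining / subtreeSize(2);  // complete subtrees inside it
        return new int[] {(int) fullFirst + 1, (int) fullSecond + 1};
    }

    public static void main(String[] args) {
        long total = subtreeSize(0);
        long visible = wrappedCount(total);
        int[] shape = visibleShape(visible);
        System.out.println("entries written:     " + total);
        System.out.println("stored 16-bit count: " + visible);
        System.out.println("first-level folders: " + shape[0]);
        System.out.println("subfolders in last:  " + shape[1]);
    }
}
```

Under those assumptions it reports 111,111 entries written, a stored count of 45,575, and 5 visible first-level folders with 2 subfolders in the last one, which is the shape described in the question.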
WinZip's knowledge base discusses the same limit in relation to its ZIP64 format:
http://kb.winzip.com/kb/entry/99/
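Since Java 7, java.util.zip supports ZIP64, and ZipOutputStream is expected to write the ZIP64 end records on its own when the entry count no longer fits the 16-bit field, so upgrading the runtime may be enough. A quick round-trip check, as a sketch (the class and method names are hypothetical):

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class Zip64Check {
    // Writes `count` tiny entries; on Java 7+ ZipOutputStream is expected to add
    // ZIP64 end-of-central-directory records when the count needs them.
    static void writeEntries(Path target, int count) throws IOException {
        byte[] body = "x\n".getBytes(StandardCharsets.UTF_8);
        try (ZipOutputStream zip = new ZipOutputStream(
                new BufferedOutputStream(Files.newOutputStream(target)))) {
            for (int i = 0; i < count; i++) {
                zip.putNextEntry(new ZipEntry("f" + i + ".txt"));
                zip.write(body);
                zip.closeEntry();
            }
        }
    }

    // Opens the archive through its central directory and reports the entry count.
    static int readBackCount(Path source) throws IOException {
        try (ZipFile zf = new ZipFile(source.toFile())) {
            return zf.size();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("zip64-check", ".zip");
        try {
            writeEntries(tmp, 70_000);
            System.out.println("entries read back: " + readBackCount(tmp));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

If this prints 70000, your runtime handles the entry count; whether the tool you later open the archive with also understands ZIP64 is a separate question.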
You may want to use a different archive format or figure out a way to limit the file count.
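One way to limit the file count per archive is to spread the tree over several zip files. The sketch below starts a new archive every `maxEntries` entries; `SplitZipWriter` and its naming scheme (`baseName-0.zip`, `baseName-1.zip`, ...) are hypothetical, not part of any library.

```java
import java.io.BufferedOutputStream;
import java.io.Closeable;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: writes entries to baseName-0.zip, baseName-1.zip, ...
// in `dir`, starting a new archive whenever the current one holds maxEntries.
public class SplitZipWriter implements Closeable {
    private final Path dir;
    private final String baseName;
    private final int maxEntries;
    private ZipOutputStream current;
    private int entriesInCurrent;
    private int parts;

    public SplitZipWriter(Path dir, String baseName, int maxEntries) {
        if (maxEntries <= 0) {
            throw new IllegalArgumentException("maxEntries must be positive");
        }
        this.dir = dir;
        this.baseName = baseName;
        this.maxEntries = maxEntries;
    }

    // Adds one entry; folder names should end in "/" and can use an empty array.
    public void addEntry(String name, byte[] contents) throws IOException {
        if (current == null || entriesInCurrent == maxEntries) {
            startNextPart();
        }
        current.putNextEntry(new ZipEntry(name));
        current.write(contents);
        current.closeEntry();
        entriesInCurrent++;
    }

    // Number of archives started so far.
    public int partCount() {
        return parts;
    }

    private void startNextPart() throws IOException {
        if (current != null) {
            current.close(); // writes the finished part's central directory
        }
        Path part = dir.resolve(baseName + "-" + parts + ".zip");
        current = new ZipOutputStream(new BufferedOutputStream(Files.newOutputStream(part)));
        entriesInCurrent = 0;
        parts++;
    }

    @Override
    public void close() throws IOException {
        if (current != null) {
            current.close();
            current = null;
        }
    }
}
```

For the question's tree you would pass something like 65,535 as `maxEntries`. The trade-off is that one folder's contents can end up split across two archives.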