I need to walk a directory on a network drive and build a map from each child folder to its parent in the hierarchy. One representative directory is 6 terabytes and contains 900,000 files and 900 folders. I only care about the folders, not the files. For testing purposes I copied the folders without their files to another network drive and ran my code on the copy. Iterating over just the 900 folders takes maybe 10 seconds. However, iterating over the original directory structure takes 30 minutes. It appears that we are iterating through all 900,000 files even though we ignore them.
Is there a way to speed this up by not even looking at the files? I would prefer to stick with pure Java if we can. When browsing this huge directory through Windows Explorer, it does not feel slow at all. My code is below.
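import java.io.File;
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.HashMap;
import java.util.Map;

public static Map<String, String> findFolderPaths(File parentFolder) throws IOException {
    Map<String, String> parentFolderMap = new HashMap<String, String>();
    Files.walkFileTree(parentFolder.toPath(), new FolderMappingFileVisitor(parentFolderMap));
    return parentFolderMap;
}

static class FolderMappingFileVisitor extends SimpleFileVisitor<Path> {
    private final Map<String, String> mapping;

    FolderMappingFileVisitor(Map<String, String> map) {
        this.mapping = map;
    }

    // Called once per directory, including the root; files fall through to the
    // default visitFile(), which does nothing.
    @Override
    public FileVisitResult preVisitDirectory(Path dir,
            BasicFileAttributes attrs) throws IOException {
        File directory = dir.toFile();
        // Keyed by folder name, so folders that share a name overwrite each other.
        mapping.put(directory.getName(), directory.getParent());
        return FileVisitResult.CONTINUE;
    }
}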
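One way to check the suspicion that the walk touches every file is to count what the visitor sees and time the whole traversal. The following is only a minimal sketch and not part of the code above; the class name `WalkStats` and its fields are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper: counts the directories and files seen during a walk.
public class WalkStats extends SimpleFileVisitor<Path> {
    long directories = 0;
    long files = 0;

    @Override
    public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
        directories++; // includes the root directory itself
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
        files++; // every non-directory entry the walk hands to the visitor
        return FileVisitResult.CONTINUE;
    }

    // Walks the tree under root, prints the counts and elapsed time, and returns the counts.
    public static WalkStats measure(Path root) throws IOException {
        WalkStats stats = new WalkStats();
        long start = System.nanoTime();
        Files.walkFileTree(root, stats);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(stats.directories + " directories, "
                + stats.files + " files, " + elapsedMs + " ms");
        return stats;
    }
}
```

Running `WalkStats.measure(parentFolder.toPath())` in both environments would show whether the file count is the same while only the elapsed time differs.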
Edit:
An important piece of the puzzle that I did not mention is that we run the app under Java Web Start. The times I reported were from production, not development. When running from Eclipse, the times are closer to what I would expect from the file walker.
The file walker appears to be much faster than File.listFiles(). The problem appears to be Java Web Start. When I run the app in production under Java Web Start, the walk takes around 30 minutes. When I run the app from Eclipse, it takes a couple of minutes. Java Web Start is just killing us performance-wise.
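For reference, the File.listFiles() variant in such a comparison could look roughly like the sketch below. It is an assumption about the shape of that code, not the version actually measured, and the class name `ListFilesFolderMapper` is hypothetical. Note that `listFiles(FileFilter)` still lists every entry in a directory and then applies the filter, so it does not avoid touching the files either.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.HashMap;
import java.util.Map;

// Hypothetical comparison class: builds the same name-to-parent map with java.io.File.
public class ListFilesFolderMapper {

    public static Map<String, String> findFolderPaths(File parentFolder) {
        Map<String, String> map = new HashMap<>();
        collect(parentFolder, map);
        return map;
    }

    private static void collect(File dir, Map<String, String> map) {
        // Same keying as the visitor version: folder name -> parent path.
        map.put(dir.getName(), dir.getParent());

        // The filter is applied to every entry, files included.
        FileFilter directoriesOnly = File::isDirectory;
        File[] children = dir.listFiles(directoriesOnly);
        if (children == null) {
            return; // not a directory, or an I/O error occurred
        }
        for (File child : children) {
            collect(child, map);
        }
    }
}
```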
This is a very data- and I/O-intensive app, and I have noticed other performance issues in the past when running it under Java Web Start. The solution is to migrate away from Java Web Start.