A daemon implemented in Java and running on Windows 7 copies files from one directory to another; both the source and the target directory are on a network share hosted by Windows Server 2016. Copying is done with Apache Commons IO, and occasionally the process fails with the following stack trace and a message that translates roughly to "There are no more files":
java.io.IOException: Es sind keine weiteren Dateien vorhanden
at java.io.WinNTFileSystem.canonicalize0(Native Method)
at java.io.WinNTFileSystem.canonicalize(Unknown Source)
at java.io.File.getCanonicalPath(Unknown Source)
at org.apache.commons.io.FileUtils.copyFile(FileUtils.java:642)
at org.apache.commons.io.FileUtils.copyFileToDirectory(FileUtils.java:587)
at org.apache.commons.io.FileUtils.copyFileToDirectory(FileUtils.java:558)
at de.am_soft.osgi.dokliste.eingaenge.impl.internal.Eingang.copyFilesToDbxmlFolders(Eingang.java:283)
Apache Commons IO has the following code at line 642. That line really is only the if condition, not the exception thrown inside it:
if (srcFile.getCanonicalPath().equals(destFile.getCanonicalPath())) {
    throw new IOException("Source '" + srcFile + "' and destination '" + destFile + "' are the same");
}
So the problem is not the copying itself but generating the canonical paths beforehand. Running Process Monitor on the client where the daemon runs confirms that. The following is the last event before the daemon logs the above exception and tries to send error mails using Logback. The result of that event (NO MORE FILES) matches the error message of the stack trace:
10:12:06,6244515 integration.exe 6928 QueryDirectory \\HOST\SHARE$\DocBeam3\[...].zip NO MORE FILES Filter: 20191106-081920-[...].zip
Additionally, the preceding ProcMon lines show that the exception occurs for destFile only. Running the daemon on my local machine instead always produces the following event (NO SUCH FILE):
19:08:03,7485947 java.exe 6232 QueryDirectory C:\Users\[...].zip NO SUCH FILE Filter: 20191022-143101-[...].zip
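To take Commons IO out of the picture, the failing call can be exercised on its own. The following is a minimal sketch, not code from the daemon: the class name CanonicalPathProbe and the default file name are made up for illustration, and the real destination path on the share would have to be passed as an argument.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper, not part of the daemon: calls File.getCanonicalPath()
// directly, which is the call that fails in the stack trace above.
class CanonicalPathProbe {

    // Returns the canonical path, or a marker string carrying the OS error text.
    static String probe(File file) {
        try {
            return file.getCanonicalPath();
        } catch (IOException e) {
            return "FAILED: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Pass the destination path as the first argument; the default is a placeholder.
        File dest = new File(args.length > 0 ? args[0] : "does-not-exist.zip");
        System.out.println(probe(dest));
    }
}
```

For a destination file that does not exist yet, this normally prints the resolved path. On the affected setup, the FAILED branch would presumably show the same message as the stack trace.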
I've debugged the native methods and came across lastErrorReportable, which explicitly checks for a set of special error codes. That set doesn't contain ERROR_NO_MORE_FILES from the first event, while it does contain ERROR_FILE_NOT_FOUND from the second one:
if ((errval == ERROR_FILE_NOT_FOUND)
    || (errval == ERROR_DIRECTORY)
    || (errval == ERROR_PATH_NOT_FOUND)
    || (errval == ERROR_BAD_NETPATH)
    || (errval == ERROR_BAD_NET_NAME)
    || (errval == ERROR_ACCESS_DENIED)
    || (errval == ERROR_NETWORK_UNREACHABLE)
    || (errval == ERROR_NETWORK_ACCESS_DENIED)) {
    return 0;
}
So it seems that whenever ERROR_NO_MORE_FILES occurs, canonicalizing a path is aborted with an error instead of the error being ignored, as it is for the other codes:
if (!lastErrorReportable()) {
    if (!(dst = wcp(dst, dend, L'\0', src, src + wcslen(src)))){
        goto err;
    }
    break;
} else {
    goto err;
}
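The combined effect of the two excerpts can be modelled in a few lines of Java. This is a simplified model for illustration only, not JDK code: the class name, the set, and the int parameter are made up, and only the numeric values are taken from winerror.h.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Simplified model of the native decision logic; not actual JDK code.
class LastErrorModel {
    // Win32 error codes as defined in winerror.h.
    static final int ERROR_FILE_NOT_FOUND = 2;
    static final int ERROR_PATH_NOT_FOUND = 3;
    static final int ERROR_ACCESS_DENIED = 5;
    static final int ERROR_NO_MORE_FILES = 18;
    static final int ERROR_BAD_NETPATH = 53;
    static final int ERROR_NETWORK_ACCESS_DENIED = 65;
    static final int ERROR_BAD_NET_NAME = 67;
    static final int ERROR_DIRECTORY = 267;
    static final int ERROR_NETWORK_UNREACHABLE = 1231;

    // Codes for which canonicalization keeps the rest of the path as-is.
    static final Set<Integer> TOLERATED = new HashSet<>(Arrays.asList(
            ERROR_FILE_NOT_FOUND, ERROR_DIRECTORY, ERROR_PATH_NOT_FOUND,
            ERROR_BAD_NETPATH, ERROR_BAD_NET_NAME, ERROR_ACCESS_DENIED,
            ERROR_NETWORK_UNREACHABLE, ERROR_NETWORK_ACCESS_DENIED));

    // true means "goto err", which surfaces as an IOException in Java.
    static boolean lastErrorReportable(int errval) {
        return !TOLERATED.contains(errval);
    }

    public static void main(String[] args) {
        System.out.println("ERROR_FILE_NOT_FOUND reportable: "
                + lastErrorReportable(ERROR_FILE_NOT_FOUND));
        System.out.println("ERROR_NO_MORE_FILES reportable: "
                + lastErrorReportable(ERROR_NO_MORE_FILES));
    }
}
```

In this model ERROR_FILE_NOT_FOUND is not reportable while ERROR_NO_MORE_FILES is, which mirrors the difference between the local run and the run against the share.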
The thrown exception matches what I get; the given message is only a fallback that isn't used in my case:
if (rv == NULL && !(*env)->ExceptionCheck(env)) {
    JNU_ThrowIOExceptionWithLastError(env, "Bad pathname");
}
The interesting thing is that the daemon doesn't fail on every file copy, only occasionally. When it does fail, it seems to be related to other directories and files already being present in the target directory. Those are completely unrelated to the daemon and, according to ProcMon, are not iterated, yet their mere existence seems to make a difference. If I delete all of those files and directories so that the target directory is empty, copying immediately succeeds again. That's notable because files and directories in the target directory on my local setup have no influence: copying never fails there, and the event logged by ProcMon is never ERROR_NO_MORE_FILES. After emptying the directory on the setup where the problem occurs, ProcMon logs ERROR_FILE_NOT_FOUND again as well.
So it seems that, under some currently unknown circumstances, Windows sets ERROR_NO_MORE_FILES as the last error in the calls to FindFirstFileW made by wcanonicalize. Because Java doesn't have that code on its list of ignored errors, copying fails in those circumstances, even though the situation seems perfectly valid. I don't see any real error otherwise.
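Since the failure occurs in the same-file check before any bytes are copied, one application-side option might be to copy with java.nio.file.Files.copy, which does not go through File.getCanonicalPath(). The following is an untested sketch under that assumption: the class and method names are made up, and it has not been verified that NIO avoids the problematic FindFirstFileW result on the affected setup.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical replacement for FileUtils.copyFileToDirectory; names are illustrative.
class NioCopy {

    // Copies srcFile into destDir under the same file name and returns the target path.
    static Path copyFileToDirectory(Path srcFile, Path destDir) throws IOException {
        // Create the target directory (and missing parents) if necessary.
        Files.createDirectories(destDir);
        Path target = destDir.resolve(srcFile.getFileName());
        // Overwrite an existing target and carry over file attributes such as
        // the last-modified time.
        return Files.copy(srcFile, target,
                StandardCopyOption.REPLACE_EXISTING,
                StandardCopyOption.COPY_ATTRIBUTES);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java NioCopy <source file> <target directory>
        Path copied = copyFileToDirectory(Path.of(args[0]), Path.of(args[1]));
        System.out.println("Copied to " + copied);
    }
}
```

Note that this drops the source-equals-destination check that Commons IO performs, so the caller has to make sure the two directories differ.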
So should ERROR_NO_MORE_FILES be added to lastErrorReportable? And if so, whom do I need to ask? :-)
This behavior is caused by an SMB incompatibility between a Windows Server 2019 file server and clients running earlier versions of Windows. The directory metadata cache is handled differently, which causes this issue when reading a share with many files and folders.
Unfortunately, Microsoft has not yet released a fix for this bug.
A workaround is to disable the SMB metadata caching on the client side with this registry setting:
HKLM\System\CurrentControlSet\Services\LanmanWorkstation\Parameters\DirectoryCacheLifetime=0 (DWORD)
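If the daemon should warn at startup when the workaround is not in place, the value can be read with the reg query command that ships with Windows. The following is a rough sketch: the class and method names are invented, and the parser assumes the usual "name, REG_DWORD, 0x..." output line of reg query.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical startup check; not part of the daemon or of any library.
class DirectoryCacheCheck {
    static final String KEY =
            "HKLM\\System\\CurrentControlSet\\Services\\LanmanWorkstation\\Parameters";
    static final String VALUE = "DirectoryCacheLifetime";

    // Matches a line such as "    DirectoryCacheLifetime    REG_DWORD    0x0".
    private static final Pattern DWORD_LINE =
            Pattern.compile(VALUE + "\\s+REG_DWORD\\s+0x([0-9a-fA-F]+)");

    // Extracts the DWORD from reg query output; empty if the value is not set.
    static OptionalInt parse(String regQueryOutput) {
        Matcher m = DWORD_LINE.matcher(regQueryOutput);
        return m.find()
                ? OptionalInt.of((int) Long.parseLong(m.group(1), 16))
                : OptionalInt.empty();
    }

    // Runs "reg query <KEY> /v <VALUE>" and returns its combined output (Windows only).
    static String runRegQuery() throws IOException, InterruptedException {
        Process p = new ProcessBuilder("reg", "query", KEY, "/v", VALUE)
                .redirectErrorStream(true)
                .start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        p.waitFor();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        OptionalInt lifetime = parse(runRegQuery());
        System.out.println(lifetime.isPresent() && lifetime.getAsInt() == 0
                ? "SMB directory cache disabled"
                : "DirectoryCacheLifetime is not 0; workaround not active");
    }
}
```

The check only reads the value; setting it still has to be done by an administrator on the client.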