We are upgrading our application from RHEL 6.5 with ext4 to RHEL 7.3 with XFS. We have observed that on XFS, a cold reboot (from the system console, iLO) truncates some of our files (which are written to disk every few seconds) to zero bytes. This affects not only our application: if we redirect the output of a command to a file using ">", those files also disappear after the cold reboot. We are aware of the recommendation to call fsync explicitly. But what about our code that is written in Java? What about our Python scripts?
Now we are in a dilemma over whether to stick with ext4 or move to XFS. XFS has advantages and would be our first preference, and we cannot believe that the rest of the world is not aware of this. Is this something specific to RHEL (we see one similar issue fixed in RHEL 6.5: https://bugzilla.redhat.com/show_bug.cgi?id=845233), or is this expected behavior from modern file systems?
Both Java and Python provide access to the fsync operation in their I/O libraries, so this is not really an excuse, but I understand what you mean.
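As a minimal sketch of what that looks like in Python: the helper name write_durably is hypothetical and only for illustration, while f.flush() and os.fsync() are the standard library calls. In Java, the corresponding calls would be FileDescriptor.sync() or FileChannel.force(true).

```python
import os

def write_durably(path, text):
    """Write text to path and ask the kernel to push it to stable storage."""
    with open(path, "w") as f:
        f.write(text)
        f.flush()             # drain Python's userspace buffer into the kernel
        os.fsync(f.fileno())  # ask the kernel to write the file data to disk
```

Note that this only makes the data durable once fsync returns; it does not by itself protect a file that is being overwritten in place when the crash happens.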
However, the key ext4/XFS difference in this area is typically something else. A straight
echo contents > file
will sometimes leave behind a zero-length file even with ext4, particularly if the content written is more than just a few bytes. What is guaranteed to work on ext4 (with a default configuration) is this:
echo contents > file.new
mv file.new file
With ext4 in its default configuration, this will never leave behind a partially written file (only file.new might be incomplete). XFS is different in this regard: there, the contents need to be fsynced before the rename.
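The write-then-rename pattern with an explicit fsync before the rename could be sketched in Python roughly as follows. The function name atomic_write and the ".new" suffix are illustrative assumptions, and the final directory fsync is an extra precaution on Linux rather than something required by the discussion above.

```python
import os

def atomic_write(path, data):
    """Replace path with data (bytes) via a temp file, fsync, then rename."""
    tmp = path + ".new"  # hypothetical temp name, mirroring file.new above
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()              # push Python's buffer into the kernel
        os.fsync(f.fileno())   # make sure the data is on disk before renaming
    os.replace(tmp, path)      # rename over the target (atomic on POSIX)
    # Assumption: Linux. Syncing the directory makes the rename itself durable.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Called as atomic_write("file", b"contents\n"), this is intended to leave either the old or the new contents of file after a crash, with only file.new possibly incomplete.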
In 2014, Eric Sandeen proposed a patch to align the XFS behavior with what ext4 does, but it was not well-received at the time and it was not merged. Maybe the tides have turned since then and a reproposed patch would be acceptable today. (I don't see a flush in the current code, but I'm not an XFS developer.)
If this blocks your migration to XFS, you should absolutely file a support ticket. Even though it's really not an option to deviate from the upstream kernel for this, such customer requests are always important feedback.