Tags: c, caching, linux-kernel, ext4, page-caching

Is it safe to use Direct I/O writes and page-cache reads at the same time?


For instance: open the same file twice, write through one fd with direct I/O (O_DIRECT), and read through the other fd via the page cache.

By "safe" I mean: after writing some data through the direct-I/O fd, can I expect to read it back immediately through the page-cache fd?
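A minimal sketch of this scenario in C, assuming Linux with glibc and a filesystem that accepts O_DIRECT. The function name `dio_write_then_cached_read`, the 4096-byte block size, and the fill bytes are illustrative choices, not taken from any existing code. It first primes the page cache with old data through the buffered fd, overwrites the same block through the O_DIRECT fd, and then checks what the buffered fd reads back.

```c
#define _GNU_SOURCE   /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLK 4096      /* assumed to satisfy the O_DIRECT alignment rules */

/* Returns 1 if the buffered read sees the direct write, 0 if it sees
 * stale data, -1 on any error (e.g. the filesystem rejects O_DIRECT). */
int dio_write_then_cached_read(const char *path)
{
    int result = -1, bfd = -1, dfd = -1;
    char cached[BLK];
    void *dbuf = NULL;

    /* O_DIRECT needs an aligned user buffer. */
    if (posix_memalign(&dbuf, BLK, BLK) != 0)
        return -1;

    /* Prime the page cache with old data through the buffered fd. */
    bfd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (bfd < 0) goto out;
    memset(cached, 'A', BLK);
    if (pwrite(bfd, cached, BLK, 0) != BLK) goto out;

    /* Second, independent open of the same file, this time with O_DIRECT. */
    dfd = open(path, O_WRONLY | O_DIRECT);
    if (dfd < 0) goto out;
    memset(dbuf, 'B', BLK);
    if (pwrite(dfd, dbuf, BLK, 0) != BLK) goto out;

    /* Read back through the page-cache fd after the direct write returned. */
    if (pread(bfd, cached, BLK, 0) != BLK) goto out;
    result = (memcmp(cached, dbuf, BLK) == 0);

out:
    if (dfd >= 0) close(dfd);
    if (bfd >= 0) close(bfd);
    free(dbuf);
    unlink(path);
    return result;
}
```

Calling `dio_write_then_cached_read("some_scratch_file")` on the filesystem of interest gives a quick check for the sequential case; it says nothing about a reader that runs concurrently with the writer.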


Solution

  • I think a direct I/O write should be fairly safe with respect to later cached reads of the same file, but those reads may be slower: the written data was not saved in the page cache and has to be fetched from the storage device. The exact code path may depend on the filesystem used.

    This post https://lwn.net/Articles/776801/ mentions that direct IO has invalidation semantics:

    with some filesystems at least, performing a direct-I/O read on a page will force that page out of the cache

    The book lists three strategies for writing in its "Write Caching" section: no-write, write-through, and write-back. Direct I/O can be seen as a "no-write" variant of the write() syscall.

    Using several fds for a single file is safe because the data is managed by the filesystem code through the inode, and both fds point to the same inode.
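The "same inode" point can be checked from user space. The following is a small sketch, assuming a POSIX system; the helper name `same_inode_via_two_fds` is hypothetical. It opens a path twice and compares the device and inode numbers reported by `fstat()` for each descriptor.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if two independent open() calls on `path` refer to the same
 * inode, 0 if they do not, -1 on error. The file is created if missing. */
int same_inode_via_two_fds(const char *path)
{
    struct stat a, b;
    int result = -1;

    int fd1 = open(path, O_RDWR | O_CREAT, 0600);
    int fd2 = open(path, O_RDONLY);

    /* An inode is identified by the (device, inode number) pair. */
    if (fd1 >= 0 && fd2 >= 0 && fstat(fd1, &a) == 0 && fstat(fd2, &b) == 0)
        result = (a.st_dev == b.st_dev && a.st_ino == b.st_ino);

    if (fd1 >= 0) close(fd1);
    if (fd2 >= 0) close(fd2);
    return result;
}
```

Each `open()` creates its own open file description (with its own flags and offset), but both descriptions refer to one inode, which is where the filesystem tracks the file's data.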

    In 2013 there was a thread on the kernelnewbies mailing list, https://lists.kernelnewbies.org/pipermail/kernelnewbies/2013-July/008660.html, and the TL;DR is:

    From a kernel developer's perspective : The kernel driver guarantees coherency between the page-cache and data transferred using O_DIRECT. ...

    1. Do not worry about coherency between the page-cache and the data transferred using O_DIRECT. The kernel will invalidate the cache after an O_DIRECT write and flush the cache before an O_DIRECT read.
    2. Use mutexes or semaphores (or any of the numerous options [1]) to prevent the usual synchronisation problems during IPC using a shared file.

    So although a direct write drops the written part of the file from the page cache, a race between writer and reader is still possible, because the page cache is only cleared once the direct I/O write() syscall has returned. If your reader needs to see the updated data, it has to be synchronized with the writer using a mutex or some other mechanism.
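One way to enforce "read only after the direct write() has returned" between two processes is sketched below. It uses a pipe as the synchronization mechanism instead of a mutex (a deliberate substitution, chosen to keep the example free of shared-memory setup). It assumes Linux with glibc and a filesystem that accepts O_DIRECT; the function name, the 4096-byte block size, and the fill bytes are illustrative.

```c
#define _GNU_SOURCE   /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define BLK 4096      /* assumed to satisfy the O_DIRECT alignment rules */

/* Child: O_DIRECT write, then one byte on the pipe meaning "write done".
 * Parent: blocks on the pipe, then does a buffered read.
 * Returns 1 if the parent sees the new data, 0 if stale, -1 on error. */
int synced_direct_write_cached_read(const char *path)
{
    int pfd[2], result = -1;
    char buf[BLK], token;

    /* Buffered fd; the initial write leaves old data in the page cache. */
    int rfd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (rfd < 0) return -1;
    memset(buf, 'A', BLK);
    if (pwrite(rfd, buf, BLK, 0) != BLK || pipe(pfd) != 0) {
        close(rfd); unlink(path);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(pfd[0]); close(pfd[1]); close(rfd); unlink(path);
        return -1;
    }
    if (pid == 0) {                       /* writer process */
        void *dbuf = NULL;
        close(pfd[0]);
        int wfd = open(path, O_WRONLY | O_DIRECT);
        if (wfd < 0 || posix_memalign(&dbuf, BLK, BLK) != 0) _exit(1);
        memset(dbuf, 'B', BLK);
        if (pwrite(wfd, dbuf, BLK, 0) != BLK) _exit(1);
        /* Signal the reader only after the direct write has returned. */
        if (write(pfd[1], "x", 1) != 1) _exit(1);
        _exit(0);
    }

    close(pfd[1]);                        /* reader keeps only the read end */
    /* read() yields 0 (EOF) if the writer exited without signalling. */
    if (read(pfd[0], &token, 1) == 1 && pread(rfd, buf, BLK, 0) == BLK)
        result = (buf[0] == 'B' && buf[BLK - 1] == 'B');

    close(pfd[0]);
    waitpid(pid, NULL, 0);
    close(rfd);
    unlink(path);
    return result;
}
```

A process-shared mutex, a semaphore, or any other mechanism that orders the reader after the writer's write() would serve the same purpose; without such ordering the reader may legitimately observe the old contents.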

    Sometimes mixing is not recommended: https://medium.com/databasss/on-disk-io-part-1-flavours-of-io-8e1ace1de017 "It is discouraged to open the same file with Direct IO and Page Cache simultaneously, since direct operations will be performed against disk device even if the data is in Page Cache, which may lead to undesired results."