git, stat, apfs

Why is lstat performance poor on APFS (macOS) compared to ext4 (Linux)?


While profiling Git on a large repository, I found that git status is significantly slower (about 10x) on macOS than on Linux. git status runs lstat on every file in the repository, which is where the slowness comes from.

Is there a particular reason why this syscall is so much slower on macOS than on Linux?
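
To compare the raw lstat cost on the two systems without Git in the way, a small timing script can help. The following is only a sketch: the helper name time_lstat is made up for illustration, and it measures a single process walking one directory tree, which is not exactly what git status does internally.

```python
import os
import time


def time_lstat(root):
    """Collect every file path under root, then time os.lstat() on each one.

    Returns (number_of_files, seconds_spent_in_the_lstat_loop).
    The directory walk itself is deliberately excluded from the timing.
    """
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))

    start = time.perf_counter()
    for path in paths:
        os.lstat(path)  # same syscall family git status relies on
    elapsed = time.perf_counter() - start
    return len(paths), elapsed


# Example use (the path is a placeholder for your own repository):
#   count, seconds = time_lstat("/path/to/repo")
#   print(count, "files,", seconds, "s total")
```

Running it twice on the same tree and comparing the second run on each machine reduces the effect of cold caches.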


Solution

  • First check your version of Git, as recent versions have brought performance improvements (such as git add in 2.20, git stash in 2.22, and even the upcoming 2.27 with submodules).

    Even git status improved with Git 2.24:

    The feature.manyFiles setting is suitable for repos with many files in the working directory.
    By setting index.version=4 and core.untrackedCache=true, commands such as 'git status' should improve.
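
One way to check whether the setting quoted above helps on a given repository is to time git status before and after enabling it. This is a rough sketch, assuming Git 2.24 or later is on the PATH; the helper names time_command and compare_git_status are hypothetical, and the index format and untracked cache may only take full effect after Git has rewritten the index, so the numbers are indicative at best.

```python
import subprocess
import time


def time_command(cmd, cwd=None):
    """Run cmd to completion and return the wall-clock seconds it took."""
    start = time.perf_counter()
    subprocess.run(cmd, cwd=cwd, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def compare_git_status(repo):
    """Time `git status` in repo before and after enabling feature.manyFiles.

    Each measurement is preceded by a warm-up run so that both timings
    are taken with warm filesystem caches.
    """
    time_command(["git", "status"], cwd=repo)           # warm-up
    before = time_command(["git", "status"], cwd=repo)

    # Enable the setting for this repository only (writes to .git/config).
    subprocess.run(
        ["git", "config", "feature.manyFiles", "true"], cwd=repo, check=True
    )

    time_command(["git", "status"], cwd=repo)           # warm-up, lets Git refresh its caches
    after = time_command(["git", "status"], cwd=repo)
    return before, after


# Example use (the path is a placeholder):
#   before, after = compare_git_status("/path/to/repo")
```

Note that the script changes the repository's local configuration; `git config --unset feature.manyFiles` reverts it.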


    One analysis of the lstat performance difference between APFS and ext4 was done by Gregory Szorc in "Global Kernel Locks in APFS":

    It is apparent that macOS 10.14 Mojave has received performance work relative to macOS 10.13!
    Despite those improvements, APFS is still spending a lot of CPU time in the kernel. And the kernel CPU time is still comparatively very high compared to Linux/EXT4, even for single process operation.

    While the source code for APFS is not available for me to confirm, the profiling results showing excessive time spent in lck_mtx_lock_grab_mutex() combined with the fact that execution time decreases when the parallel process count decreases leads me to the conclusion that APFS obtains a global kernel lock during read-only operations such as readdir().

    In other words, APFS slows down when attempting to perform parallel read-only I/O.
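
The parallel slowdown described in that analysis can be probed with a small experiment: start several processes that each walk the same tree (readdir plus lstat) and see how the total wall time changes with the process count. This is a sketch under stated assumptions, not a reproduction of the original benchmark; the function name time_parallel_walks and the embedded walker snippet are made up for illustration.

```python
import subprocess
import sys
import time

# Code run by each child process: walk the tree given as the first
# argument and lstat every file. os.walk reads directories via scandir.
WALK_SNIPPET = (
    "import os, sys\n"
    "for dirpath, dirnames, filenames in os.walk(sys.argv[1]):\n"
    "    for name in filenames:\n"
    "        os.lstat(os.path.join(dirpath, name))\n"
)


def time_parallel_walks(root, workers):
    """Launch `workers` child interpreters that walk root concurrently.

    Returns the wall-clock seconds until all of them have finished.
    Interpreter startup is included, so use a tree large enough that
    the walk dominates.
    """
    start = time.perf_counter()
    procs = [
        subprocess.Popen([sys.executable, "-c", WALK_SNIPPET, root])
        for _ in range(workers)
    ]
    exit_codes = [proc.wait() for proc in procs]
    elapsed = time.perf_counter() - start
    if any(exit_codes):
        raise RuntimeError("at least one walker process failed")
    return elapsed


# Example use (the path is a placeholder):
#   for n in (1, 2, 4, 8):
#       print(n, time_parallel_walks("/path/to/repo", n))
```

On a machine with enough idle cores, read-only walks that do not contend on a lock should take roughly the same wall time for 1 process as for several; wall time that grows with the process count would be consistent with the contention described above.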