debian zfs

ZFS: How to reduce frequency of or prevent txg_sync


I have a small home server running Debian Buster with a ZFS filesystem (ZFS: Loaded module v0.7.12-2+deb10u2, ZFS pool version 5000, ZFS filesystem version 5) configured as a RAID.

As the server is sometimes unused for days, I have configured an autoshutdown script that shuts the server down once my two big WD Red hard disks (not the system disk) have been in standby for more than 45 minutes. I have now noticed that the server no longer shuts down, because both drives stay in standby for only a few minutes before becoming active again. Testing with iotop showed that the ZFS thread txg_sync is waking them up, even though no other process is reading from or writing to the drives.

I also checked with fatrace -c after changing to the directory where the data pool is mounted. It shows no output at the moment txg_sync appears and wakes the drives. Update: it seems that fatrace does not work properly with ZFS.

I then used iosnoop and now know that dm_crypt is writing to my disks regularly. My underlying drives are encrypted with LUKS.

./iosnoop -d 8,16
Tracing block I/O. Ctrl-C to end.
COMM         PID    TYPE DEV      BLOCK        BYTES     LATms
dmcrypt_writ 1895   W    8,16     2080476248   4096    6516.10
dmcrypt_writ 1895   W    8,16     3334728264   4096    6516.14
dmcrypt_writ 1895   W    8,16     2080429048   16384      0.16
dmcrypt_writ 1895   W    8,16     3334728272   20480      0.21
dmcrypt_writ 1895   W    8,16     2080476256   20480      0.16
dmcrypt_writ 1895   W    8,16     3328225336   16384      0.20

What is the reason for this, and how can I prevent it from occurring?


Solution

  • https://github.com/openzfs/zfs/issues/8537#issuecomment-477361010

    @niksfirefly if the pool is being written to then you should expect to see CPU and I/O be consumed by the txg_sync thread. How much will depend on your specific hardware, the pool configuration, which features/properties are enabled, and your workload. This may be normal for your circumstances.

    And maybe this link is helpful too: https://serverfault.com/questions/661336/slow-performance-due-to-txg-sync-for-zfs-0-6-3-on-ubuntu-14-04
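
Since the quoted comment ties txg_sync activity to writes reaching the pool, it can help to know how often the module is allowed to sync a transaction group. The following is a minimal sketch, assuming the ZFS on Linux module exposes its tunables under /sys/module/zfs/parameters. Neither that path nor the zfs_txg_timeout parameter is mentioned in the linked discussion, and the helper name show_txg_timeout is hypothetical.

```shell
#!/bin/sh
# Assumed location of the ZFS on Linux tunable (an assumption, not from the thread).
ZFS_TXG_PARAM=/sys/module/zfs/parameters/zfs_txg_timeout

# Print the txg timeout in seconds, or "unavailable" if it cannot be read.
# An alternative path may be passed as $1.
show_txg_timeout() {
    param=${1:-$ZFS_TXG_PARAM}
    if [ -r "$param" ]; then
        cat "$param"
    else
        echo "unavailable"
        return 1
    fi
}
```

Calling show_txg_timeout without arguments prints the current value in seconds. Raising that value would at most space the syncs out. It does not show what is writing to the pool, which is what the per-process check further down is for.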

    How to check disk I/O utilization per process:

    cut -d" " -f 1,2,42 /proc/*/stat | sort -n -k +3
    

    Those fields are PID, command and cumulative IO-wait ticks. This will show your hot processes, though only if they are still running. (You probably want to ignore your filesystem journalling threads.)

    (from https://serverfault.com/a/466342/580935)
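
The one-liner above splits each stat line on spaces, so a command name that itself contains a space (the name is stored in parentheses) shifts the columns. Here is a sketch of a more defensive variant, assuming field 42 is still the cumulative I/O-wait tick count described above. The function names parse_stat_line and list_blkio_ticks are hypothetical.

```shell
#!/bin/sh
# Read /proc/<pid>/stat lines on stdin and print "ticks pid comm" for each.
# The command name is taken as everything between the first "(" and the
# last ")", so names containing spaces or parentheses do not shift fields.
parse_stat_line() {
    awk '{
        line = $0
        lp = index(line, "(")
        rp = 0
        for (i = length(line); i > 0; i--) {
            if (substr(line, i, 1) == ")") { rp = i; break }
        }
        if (lp == 0 || rp == 0) next
        pid  = substr(line, 1, lp - 2)
        comm = substr(line, lp + 1, rp - lp - 1)
        # The text after ") " starts at field 3, so field 42 is rest[40].
        n = split(substr(line, rp + 2), rest, " ")
        if (n >= 40) print rest[40], pid, comm
    }'
}

# List all currently running processes, sorted by ascending tick count.
# Processes that exit while being read are silently skipped.
list_blkio_ticks() {
    cat /proc/[0-9]*/stat 2>/dev/null | parse_stat_line | sort -n
}
```

Running list_blkio_ticks | tail -n 5 shows the five processes with the highest tick counts. As with the original command, only processes that are still running appear.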