linuxdiskspacedu

du counting hardlinks towards filesize?


I have a backup system that creates directories named after Unix Timestamps, and then creates incremental backups using a hardlink system (--link-dest in rsync), so typically the first backup is very large, and then later backups are fractions as big.

This is my output of my current backups:

root@athos:/media/awesomeness_drive# du -sh lantea_home/*
31G lantea_home/1384197192
17M lantea_home/1384205953
17M lantea_home/1384205979
17M lantea_home/1384206056
17M lantea_home/1384206195
17M lantea_home/1384207349
3.1G    lantea_home/1384207678
14M lantea_home/1384208111
14M lantea_home/1384208128
16M lantea_home/1384232401
15G lantea_home/1384275601
43M lantea_home/1384318801

Everything seems correct, however, take for example the last directory, lantea_home/1384318801:

root@athos:/media/awesomeness_drive# du -sh lantea_home/1384318801/
28G lantea_home/1384318801/

I consistently get this behavior, why is the directory considered 28G by the second du command?

Note - the output remains the same with the -P and -L flags.


Solution

  • Hardlinks are real references to the same file (represented by its inode). There is no difference between the "original" file and a hard link pointing to it as well. Both files have the same status, both are then references to this file. Removing one of them lets the other stay intact. Only removing the last hardlink will remove the file at last and free the disk space.

    So if you ask du what it sees in one directory only, it does not care that there are hardlinks elsewhere pointing to the same contents. It simply counts all the files' sizes and sums them up. Only hardlinks within the considered directory are not counted more than once. du is that clever (not all programs necessarily need to be).

    So in effect, directory A might have a du size of 28G, directory B might have a size of 29G, but together they still only occupy 30G and if you ask du of the size of A and B, you will get that number.

    Disc usage by several directories when hardlinks are involved.