linux apache-kafka ls kafka-partition

Total size of files in Kafka logs directory is less than the sum of their sizes


I'm testing a Kafka producer application and noticed something strange about the disk usage of the Kafka logs. When looking at the total size of a certain partition's log directory, while the application is writing to Kafka, I see this:

$ ls -l --block-size=kB kafka-logs/mytopic-0
total 52311kB
-rw-rw-r-- 1 app-data app-data 10486kB Oct 29 12:45 00000000000000000000.index
-rw-rw-r-- 1 app-data app-data 46505kB Oct 29 12:45 00000000000000000000.log
-rw-rw-r-- 1 app-data app-data 10486kB Oct 29 12:45 00000000000000000000.timeindex
-rw-rw-r-- 1 app-data app-data     1kB Oct 29 11:55 leader-epoch-checkpoint

Then I stop my application, and a few minutes later I repeat the above command, and get this:

$ ls -l --block-size=kB kafka-logs/mytopic-0
total 46519kB
-rw-rw-r-- 1 app-data app-data 10486kB Oct 29 12:45 00000000000000000000.index
-rw-rw-r-- 1 app-data app-data 46505kB Oct 29 12:45 00000000000000000000.log
-rw-rw-r-- 1 app-data app-data 10486kB Oct 29 12:45 00000000000000000000.timeindex
-rw-rw-r-- 1 app-data app-data     1kB Oct 29 11:55 leader-epoch-checkpoint

Questions: Why does the total reported by ls not equal the sum of the sizes of all the files in that directory? And why does the total decrease a few minutes after stopping the producer application, even though every file in the directory stays the same size?


Solution

  • The files might have holes, that is, they may be sparse files. Can you run the following commands:

    du --apparent-size *
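
To illustrate what a hole does to these numbers, here is a minimal sketch that creates a file consisting only of a hole and compares its two sizes. It assumes GNU coreutils (truncate, du) and a filesystem that supports sparse files. The file name sparse.index and the temporary directory are made up for the demonstration and have nothing to do with a real Kafka log directory.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing real is touched.
tmpdir=$(mktemp -d)

# Set the file length to 10485760 bytes without writing any data.
# On filesystems with sparse-file support this allocates (almost) no blocks.
truncate -s 10485760 "$tmpdir/sparse.index"

# Apparent size: the byte length, as shown in the size column of ls -l.
apparent=$(du --apparent-size --block-size=1 "$tmpdir/sparse.index" | cut -f1)

# Allocated size: the disk blocks actually in use, which is what the
# "total" line of ls -l adds up.
allocated=$(du --block-size=1 "$tmpdir/sparse.index" | cut -f1)

echo "apparent:  $apparent bytes"
echo "allocated: $allocated bytes"

# Remove the demonstration file again.
rm -rf "$tmpdir"
```

If the allocated figure comes out much smaller than the apparent one, the file is sparse. The same comparison (du --apparent-size * against plain du *) in kafka-logs/mytopic-0 would show whether that is the case here. The 10486kB reported for the .index and .timeindex files corresponds to 10485760 bytes, which would be consistent with index files that are created at a fixed size and only partly filled, but that is an assumption until the du output confirms it.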