I'm testing a Kafka producer application and noticed something strange about the disk usage of the Kafka logs. While the application is writing to Kafka, listing one partition's log directory shows this:
$ ls -l --block-size=kB kafka-logs/mytopic-0
total 52311kB
-rw-rw-r-- 1 app-data app-data 10486kB Oct 29 12:45 00000000000000000000.index
-rw-rw-r-- 1 app-data app-data 46505kB Oct 29 12:45 00000000000000000000.log
-rw-rw-r-- 1 app-data app-data 10486kB Oct 29 12:45 00000000000000000000.timeindex
-rw-rw-r-- 1 app-data app-data 1kB Oct 29 11:55 leader-epoch-checkpoint
Then I stop my application, and a few minutes later I repeat the above command, and get this:
$ ls -l --block-size=kB kafka-logs/mytopic-0
total 46519kB
-rw-rw-r-- 1 app-data app-data 10486kB Oct 29 12:45 00000000000000000000.index
-rw-rw-r-- 1 app-data app-data 46505kB Oct 29 12:45 00000000000000000000.log
-rw-rw-r-- 1 app-data app-data 10486kB Oct 29 12:45 00000000000000000000.timeindex
-rw-rw-r-- 1 app-data app-data 1kB Oct 29 11:55 leader-epoch-checkpoint
Questions:
1. Why does the ls total figure not equal the sum of the sizes of all the files in that directory?
2. Why does the total decrease a few minutes after stopping the producer application, even though all the files in the directory remain the same size?
The files might have holes, i.e. they could be sparse files. Can you run the following command in that directory:
du --apparent-size *
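To see why that matters: a sparse file's apparent size (what ls -l prints per file) can be much larger than the disk blocks actually allocated (which is what the ls total and plain du count). Here is a minimal sketch with an artificial file; the file name and the /tmp path are made up for illustration, not taken from your broker:

$ cd /tmp
# Create a 10 MB file consisting entirely of a hole, so no data blocks are allocated.
$ truncate -s 10M demo.index
# Apparent size: roughly 10 MB, similar to the index files in your listing.
$ ls -l --block-size=kB demo.index
# Plain du counts only allocated blocks; with --apparent-size it reports the nominal file size.
$ du --block-size=kB demo.index
$ du --apparent-size --block-size=kB demo.index
# stat shows both numbers at once.
$ stat -c 'apparent=%s bytes, allocated=%b blocks of %B bytes' demo.index

If your .index and .timeindex files show a similar gap between du and du --apparent-size, they are sparse, which would account for the ls total being smaller than the sum of the listed file sizes.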