I am using Kafka 0.8.2.2 and am trying to set up compression. I am passing the codec (gzip) to the console producer with the --compression-codec argument, like this:
./kafka-console-producer.sh --broker-list localhost:171 --compression-codec gzip --topic testTopic
Questions:
1. Is this the only place where I need to specify compression?
2. How do I verify that compression is actually taking place?
3. How do I quantify the benefit I am getting from compression?
4. Which files (.index, .log) should I look at, and compare in size with and without compression, to estimate the benefit?
How do I verify that compression is happening?
Use the DumpLogSegments tool, substituting your own log directory and log file name (the default log.dir is /tmp/kafka-logs):
bin/kafka-run-class.sh kafka.tools.DumpLogSegments --files /your_kafka_logs_dir/your_topic-your_partition/00000000000000000000.log --print-data-log | grep compresscodec
You should see output similar to the following (the exact format depends on your Kafka version):
baseOffset: 0 lastOffset: 0 count: 1 ... compresscodec: NONE ...
baseOffset: 1 lastOffset: 1 count: 1 ... compresscodec: GZIP ...
baseOffset: 2 lastOffset: 2 count: 1 ... compresscodec: SNAPPY ...
baseOffset: 3 lastOffset: 3 count: 1 ... compresscodec: LZ4 ...
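For a long segment, scanning these lines by eye gets tedious. A minimal sketch of a tally: pipe the DumpLogSegments output into a small shell function that counts how many lines mention each codec. The function name count_codecs is hypothetical (it is not part of Kafka), and it assumes each relevant line contains a "compresscodec: NAME" fragment as in the sample output.

```shell
# count_codecs is a hypothetical helper, not part of Kafka.
# It reads DumpLogSegments output on stdin and prints how many
# lines mention each codec, e.g. "      3 GZIP".
count_codecs() {
  # Keep only the "compresscodec: NAME" fragment of each line,
  # drop the label, then count identical codec names.
  grep -o 'compresscodec: [A-Za-z0-9]*' | awk '{ print $2 }' | sort | uniq -c
}

# Usage (substitute your own directory and segment file):
# bin/kafka-run-class.sh kafka.tools.DumpLogSegments \
#   --files /your_kafka_logs_dir/your_topic-your_partition/00000000000000000000.log \
#   --print-data-log | count_codecs
```

If every counted line reports NONE, the messages in that segment were written uncompressed.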
More information is available in the documentation: https://kafka.apache.org/documentation/#design_compression
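For the size questions, one rough approach, assuming you produce the same messages to two topics (one with --compression-codec gzip, one without), is to compare the total size of the .log files in each partition directory. The .log segments hold the message data, while the .index files only map offsets to file positions, so the .log files are the ones to compare. Because Kafka compresses batches of messages rather than single messages, the ratio you measure also depends on how many messages the producer groups into each batch. The helper partition_log_bytes and the topic names below are hypothetical.

```shell
# partition_log_bytes is a hypothetical helper, not part of Kafka.
# It prints the total size in bytes of the .log segment files
# (the files that hold the message data) in one partition directory.
partition_log_bytes() {
  # Concatenate all *.log segments and count the bytes; prints 0 if none exist.
  # Reading the files is slow for very large logs, but this works with
  # both GNU and BSD find/wc.
  echo $(( $(find "$1" -maxdepth 1 -type f -name '*.log' -exec cat {} + | wc -c) ))
}

# Usage, assuming two hypothetical topics that received the same messages,
# one produced with --compression-codec gzip and one without:
# plain=$(partition_log_bytes /tmp/kafka-logs/testTopicPlain-0)
# gzipped=$(partition_log_bytes /tmp/kafka-logs/testTopicGzip-0)
# echo "uncompressed: $plain bytes, gzip: $gzipped bytes"
```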