java compression gzipoutputstream

Why does my GZIPOutputStream output not match what online compression tools produce for the same string?


I am using the code below to compress my test string into gzip format:

private static final String TEST_STRING = "Test String";

public static void main(String[] args) throws IOException {

    System.out.println("Original data: " + TEST_STRING);
    String compressedString = compress(TEST_STRING);
    System.out.println("Compressed data: " + compressedString);
}


private static String compress(String str){
    byte[] byteArray = str.getBytes();
    final ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    try (GZIPOutputStream gzipOutputStream = new GZIPOutputStream(byteArrayOutputStream)) {
        gzipOutputStream.write(byteArray);

    } catch (IOException e) {
        System.out.println("Exception occurred during compressing" + e);
    }

    return new String(byteArrayOutputStream.toByteArray());
}

Output I am getting is Compressed data: � I-..)��K ��w�
But when I compress the same test string with an online gzip converter, it gives me: eJwLSS0uUQguKcrMSwcAGKAEOA==

I am not sure where this difference comes from.


Solution

  • You are confusing two pairs of things: binary with Base64, and gzip with zlib.

    First, compressed data uses all possible byte values, so it is not printable. You also can't, in general, convert it to a Java String with new String(bytes) without losing information, because byte sequences that are not valid in the charset are replaced during decoding. Many channels can't carry arbitrary byte values, so in those cases the compressed data is expanded slightly by encoding it with printable characters only. The most common such encoding is Base64, which is what your online example shows.
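
    To make the first point concrete, here is a minimal sketch of how the compressed bytes could be turned into printable text with Base64 instead of new String(...). The class and method names (GzipBase64Sketch, compressToBase64, decompressFromBase64) are hypothetical, UTF-8 is an assumed choice of charset, and readAllBytes assumes Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class; names are illustrative, not from the question.
class GzipBase64Sketch {

    // Gzip the UTF-8 bytes of the text, then Base64-encode the raw bytes
    // so the result is printable and loses no information.
    static String compressToBase64(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } // closing the stream writes the gzip trailer
        return Base64.getEncoder().encodeToString(buffer.toByteArray());
    }

    // Reverse direction: Base64-decode, then gunzip back to the original text.
    static String decompressFromBase64(String base64) throws IOException {
        byte[] compressed = Base64.getDecoder().decode(base64);
        try (GZIPInputStream gzip =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        String encoded = compressToBase64("Test String");
        System.out.println("Compressed (Base64): " + encoded);
        System.out.println("Round trip: " + decompressFromBase64(encoded));
    }
}
```

    The Base64 text survives printing, logging, and copy-paste, and decoding it gives back exactly the bytes that GZIPOutputStream wrote.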

    Second, that particular example, eJwLSS0uUQguKcrMSwcAGKAEOA==, is the Base64 encoding of a zlib stream, not a gzip stream. If you need a gzip stream, you generated it with the wrong tool. A Base64-encoded gzip stream is immediately recognizable because it starts with H4sI. For your test string, gzip-compressed and then Base64-encoded, you should get something like the following (the gzip header carries a timestamp and an operating-system byte, so the characters right after H4sI can differ between tools):

    H4sIAJGKPWQAAwtJLS5RCC4pysxLBwCkn3eSCwAAAA==
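
    As a rough way to check which format a Base64 string contains, the sketch below decodes it and looks at the first two bytes: gzip streams begin with 0x1F 0x8B, while a zlib header has compression method 8 in the low nibble of the first byte and the two header bytes, read as a 16-bit number, are a multiple of 31. The class and method names (StreamFormatSketch, detectFormat, inflateZlib) are hypothetical, and readAllBytes assumes Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.InflaterInputStream;

// Hypothetical helper class; names are illustrative only.
class StreamFormatSketch {

    // Classify a Base64 string as "gzip", "zlib", or "unknown" from its header bytes.
    static String detectFormat(String base64) {
        byte[] data = Base64.getDecoder().decode(base64);
        if (data.length < 2) {
            return "unknown";
        }
        int b0 = data[0] & 0xFF;
        int b1 = data[1] & 0xFF;
        if (b0 == 0x1F && b1 == 0x8B) {
            return "gzip"; // gzip magic number
        }
        if ((b0 & 0x0F) == 8 && ((b0 << 8) | b1) % 31 == 0) {
            return "zlib"; // deflate method plus valid header check bits
        }
        return "unknown";
    }

    // Decompress a Base64-encoded zlib stream back to text (UTF-8 assumed).
    static String inflateZlib(String base64) throws IOException {
        byte[] data = Base64.getDecoder().decode(base64);
        try (InflaterInputStream in =
                 new InflaterInputStream(new ByteArrayInputStream(data))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        String fromOnlineTool = "eJwLSS0uUQguKcrMSwcAGKAEOA==";
        String gzipExample = "H4sIAJGKPWQAAwtJLS5RCC4pysxLBwCkn3eSCwAAAA==";
        System.out.println(detectFormat(fromOnlineTool)); // zlib
        System.out.println(detectFormat(gzipExample));    // gzip
        System.out.println(inflateZlib(fromOnlineTool));  // Test String
    }
}
```

    If zlib output is what you actually want from Java, java.util.zip.DeflaterOutputStream writes that format by default, whereas GZIPOutputStream always writes gzip.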