javagzipgzipinputstreamgzipoutputstream

GZIP eats newlines


I have the following code for compressing and decompressing string.

public static byte[] compress(String str)
{
    try
    {
        ByteArrayOutputStream obj = new ByteArrayOutputStream();
        GZIPOutputStream gzip = new GZIPOutputStream(obj);
        gzip.write(str.getBytes("UTF-8"));
        gzip.close();
        return obj.toByteArray();
    }
    catch (IOException e)
    {
        e.printStackTrace();
    }
    return null;
}

public static String decompress(byte[] bytes)
{
    try
    {
        GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(bytes));
        BufferedReader bf = new BufferedReader(new InputStreamReader(gis, "UTF-8"));
        StringBuilder outStr = new StringBuilder();
        String line;
        while ((line = bf.readLine()) != null)
        {
            outStr.append(line);
        }
        return outStr.toString();
    }
    catch (IOException e)
    {
        return e.getMessage();
    }
}

I compress into byte array on windows, and then send the byte array through socket to the linux and uncompress it there. However upon uncompression it seem that all my newline characters are gone.
So I thought that the problem was linux to windows relationship. However I have tried writing a simple program on windows that uses it, and found that the newlines are still gone.
Can anyone shed any light as to what causes it? I can't figure out any explanation.


Solution

  • I think the problem is here:

    while ((line = bf.readLine()) != null)
        {
            outStr.append(line);
        }
    

    The readLine see's the newline char but doesn't include it in the returned value for line

    The problem is worse than you think, perhaps.

    readLine() gets all the characters up to, but not including, a newline (or some variety of returns and linefeed characters) OR the end of file. So you don't know if the last line you get had a newline on the end or not.

    This might not matter, and if so, you can just add this following the other append:

    outStr.append('\n');
    

    Some files might end up with an extra line ending at the end of file.

    If it does matter, you will need to use read() and then output all the characters you receive. In that case, you might end up with the infamous "What's at the end of the line?" problem you allude to between Windows, Linux and the MacOS and the way they use different combinations of return and new-line characters to end lines.