Tags: java, string, compression, gzip, gzipoutputstream

Java - Compress large string


In my Java application, some calculations produce a really long string (around 600,000 characters). I need to send this string to a client for processing, so the compressed string must be at most 1,000 characters long.

I have tried GZIPOutputStream and the Inflater and Deflater classes, and in the best case I got an output string of 300,000 characters. That is great compression, but in my case it is not enough.

I have also tried compressing the string n times, but each output was larger than the previous one, so only the first compression pass helped.

So, what do you suggest I try?

Thank you.


Solution

  • I agree with @Peter Lawrey that, strictly with those requirements, it might be impossible to deliver such a big message to the client.
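
    To see how far a given input is from the 1,000-character target, it can help to measure the compressed size directly. The sketch below is only an illustration, not the asker's code: the class name `GzipSizeSketch`, its helper methods and the sample input are assumptions. It gzips a byte array once and twice and prints the sizes. Data that is already compressed has little redundancy left, which is consistent with the second pass not helping in the question. Note also that the output is bytes, so turning it back into a string (for example with Base64) adds roughly a third on top.

    ```java
    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.UncheckedIOException;
    import java.nio.charset.StandardCharsets;
    import java.util.zip.GZIPInputStream;
    import java.util.zip.GZIPOutputStream;

    final class GzipSizeSketch {

        // Compresses raw bytes with GZIP and returns the compressed bytes.
        static byte[] gzip(byte[] input) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
                gz.write(input);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            // The stream is closed here, so the GZIP trailer has been written.
            return buffer.toByteArray();
        }

        // Reverses gzip(): returns the original bytes (readAllBytes needs Java 9+).
        static byte[] gunzip(byte[] compressed) {
            try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
                return gz.readAllBytes();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        public static void main(String[] args) {
            // Hypothetical, highly repetitive sample of 600,000 characters;
            // real data will usually compress far less well than this.
            byte[] original = "client=42;bill=7;amount=19.99;".repeat(20_000)
                    .getBytes(StandardCharsets.UTF_8);
            byte[] once = gzip(original);
            byte[] twice = gzip(once);
            System.out.println("original bytes:     " + original.length);
            System.out.println("compressed once:    " + once.length);
            System.out.println("compressed twice:   " + twice.length);
        }
    }
    ```

    Running this against a real result string, rather than the artificial sample, gives a quick indication of whether the 1,000-character limit is realistic at all.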

    Anyway, I still suggest three possible solutions, depending on how flexible your requirements are:

    1. If all of your input strings have a restricted vocabulary (not a free, random combination of letters, symbols and numbers, but a fixed set of business words, identifiers and values) and a simple grammar, you can try to design your own compression algorithm. Example:

    input symbol    compressed symbol
    ------------    -----------------
    client          1
    bill            2
    date            3
    amount          4
    value           5
    price           6
    tax             7

    If the grammar is simple but the vocabulary is not that restricted, you could perform an initial custom compression to compress the document's structure as much as you can, and then a second GZIP compression to compress the data.

    And don't forget that you'll have to bundle the client application with the corresponding decompressor.

    Anyway, it's not an easy task, I admit it.
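
    As a rough illustration of this first option, here is a minimal sketch of a word-substitution pass followed by GZIP. It uses the vocabulary from the table above, but everything else is an assumption: the class name `DictionaryCompressionSketch`, the method names, and the choice to encode each word as a single control character (U+0001 to U+0007) instead of a digit, so that the codes cannot be confused with numbers in the data. It assumes the input never contains those control characters and rejects it otherwise.

    ```java
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.UncheckedIOException;
    import java.nio.charset.StandardCharsets;
    import java.util.zip.GZIPOutputStream;

    final class DictionaryCompressionSketch {

        // Vocabulary from the example table. WORDS[i] is replaced by the single
        // control character with code i + 1 (client -> U+0001, ..., tax -> U+0007).
        private static final String[] WORDS =
                {"client", "bill", "date", "amount", "value", "price", "tax"};

        // First pass: replace every vocabulary word with its one-character code.
        static String encode(String input) {
            for (int i = 0; i < input.length(); i++) {
                char c = input.charAt(i);
                if (c >= 1 && c <= WORDS.length) {
                    throw new IllegalArgumentException("input already contains a code character");
                }
            }
            String encoded = input;
            for (int i = 0; i < WORDS.length; i++) {
                encoded = encoded.replace(WORDS[i], String.valueOf((char) (i + 1)));
            }
            return encoded;
        }

        // Client side: replace every code character with its vocabulary word.
        static String decode(String encoded) {
            String decoded = encoded;
            for (int i = 0; i < WORDS.length; i++) {
                decoded = decoded.replace(String.valueOf((char) (i + 1)), WORDS[i]);
            }
            return decoded;
        }

        // Second pass: GZIP the UTF-8 bytes of the already encoded text.
        static byte[] gzip(String text) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
                gz.write(text.getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return buffer.toByteArray();
        }

        public static void main(String[] args) {
            // Hypothetical sample record, for illustration only.
            String sample = "client=42;bill=7;date=2020-01-01;amount=19.99;tax=1.50;";
            String encoded = encode(sample);
            System.out.println("plain chars:        " + sample.length());
            System.out.println("encoded chars:      " + encoded.length());
            System.out.println("gzip(plain) bytes:   " + gzip(sample).length);
            System.out.println("gzip(encoded) bytes: " + gzip(encoded).length);
            System.out.println("round trip ok:      " + decode(encoded).equals(sample));
        }
    }
    ```

    GZIP already finds repeated substrings on its own, so the gain from the substitution pass may turn out to be small. It is worth comparing both sizes on real data before committing to a custom format.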

    2. Deliver the response to the client application as a stream. If the protocol is HTTP, you could use Chunked Transfer Coding.
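
    A minimal sketch of the streaming option, assuming the JDK's built-in `com.sun.net.httpserver.HttpServer` is acceptable as the server. The class name `ChunkedResponseSketch`, the `/result` path, the 8 KiB piece size and the helper `writeInPieces` are hypothetical. With this server, passing a response length of 0 to `sendResponseHeaders` selects chunked transfer encoding, so the total size does not have to be known in advance.

    ```java
    import com.sun.net.httpserver.HttpServer;
    import java.io.IOException;
    import java.io.OutputStream;
    import java.io.UncheckedIOException;
    import java.net.InetSocketAddress;
    import java.nio.charset.StandardCharsets;

    final class ChunkedResponseSketch {

        // Writes the payload in pieces of at most pieceSize bytes, flushing after each piece.
        static void writeInPieces(byte[] payload, OutputStream out, int pieceSize) {
            if (pieceSize <= 0) {
                throw new IllegalArgumentException("pieceSize must be positive");
            }
            try {
                for (int offset = 0; offset < payload.length; offset += pieceSize) {
                    int length = Math.min(pieceSize, payload.length - offset);
                    out.write(payload, offset, length);
                    out.flush();
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        // Starts a server that streams longResult from the (hypothetical) path /result.
        static HttpServer start(int port, String longResult) throws IOException {
            HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
            server.createContext("/result", exchange -> {
                byte[] payload = longResult.getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=utf-8");
                // A response length of 0 asks this server to use chunked transfer encoding.
                exchange.sendResponseHeaders(200, 0);
                try (OutputStream body = exchange.getResponseBody()) {
                    writeInPieces(payload, body, 8192);
                }
            });
            server.start();
            return server;
        }
    }
    ```

    For example, `ChunkedResponseSketch.start(8080, result)` would serve the string on port 8080 until `stop(0)` is called on the returned server. In a real application the pieces would more likely be written as the calculation produces them, instead of building the whole string first.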

    3. If everything else fails, you'll have to page the results and serve them to the client page by page, on demand: the client makes a query, the server executes it and delivers just the first page of results. Then the client may choose to read the next page.
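
    The paging option can be sketched as follows, assuming the server keeps the full result available and the client asks for pages by index. The class name `PagingSketch`, the method names and the page size of 1,000 characters (chosen to match the limit in the question) are assumptions, and splitting by character index ignores details such as surrogate pairs.

    ```java
    final class PagingSketch {

        // Returns the page with the given 0-based index, at most pageSize characters long.
        // A page index past the end yields an empty string.
        static String page(String result, int pageIndex, int pageSize) {
            if (pageIndex < 0 || pageSize <= 0) {
                throw new IllegalArgumentException("pageIndex must be >= 0 and pageSize > 0");
            }
            long start = (long) pageIndex * pageSize;
            if (start >= result.length()) {
                return "";
            }
            int end = (int) Math.min(result.length(), start + pageSize);
            return result.substring((int) start, end);
        }

        // Number of pages needed to deliver the whole result.
        static int pageCount(String result, int pageSize) {
            if (pageSize <= 0) {
                throw new IllegalArgumentException("pageSize must be > 0");
            }
            return (int) ((result.length() + (long) pageSize - 1) / pageSize);
        }

        public static void main(String[] args) {
            // Hypothetical 600,000-character result, served in pages of 1,000 characters.
            String result = "x".repeat(600_000);
            int pages = pageCount(result, 1000);
            StringBuilder received = new StringBuilder();
            for (int i = 0; i < pages; i++) {
                received.append(page(result, i, 1000)); // one client request per page
            }
            System.out.println("pages requested: " + pages);
            System.out.println("reassembled ok:  " + received.toString().equals(result));
        }
    }
    ```

    Each page could additionally be compressed before sending, but the page boundary, not the compression ratio, is what keeps every single message under the limit.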