I'm writing a simple Java HTTP server that responds with JSON data. I'm trying to gzip the data before sending it, but the gzipped response usually produces an error in the browser. For example, Firefox says:
Content Encoding Error: The page you are trying to view cannot be shown because it uses an invalid or unsupported form of compression.
Sometimes it works if the string I'm compressing is small without certain characters, but it seems to mess up when there are brackets, etc. In particular, the example text I have below fails.
Is this some kind of character encoding issue? I've tried all sorts of things, but it just doesn't want to work easily.
String text;
private Socket server;
DataInputStream in = new DataInputStream(server.getInputStream());
PrintStream out = new PrintStream(server.getOutputStream());
while ((text = in.readLine()) != null) {
// ... process header info
if (text.length() == 0) break;
}
out.println("HTTP/1.1 200 OK");
out.println("Content-Encoding: gzip");
out.println("Content-Type: text/html");
out.println("Connection: close");
// x is the text to compress
String x = "jsonp1330xxxxx462022184([[";
ByteArrayOutputStream outZip = new ByteArrayOutputStream();
GZIPOutputStream gzip = new GZIPOutputStream(outZip);
byte[] b = x.getBytes(); // Changing character encodings here makes no difference
gzip.write(b);
gzip.finish();
gzip.close();
outZip.close();
out.println();
out.print(outZip);
server.close();
The accepted answer is incorrect.
GZIPOutputStream can indeed be used to implement gzip content encoding in HTTP. In fact, that's exactly how I implemented it in the JLHTTP lightweight HTTP server. Support for deflate content encoding is identical, just with DeflaterOutputStream used instead. The problem with the above code is simply that it's buggy :-)
All println statements (including the one at the bottom) should be replaced with print and an explicit \r\n at the end of the string. This is because the newline characters printed by println are platform-dependent, so e.g. on Linux it will only print a \n, whereas HTTP requires a full CRLF (\r\n).
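As a minimal sketch of that first fix, the headers from the question can be written with print and an explicit CRLF. The class name CrlfHeaders and the render helper are made up for illustration; only the header lines themselves come from the question.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class CrlfHeaders {
    // Writes the status line and headers from the question, each terminated
    // by an explicit CRLF, then the empty line that ends the header section.
    static void writeHeaders(PrintStream out) {
        out.print("HTTP/1.1 200 OK\r\n");
        out.print("Content-Encoding: gzip\r\n");
        out.print("Content-Type: text/html\r\n");
        out.print("Connection: close\r\n");
        out.print("\r\n");
        out.flush();
    }

    // Hypothetical helper: captures the header bytes so they can be inspected.
    static String render() throws UnsupportedEncodingException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(buffer, false, "US-ASCII");
        writeHeaders(out);
        return new String(buffer.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws Exception {
        // Make the line terminators visible when printing to the console.
        System.out.print(render().replace("\r\n", "\\r\\n\n"));
    }
}
```

Because the terminator is spelled out in the string, the bytes on the wire are the same on every platform.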
out.print(outZip) basically calls outZip.toString() and prints that out to the stream. However, outZip contains compressed binary data, so converting it to a string (using the arbitrary platform default encoding, no less) is very likely to corrupt the data.
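The corruption is easy to reproduce in isolation. The sketch below assumes UTF-8 as the charset for the round trip (the real platform default may differ), and the class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPOutputStream;

public class StringRoundTrip {
    // Compresses the text into an in-memory buffer.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    // Mimics what printing the buffer as text does: bytes -> String -> bytes.
    // UTF-8 is an assumption standing in for the platform default charset.
    static byte[] throughString(byte[] data) {
        String asText = new String(data, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = gzip("jsonp1330xxxxx462022184([[");
        byte[] mangled = throughString(compressed);
        System.out.println("identical after round trip: "
                + Arrays.equals(compressed, mangled));
    }
}
```

With UTF-8, the second gzip magic byte (0x8b) is not a valid sequence on its own, so it is replaced during decoding and the arrays no longer match. A single-byte charset such as ISO-8859-1 would happen to survive the round trip, which is one reason the failure can look intermittent across machines.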
The code takes the string, converts it to bytes, compresses them, converts them back to a string, converts them back to bytes and writes them out. Instead, it need only convert the string to bytes, compress them and write them out. You don't need the ByteArrayOutputStream for that either; the GZIPOutputStream can wrap the underlying output stream directly. Just don't forget to flush the print stream after the headers (and trailing CRLF), and only then start with the compressed stream for the body.
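Putting these points together, a corrected version of the question's response code might look roughly like this. It is a sketch, not the JLHTTP implementation: the class GzipResponse and both method names are hypothetical, UTF-8 is assumed for the body, and an in-memory stream stands in for server.getOutputStream() so the result can be checked.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipResponse {
    // Writes headers as text, flushes, then streams the gzipped body as bytes.
    static void writeResponse(OutputStream raw, String body) throws IOException {
        PrintStream out = new PrintStream(raw, false, "US-ASCII");
        out.print("HTTP/1.1 200 OK\r\n");
        out.print("Content-Encoding: gzip\r\n");
        out.print("Content-Type: text/html\r\n");
        out.print("Connection: close\r\n");
        out.print("\r\n");
        out.flush(); // headers must reach the stream before any body bytes

        // Wrap the underlying stream directly; no intermediate buffer or String.
        GZIPOutputStream gzip = new GZIPOutputStream(raw);
        gzip.write(body.getBytes(StandardCharsets.UTF_8)); // UTF-8 is an assumption
        gzip.finish(); // writes the gzip trailer without closing the socket stream
        raw.flush();
    }

    // Hypothetical test helper: splits off the headers and gunzips the body.
    static String decodeBody(byte[] response) throws IOException {
        // ISO-8859-1 maps each byte to one char, so char and byte offsets agree.
        String text = new String(response, StandardCharsets.ISO_8859_1);
        int bodyStart = text.indexOf("\r\n\r\n") + 4;
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        try (GZIPInputStream gunzip = new GZIPInputStream(
                new ByteArrayInputStream(response, bodyStart, response.length - bodyStart))) {
            byte[] buffer = new byte[1024];
            int n;
            while ((n = gunzip.read(buffer)) != -1) {
                plain.write(buffer, 0, n);
            }
        }
        return new String(plain.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        writeResponse(wire, "jsonp1330xxxxx462022184([[");
        System.out.println(decodeBody(wire.toByteArray()));
    }
}
```

In a real server, raw would be the socket's output stream, and the socket would be closed by the caller once writeResponse returns.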
Closing resources should be done in finally or try-with-resources blocks, and with the correct order and timing.
In this sample, the connection is closed at the end of the stream, which is fine. But in general, if you want to keep the connection alive and stream potentially large data with unknown length (you don't know the compressed size in advance), you need to implement the chunked transfer encoding as well (it's pretty simple).
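To give an idea of how little is involved, here is a bare-bones sketch of a chunked output stream. The class ChunkedOutputStream and its finish method are illustrative names, not an existing JDK or JLHTTP API, and trailers and chunk extensions are left out.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class ChunkedOutputStream extends FilterOutputStream {
    private static final byte[] CRLF = {'\r', '\n'};

    public ChunkedOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    // Emits one chunk per call: hex length, CRLF, the data, CRLF.
    @Override
    public void write(byte[] data, int off, int len) throws IOException {
        if (len == 0) {
            return; // a zero-length chunk would signal the end of the body
        }
        out.write(Integer.toHexString(len).getBytes(StandardCharsets.US_ASCII));
        out.write(CRLF);
        out.write(data, off, len);
        out.write(CRLF);
    }

    // Writes the terminating zero-length chunk, leaving the connection open.
    public void finish() throws IOException {
        out.write("0\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
        out.flush();
    }
}
```

To combine it with compression, one would send a Transfer-Encoding: chunked header, wrap this stream in a GZIPOutputStream, and after writing the body call finish() on the GZIPOutputStream first and then on the chunked stream. Putting a BufferedOutputStream between the two would avoid emitting many tiny chunks.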
With the code fixed, GZIPOutputStream works like a charm.
However, while great for educational purposes, please note that this is not an HTTP server, even if fixed. You could further read RFC 2616 or 7230 to learn what else HTTP is required to do... but why reinvent the wheel? There are a bunch of lightweight embeddable HTTP servers out there that you can use to get the job done properly with little effort, JLHTTP among them.