I've got a bit of code I've been using for a while to fetch data from a web server and a few months ago, I added compression support which seems to be working well for "regular" HTTP responses where the whole document is contained in the response. It does not seem to be working when I use a Range
header, though.
Here is the code doing the real work:
InputStream in = null;
int bufferSize = 4096;
int responseCode = conn.getResponseCode();
boolean error = 5 == responseCode / 100
|| 4 == responseCode / 100;
int bytesRead = 0;
try
{
if(error)
in = conn.getErrorStream();
else
in = conn.getInputStream();
// Buffer the input
in = new BufferedInputStream(in);
// Handle compressed responses
if("gzip".equalsIgnoreCase(conn.getHeaderField("Content-Encoding")))
in = new GZIPInputStream(in);
else if("deflate".equalsIgnoreCase(conn.getHeaderField("Content-Encoding")))
in = new InflaterInputStream(in, new Inflater(true));
int n;
byte[] buffer = new byte[bufferSize];
// Now, just write out all the bytes
while(-1 != (n = in.read(buffer)))
{
bytesRead += n;
out.write(buffer, 0, n);
}
}
catch (IOException ioe)
{
System.err.println("Got IOException after reading " + bytesRead + " bytes");
throw ioe;
}
finally
{
if(null != in) try { in.close(); }
catch (IOException ioe)
{
System.err.println("Could not close InputStream");
ioe.printStackTrace();
}
}
Hitting a URL with the header Accept-Encoding: gzip,deflate,identity
works just great: I can see that the data is returned by the server in compressed format, and the above code decompressed it nicely.
If I then add a Range: bytes=0-50
header, I get the following exception:
Got IOException after reading 0 bytes
Exception in thread "main" java.io.EOFException: Unexpected end of ZLIB input stream
at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:240)
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:116)
at java.io.FilterInputStream.read(FilterInputStream.java:107)
at [my code]([my code]:511)
Line 511
in my code is the line containing the in.read()
call. The response includes the following headers:
Content-Type: text/html
Content-Encoding: gzip
Content-Range: bytes 0-50/751
Content-Length: 51
I have verified that, if I don't attempt to decompress the response, I actually get 51 bytes in the response... it's not a server failure (at least that I can tell). My server (Apache httpd) does not support "deflate", so I can't test another compression scheme (at least not right now).
I've also tried to request much more data (like 700 bytes of the total 751 bytes in the target resource) and I get the same kind of error.
Is there something I'm missing?
Update Sorry, I forgot to include that I'm hitting Apache/2.2.22 on Linux. There aren't any server bugs I'm aware of. I'll have a bit of trouble verifying the compressed bytes that I retrieve from the server, as the "gzip" Content-Encoding is quite bare... e.g. I believe I can't just use "gunzip" on the command-line to decompress those bytes. I'll give it a try, though.
Sigh switching to another server (happens to be running Apache/2.2.25) shows that my code does in fact work. The original target server appears to be affected by AWS's current outage in the US-EAST availability zone. I'm going to chalk this up to network errors and close this question. Thanks to those who offered suggestions.