We want to compress a large string (a JSON object) that is stored in our DynamoDB table.
I want to simply replace it with a compressed string. I looked into the DynamoDB documentation, which stores the compressed data directly as a ByteBuffer, as mentioned here.
But since I don't want to save a byte array, and would rather store a compressed string version of the original string, I have modified that example.
Here is what I've done:
public class GZIPStringCompression {
    public static String compress(String data) throws IOException {
        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream(data.length());
        GZIPOutputStream gzipOutputStream = new GZIPOutputStream(byteArrayOutputStream);
        gzipOutputStream.write(data.getBytes());
        gzipOutputStream.close();
        return byteArrayOutputStream.toString();
    }

    public static String decompress(String compressed) throws IOException {
        ByteArrayInputStream bis = new ByteArrayInputStream(compressed.getBytes());
        GZIPInputStream gis = new GZIPInputStream(bis);
        BufferedReader br = new BufferedReader(new InputStreamReader(gis, StandardCharsets.UTF_8));
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = br.readLine()) != null) {
            sb.append(line);
        }
        br.close();
        gis.close();
        bis.close();
        return sb.toString();
    }
}
This throws the following exception:
Exception in thread "main" java.util.zip.ZipException: Not in GZIP format
    at java.base/java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:165)
    at java.base/java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:79)
    at java.base/java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:91)
    at GZIPStringCompression.decompress(MyClass.java:41)
    at MyClass.main(MyClass.java:16)
I am not sure whether what I want to do is even possible, which is why I want to confirm it here.
I then changed the code to:
class GZIPStringCompression {
    public static String compress(String data) throws IOException {
        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream(data.length());
        GZIPOutputStream gzipOutputStream = new GZIPOutputStream(byteArrayOutputStream);
        gzipOutputStream.write(data.getBytes());
        gzipOutputStream.close();
        return Base64.getEncoder().encodeToString(byteArrayOutputStream.toByteArray());
    }

    public static String decompress(String compressed) throws IOException {
        ByteArrayInputStream bis = new ByteArrayInputStream(Base64.getDecoder().decode(compressed));
        GZIPInputStream gis = new GZIPInputStream(bis);
        BufferedReader br = new BufferedReader(new InputStreamReader(gis, StandardCharsets.UTF_8));
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = br.readLine()) != null) {
            sb.append(line);
        }
        br.close();
        gis.close();
        bis.close();
        return sb.toString();
    }
}
This somehow worked. Would this be a dependable solution?
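Your first solution didn't work because you took a byte array (an array of 8-bit bytes) and assigned it to a String (basically an array of Unicode characters). That doesn't make sense and can result in all sorts of unwanted manipulation of your bytes that makes them unusable when you read them back. In your code, byteArrayOutputStream.toString() decodes the bytes using the platform's default charset, and any byte sequence that isn't valid in that charset is replaced, so the original bytes generally cannot be recovered.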
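To see the corruption concretely, here is a minimal sketch. The class and method names (GzipStringCorruptionDemo, gzip, throughString) are made up for illustration, and I am assuming UTF-8 as the charset to keep the result reproducible. It compresses a small JSON string, pushes the raw bytes through a String and back, and compares the result with the original bytes:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class, not part of the question's code.
class GzipStringCorruptionDemo {

    // Compresses the text (encoded as UTF-8) and returns the raw GZIP bytes.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    // Mimics storing binary data in a String and reading it back.
    // UTF-8 is an assumption here; the question's code used the platform default.
    static byte[] throughString(byte[] raw) {
        String asText = new String(raw, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = gzip("{\"key\": \"value\"}");
        byte[] roundTripped = throughString(raw);
        // A GZIP stream starts with the magic bytes 0x1f 0x8b.
        System.out.printf("first two bytes: %02x %02x%n", raw[0] & 0xFF, raw[1] & 0xFF);
        System.out.println("bytes survived the String round trip: "
                + Arrays.equals(raw, roundTripped));
    }
}
```

The second magic byte, 0x8b, is not valid on its own in UTF-8, so it is replaced during decoding. That is why GZIPInputStream later complains that the data is "Not in GZIP format".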
Your approach of converting the byte array into Base64 encoding, which uses only a subset of ASCII, works because ASCII characters can be represented in a String without any manipulation and can be read back exactly as they were written.
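As for whether it is dependable: the Base64 part is, but two details in the posted code look fragile to me. data.getBytes() uses the platform default charset while decompress() reads UTF-8, and readLine() strips line terminators, so a JSON string containing newlines would not come back identical. Below is a sketch of how I might tighten it. The class name GzipBase64Codec is hypothetical, and readAllBytes() assumes Java 9 or later (your stack trace shows java.base, so that seems likely):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical variant of the question's class, not a drop-in replacement.
class GzipBase64Codec {

    // GZIP-compresses the UTF-8 bytes of the input and Base64-encodes the result.
    static String compress(String data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data.getBytes(StandardCharsets.UTF_8)); // explicit charset
        }
        return Base64.getEncoder().encodeToString(out.toByteArray());
    }

    // Reverses compress(): Base64-decodes, decompresses, and decodes as UTF-8.
    static String decompress(String compressed) throws IOException {
        byte[] raw = Base64.getDecoder().decode(compressed);
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(raw))) {
            // Reading whole bytes (Java 9+) keeps newlines intact, unlike readLine().
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

Usage would look like String stored = GzipBase64Codec.compress(json); followed later by GzipBase64Codec.decompress(stored). Keep in mind that Base64 produces four characters for every three bytes, so part of the compression gain is given back.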
Since you mentioned this is for DynamoDB, I should add that DynamoDB has a "binary" type in addition to the "string" type, and you could just use that. In Java, you can assign the byte array directly to an attribute of this type, and don't need to try to "convert" it into a String.
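If you go the binary route, the compression code gets simpler because the Base64 step disappears. Here is a rough sketch that produces and consumes a ByteBuffer. The class name GzipBinaryCodec is hypothetical, and the actual DynamoDB call is only hinted at in a comment because it depends on which SDK version you use; please check the SDK documentation rather than taking the comment as exact:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for storing compressed text in a binary attribute.
class GzipBinaryCodec {

    // GZIP-compresses the UTF-8 bytes of the input and wraps them in a ByteBuffer.
    static ByteBuffer compress(String data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data.getBytes(StandardCharsets.UTF_8));
        }
        return ByteBuffer.wrap(out.toByteArray());
    }

    // Decompresses a buffer produced by compress() back into the original text.
    static String decompress(ByteBuffer compressed) throws IOException {
        byte[] raw = new byte[compressed.remaining()];
        compressed.duplicate().get(raw); // duplicate() leaves the caller's position untouched
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(raw))) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8); // Java 9+
        }
    }

    // Assumption, not verified against your setup: the buffer would then be set
    // as the attribute's binary ("B") value, e.g. via AttributeValue.withB(...)
    // in SDK v1 or AttributeValue.builder().b(SdkBytes.fromByteBuffer(...)) in v2.
}
```

Compared with the Base64 string, this avoids the roughly one-third size overhead of the encoding, at the cost of the attribute no longer being readable as text.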