I have a bunch of long strings (16200 characters each) that I want to compress. Each string uses only 12 different characters (currently _oOwWgGmdDsS, but those can change if needed).
I currently use a compression scheme I made myself: I write each character, followed by the number of times it repeats before a different character appears. So if the uncompressed text looks like this:
ooooooWW_
then the compressed text becomes
o6W2_1
For the strings I currently have, this reduced the size from about 128MB to 4MB. However, as you can see, there is no saving for the W's, and for the _ there is even a loss.
So I was wondering, are there more sophisticated compression schemes I can use? However, the end result has to be plain text, not binary data.
Note: It would also be awesome if libraries for them exist for both Python and Lua.
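For reference, here is a minimal Python sketch of the scheme described above. The function names rle_encode and rle_decode are only illustrative, and the decoder assumes the alphabet never contains digits, since otherwise the counts would be ambiguous.

```python
import re
from itertools import groupby


def rle_encode(text):
    # Emit each run as the character followed by its run length.
    return "".join(f"{ch}{len(list(run))}" for ch, run in groupby(text))


def rle_decode(encoded):
    # Assumption: no digit is ever used as a data character,
    # so each non-digit starts a new (character, count) pair.
    return "".join(ch * int(n) for ch, n in re.findall(r"(\D)(\d+)", encoded))


print(rle_encode("ooooooWW_"))  # o6W2_1
```

A run of length 1 or 2 costs at least as many characters as it saves, which is why the W's and the _ do not shrink here.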
Use zlib to compress to binary, and then base64 to expand the binary to plain text. Python has both built in. A little googling will turn up Lua bindings for zlib and base64 code.
Example:
import zlib
import base64

# Compress: text -> UTF-8 bytes -> zlib -> base64 -> plain-text string
text = input('Text to compress > ')
compressed = base64.b64encode(zlib.compress(text.encode())).decode()
print('Compressed Text:', compressed)

# Decompress: reverse the steps above
text = input('Text to decompress > ')
decompressed = zlib.decompress(base64.b64decode(text.encode())).decode()
print('Decompressed Text:', decompressed)
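To try this without typing input by hand, the same two steps can be wrapped in functions and checked with a round trip. This is only a sketch: the helper names compress_to_text and decompress_from_text and the sample string are made up for illustration, and real data will compress differently. Keep in mind that base64 makes the zlib output about one third larger (4 output characters for every 3 bytes).

```python
import base64
import zlib


def compress_to_text(text):
    # Level 9 asks zlib for its strongest (slowest) compression.
    raw = zlib.compress(text.encode("ascii"), 9)
    return base64.b64encode(raw).decode("ascii")


def decompress_from_text(packed):
    return zlib.decompress(base64.b64decode(packed)).decode("ascii")


# Hypothetical 16200-character sample built from a few of the 12 characters.
sample = ("o" * 40 + "W" * 20 + "_" * 30) * 180
packed = compress_to_text(sample)

# The round trip must give back exactly the original string.
assert decompress_from_text(packed) == sample
print(len(sample), "->", len(packed))
```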