python encoding zlib bytea

Decompressing a string from a database in Python


I know this question may look a bit unclear, but I have reached a level of frustration that drives me to ask it here.

I'm working with data from a PostgreSQL database, and I get something like this:

2022-06-01 02:21:52.770293  2022-06-01 02:21:52.78704   \\x0a78daa5534d6fe32014fc2fdca90063b0c9a91f52d...
2022-06-01 02:21:55.991809  2022-06-01 02:21:56.04597   \\x0a78dac5534d6be33010fd2fbe2b58b264c9caa9ed4...

I know that the counter column is a compressed string that contains JSON-like data. I also know (because it has been decompressed in the past) that the zlib package can decompress this string, with something like zlib.decompress(mycompressedstring).

But there is a missing step here, because the string \\x0a78... cannot be decompressed as is. I suspect there is some encoding or decoding work to do before calling zlib, but I am struggling to find out what it is.

I tried:

test = bytes(sample.iloc[1]['counter'], 'UTF16')

This produces a bytes object (shown in a screenshot in the original post).

I thought this was closer, but zlib cannot decompress it:

testunc = zlib.decompress(test)
error: Error -3 while decompressing data: incorrect header check

Can someone please help me here by giving me a lead to follow to find out what is wrong with this?


Solution

  • The hexadecimal representations starting with 78da... are the starts of valid zlib streams. You need to discard the \\x0a, convert the remainder from hexadecimal to binary, and pass the result to zlib.decompress(). Look at a2b_hex in binascii.
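The steps above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the helper name decode_counter is hypothetical, and it assumes the column arrives as a text string with a leading \x (or \\x) prefix followed by one extra 0a byte before the zlib stream, as in the rows shown in the question. Because those rows are truncated, the sample here is built by compressing a made-up JSON payload in the same shape.

```python
import binascii
import json
import zlib


def decode_counter(value: str) -> bytes:
    """Decode a hex-encoded, zlib-compressed string (hypothetical helper)."""
    # Drop the leading backslash(es) and the 'x' of the hex prefix.
    hex_part = value.lstrip("\\")
    if hex_part[:1] in ("x", "X"):
        hex_part = hex_part[1:]

    # Convert the hexadecimal text to raw bytes.
    raw = binascii.a2b_hex(hex_part)

    # Assumption: a single 0x0a byte precedes the zlib stream, as in the question.
    if raw[:1] == b"\x0a":
        raw = raw[1:]

    return zlib.decompress(raw)


# Made-up payload, encoded the same way as the database rows appear to be.
payload = json.dumps({"clicks": 3, "views": 10}).encode("utf-8")
sample = "\\x0a" + binascii.b2a_hex(zlib.compress(payload, 9)).decode("ascii")

print(json.loads(decode_counter(sample)))
```

With a pandas DataFrame like the one in the question, something along the lines of sample['counter'].apply(decode_counter) should apply the same steps to every row, assuming each value is a complete string of this form.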