Hello everyone,
I am trying to understand Deflate compression, but from what I have seen I think I have misunderstood or done something wrong.
I grabbed the 7zip source code to see what is happening, but it is hard to read and I couldn't find the function responsible for Deflate compression.
Here is what's troubling me.
I have two text files.
Test1.txt has:
"Helo everyone I am trying to understand Deflate compression but from what i have seen i think i have misunderstood or done something wrong. So i grabbed a source code for 7zip so i might understand what is happening but the source code is hard to read and i couldn't find the function responsible for Deflate compression but here is what's troubling me."
Test2.txt has:
"Helo everyone I am trying to understand Deflate compression but from what i have seen i think i have misunderstood or done something wrong. So i grabbed a source code for 7zip"
If I use zlib.compress():
import zlib

def deflate_file(input_filename, output_filename):
    # Read the whole input file as bytes.
    with open(input_filename, 'rb') as input_file:
        data = input_file.read()
    # zlib.compress() returns a zlib stream: header, Deflate data, Adler-32 checksum.
    compressed_data = zlib.compress(data, zlib.Z_BEST_COMPRESSION)
    with open(output_filename, 'wb') as output_file:
        output_file.write(compressed_data)

input_file = 'test1.txt'
output_file = 'compressed.deflate'
deflate_file(input_file, output_file)
I get this:
78 DA 6D 50 BD 6E 03 21 0C DE 4F BA 77 F0 96 2D 6B 5F A0 43 3B F7 09 E0 F0 81 D5 C3 46 06 12 25 4F 5F 3B 69 A5 56 EA 06 F6 F7 EB 37 3C 04 F0 82 7A 13 C6 75 59 97 77 08 15 86 DE 88 33 0C 81 C9 09 B5 8F C0 09 5E 71 3F C2 40 D8 A4 36 C5 DE 49 18 E2 1C B0 AB 54 B8 96 30 80 A0 84 0B 42 47 64 7B 8F 42 FC F9 33 AB D4 BF A5 44 12 88 42 32 3F E8 52 D1 61 19 AE 2A 9C CF EB F2 21 C6 C8 1A 62 C4 04 C1 00 53 37 B7 4C 08 BB B1 5E EE D4 6C 68 98 4A B9 8C DF F1 9E 09 BA D9 B5 86 EC 9A 1E 6E 14 FC 23 F2 00 68 F2 6A 8A C1 2C 8C 49 B6 9A 47 E2 93 75 21 FB 3B 67 9F BC 0D 6F 68 4D 9B 70 A7 78 3C 13 FC 73 85 75 71 A7 82 FA 90 F7 1C A7 6E 27 94 19 0F 8F 51 F1 FC 05 8F 38 80 56
But compressing the file with 7zip, I get:
50 4B 03 04 14 00 00 00 08 00 72 8C 4C 58 BF BE E6 4C D7 00 00 00 66 01 00 00 09 00 00 00 74 65 73 74 31 2E 74 78 74 6D 50 3B 52 03 31 0C ED 77 66 EF A0 2E 5D 5A 2E 40 01 35 27 F0 AE B5 B6 07 5B F2 C8 72 32 E1 F4 48 09 0C 14 E9 6C E9 7D F5 86 95 01 2F 28 37 26 5C 97 75 79 87 D0 40 E5 56 28 81 32 4C 8A 28 43 03 45 78 C5 A3 06 45 D8 B9 75 C1 31 0A 13 6C 53 E1 10 6E 70 CD 41 A1 40 0E 17 84 81 48 F6 D6 5C E8 F3 77 D6 CA F8 91 62 8E C0 02 D1 FC 60 70 43 87 25 B8 0A 53 3A AF CB 07 1B 23 49 D8 36 8C 10 0C 30 65 77 CB 88 70 18 EB E5 AB 74 1B 1A A6 95 94 F5 7F BC 47 82 61 76 BD 23 B9 A6 87 D3 EC 2E 7F 22 77 80 44 AF 26 18 CC C2 98 C5 56 B3 46 3A 59 97 62 7F E7 1C 93 76 F5 86 D6 B4 33 8D B2 D5 47 82 27 57 58 17 77 CA 28 77 79 CF 71 1A 76 42 9E 5B F5 18 0D CF DF 50 4B 01 02 3F 00 14 00 00 00 08 00 72 8C 4C 58 BF BE E6 4C D7 00 00 00 66 01 00 00 09 00 24 00 00 00 00 00 00 00 20 00 00 00 00 00 00 00 74 65 73 74 31 2E 74 78 74 0A 00 20 00 00 00 00 00 01 00 18 00 2F 80 43 1F C9 5D DA 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 50 4B 05 06 00 00 00 00 01 00 01 00 5B 00 00 00 FE 00 00 00 00 00
Questions.
What is the difference between zlib.compress() and 7zip's Deflate?
How can I get the same output as 7zip from zlib?
Why do the outputs of 7zip's Deflate on test1.txt and test2.txt start with different bytes when the inputs start with the same text?
If it's possible, I want to add a feature to my copy of the 7zip source code to save the deflated data for analysis.
You are compressing to two different formats. The 78 DA is the start of a zlib stream. The 50 4B is the start of a zip file. See https://stackoverflow.com/a/20765054/1180620
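To make the distinction concrete, here is a minimal sketch that tells the two containers apart from their leading bytes. The function name identify_container is made up for illustration. The zlib check follows the header rule that the first two bytes, read as a big-endian 16-bit number, must be a multiple of 31 and that the low nibble of the first byte is 8 (Deflate). The zip check only looks for the local file header signature, so it would not recognize an empty or spanned archive.

```python
def identify_container(blob: bytes) -> str:
    """Guess the container format from the first bytes (hypothetical helper)."""
    # A zip file normally starts with the local file header signature "PK\x03\x04".
    if blob[:4] == b"PK\x03\x04":
        return "zip"
    if len(blob) >= 2:
        cmf, flg = blob[0], blob[1]
        # zlib header: compression method 8 (Deflate), window size code <= 7,
        # and the two header bytes together are divisible by 31.
        if (cmf & 0x0F) == 8 and (cmf >> 4) <= 7 and ((cmf << 8) | flg) % 31 == 0:
            return "zlib"
    return "unknown"


# First bytes of the two dumps in the question.
print(identify_container(bytes.fromhex("78DA6D50")))  # zlib
print(identify_container(bytes.fromhex("504B0304")))  # zip
```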
I do not believe that 7zip has an option to produce a zlib stream.
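If the goal is to look at the Deflate data on its own, you can get it from either side without touching the 7zip source. The sketch below assumes only Python's standard zlib, zipfile, and struct modules. The helper names raw_deflate and member_deflate_bytes are made up for illustration. It builds a small zip in memory with Python's own zipfile as a stand-in for an archive produced by 7zip.

```python
import io
import struct
import zipfile
import zlib


def raw_deflate(data: bytes, level: int = 9) -> bytes:
    """Compress to a raw Deflate stream (hypothetical helper name)."""
    # A negative wbits value tells zlib to omit the 2-byte zlib header
    # and the 4-byte Adler-32 trailer.
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)
    return comp.compress(data) + comp.flush()


def member_deflate_bytes(zip_bytes: bytes, name: str) -> bytes:
    """Return the stored Deflate bytes of one zip member (hypothetical helper name)."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        info = zf.getinfo(name)
    if info.compress_type != zipfile.ZIP_DEFLATED:
        raise ValueError("member is not stored with Deflate")
    # The local file header has 30 fixed bytes; the name and extra-field
    # lengths sit at offsets 26 and 28, and the data follows both fields.
    name_len, extra_len = struct.unpack_from("<HH", zip_bytes, info.header_offset + 26)
    start = info.header_offset + 30 + name_len + extra_len
    return zip_bytes[start:start + info.compress_size]


text = b"I am trying to understand Deflate compression. " * 4

# Stand-in archive; a zip written by 7zip can be read from disk the same way.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("test1.txt", text)

from_zip = member_deflate_bytes(buf.getvalue(), "test1.txt")
from_zlib = zlib.compress(text, 9)[2:-4]  # strip zlib header and Adler-32
direct = raw_deflate(text)

# All three are raw Deflate streams and decode with wbits=-15.
for stream in (from_zip, from_zlib, direct):
    assert zlib.decompress(stream, -15) == text
```

Even in raw form, do not expect the bytes from zlib to match the bytes from 7zip. Deflate allows many valid encodings of the same input, and the two dumps in the question already diverge shortly after the container headers (6D 50 BD 6E versus 6D 50 3B 52), which is consistent with two different encoders making different choices.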