GNU gzip and zlib were written by the same authors, yet I have noticed that their decompression speeds differ significantly.
For example, decompressing linux.tar.gz took 4.5s with GNU gzip, while zlib did it in just 2.6s on my computer. When I profiled the two, I noticed that GNU gzip spends significantly more time on crc32 computation than zlib does.
Profile of gunzip using zlib under the hood:
Overhead Command Shared Object Symbol
58.41% gunzip gunzip [.] inflate_fast
12.60% gunzip gunzip [.] crc32_z
2.57% gunzip [unknown] [k] 0xffffffff94c247e7
1.82% gunzip gunzip [.] inflate
1.42% gunzip libc.so.6 [.] __memmove_avx_unaligned_erms
1.19% gunzip [unknown] [k] 0xffffffff94c250ba
0.81% gunzip [unknown] [k] 0xffffffff94c28ec9
Profile of GNU gzip:
Overhead Command Shared Object Symbol
62.78% gzip gzip [.] flush_window
19.47% gzip gzip [.] inflate_codes
4.32% gzip libc.so.6 [.] __memmove_avx_unaligned_erms
1.40% gzip [unknown] [k] 0xffffffff94c247e7
0.60% gzip [unknown] [k] 0xffffffff94c250ba
This seems to suggest that the crc32 computation in GNU gzip is not as efficient as zlib's. Looking at the code, GNU gzip indeed relies on a simple table-lookup crc32 that processes one byte at a time, whereas zlib seems to use a much more optimized (and more complicated) implementation.
If this is indeed the case, I am wondering why the two can't share the same crc32 implementation, which would speed up GNU gzip as well. Given that virtually all Linux distros ship with GNU gzip, bringing its performance on par with zlib would be hugely beneficial.
If not, I am curious why the two have such a large discrepancy in performance.
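To make the difference concrete, here is a minimal sketch of the two styles. The first function is a byte-at-a-time table lookup in the same spirit as gzip's updcrc(). The second is a "slicing-by-4" variant, one well-known way to consume four bytes per loop iteration with four tables. It is an illustration only, not zlib's actual code, and the names crc_tables, crc_init_tables, crc32_bytewise and crc32_slice4 are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* crc_tables[0] is the classic byte-wise table for the reflected
 * polynomial 0xEDB88320; crc_tables[1..3] extend it for slicing-by-4.
 * All names here are hypothetical, chosen for this sketch. */
static uint32_t crc_tables[4][256];

static void crc_init_tables(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
        crc_tables[0][i] = c;
    }
    /* Table t gives the effect of byte i followed by t zero bytes. */
    for (uint32_t i = 0; i < 256; i++) {
        for (int t = 1; t < 4; t++) {
            uint32_t prev = crc_tables[t - 1][i];
            crc_tables[t][i] = (prev >> 8) ^ crc_tables[0][prev & 0xff];
        }
    }
}

/* One table lookup per input byte; each step depends on the previous one. */
static uint32_t crc32_bytewise(uint32_t crc, const unsigned char *s, size_t n)
{
    uint32_t c = ~crc;
    while (n--)
        c = crc_tables[0][(c ^ *s++) & 0xff] ^ (c >> 8);
    return ~c;
}

/* Four input bytes per iteration; the four lookups are independent of
 * each other, so the CPU can overlap them. */
static uint32_t crc32_slice4(uint32_t crc, const unsigned char *s, size_t n)
{
    uint32_t c = ~crc;
    while (n >= 4) {
        /* Assemble a little-endian word byte by byte (portable, no
         * alignment or endianness assumptions). */
        c ^= (uint32_t)s[0] | (uint32_t)s[1] << 8 |
             (uint32_t)s[2] << 16 | (uint32_t)s[3] << 24;
        c = crc_tables[3][c & 0xff] ^ crc_tables[2][(c >> 8) & 0xff] ^
            crc_tables[1][(c >> 16) & 0xff] ^ crc_tables[0][c >> 24];
        s += 4;
        n -= 4;
    }
    while (n--)  /* leftover tail bytes, one at a time */
        c = crc_tables[0][(c ^ *s++) & 0xff] ^ (c >> 8);
    return ~c;
}
```

After calling crc_init_tables() once, both functions take a starting value of 0 for a fresh checksum and return the same result for the same input. Whether the second one is actually faster depends on the compiler and CPU, so it would need to be measured rather than assumed.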
-- edit --
I failed to mention that flush_window() spends most of its time in updcrc():
Percent│ test %ebp,%ebp
│ je 78
│ updcrc():
│ c = crc;
│ mov %ebp,%r8d
│ lea window,%rdx
│ mov crc,%rax
│ if (n) do {
0.01 │ lea crc_32_tab,%rsi
│ lea (%rdx,%r8,1),%rdi
│ xchg %ax,%ax
│ c = crc_32_tab[((int)c ^ (*s++)) & 0xff] ^ (c >> 8);
10.37 │30: movzbl (%rdx),%ecx
10.55 │ add $0x1,%rdx
17.32 │ xor %eax,%ecx
18.65 │ shr $0x8,%rax
12.89 │ movzbl %cl,%ecx
19.73 │ xor (%rsi,%rcx,8),%rax
│ } while (--n);
10.45 │ cmp %rdi,%rdx
│ jne 30
│ crc = c;
0.02 │ mov %rax,crc
[Flame graph: GNU gzip]
[Flame graph: thin wrapper around zlib]
Tested on a system with an AMD 7735HS running Ubuntu 22.04, compiled with -O3.
The discrepancy is indeed due to the crc32 computation.
Reference: https://lists.gnu.org/r/bug-gzip/2023-11/msg00000.html