gzip zlib crc32

GNU gzip decompression significantly slower than zlib


GNU gzip and zlib were written by the same authors, yet I have noticed that their decompression speeds differ significantly.

For example, decompressing linux.tar.gz with GNU gzip took 4.5s, while zlib did it in just 2.6s on my computer. When I profiled the two, I noticed that GNU gzip spends significantly more time on crc32 computation than zlib does.

Using zlib under the hood (thin wrapper around zlib):
Overhead  Command  Shared Object     Symbol
  58.41%  gunzip   gunzip            [.] inflate_fast
  12.60%  gunzip   gunzip            [.] crc32_z
   2.57%  gunzip   [unknown]         [k] 0xffffffff94c247e7
   1.82%  gunzip   gunzip            [.] inflate
   1.42%  gunzip   libc.so.6         [.] __memmove_avx_unaligned_erms
   1.19%  gunzip   [unknown]         [k] 0xffffffff94c250ba
   0.81%  gunzip   [unknown]         [k] 0xffffffff94c28ec9
GNU gzip:
Overhead  Command  Shared Object     Symbol
  62.78%  gzip     gzip              [.] flush_window
  19.47%  gzip     gzip              [.] inflate_codes
   4.32%  gzip     libc.so.6         [.] __memmove_avx_unaligned_erms
   1.40%  gzip     [unknown]         [k] 0xffffffff94c247e7
   0.60%  gzip     [unknown]         [k] 0xffffffff94c250ba

This suggests that the crc32 computation in GNU gzip is not as efficient as zlib's. Looking at the code, GNU gzip indeed relies on a simple byte-at-a-time table lookup for crc32, whereas zlib seems to use a much more optimized (and more complicated) implementation.
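To make the difference concrete, here is a minimal sketch in C. It is not the actual code of either project: `crc32_bytewise` mirrors the one-byte-per-iteration loop style of gzip's `updcrc()`, while `crc32_slice4` uses "slicing-by-4", a well-known table technique that folds four input bytes per iteration. I am only assuming that zlib's faster code belongs to this general family of multi-byte approaches; the function and table names below are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Four 256-entry tables for the reflected CRC-32 polynomial 0xEDB88320.
 * tab[0] is the classic byte-at-a-time table; tab[1..3] extend it so that
 * four input bytes can be folded in per loop iteration. */
static uint32_t tab[4][256];
static int tab_ready = 0;

static void crc32_init_tables(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
        tab[0][i] = c;
    }
    for (uint32_t i = 0; i < 256; i++)
        for (int t = 1; t < 4; t++)
            tab[t][i] = (tab[t - 1][i] >> 8) ^ tab[0][tab[t - 1][i] & 0xff];
    tab_ready = 1;
}

/* One table lookup per input byte; every iteration depends on the
 * previous value of c, so the loop is a serial dependency chain. */
uint32_t crc32_bytewise(uint32_t crc, const unsigned char *s, size_t n)
{
    if (!tab_ready) crc32_init_tables();
    uint32_t c = crc ^ 0xFFFFFFFFu;
    while (n--)
        c = tab[0][(c ^ *s++) & 0xff] ^ (c >> 8);
    return c ^ 0xFFFFFFFFu;
}

/* Slicing-by-4: four independent lookups per iteration, combined by XOR.
 * Bytes are assembled explicitly, so this does not depend on host
 * endianness or alignment. */
uint32_t crc32_slice4(uint32_t crc, const unsigned char *s, size_t n)
{
    if (!tab_ready) crc32_init_tables();
    uint32_t c = crc ^ 0xFFFFFFFFu;
    while (n >= 4) {
        c ^= (uint32_t)s[0] | (uint32_t)s[1] << 8 |
             (uint32_t)s[2] << 16 | (uint32_t)s[3] << 24;
        c = tab[3][c & 0xff] ^ tab[2][(c >> 8) & 0xff] ^
            tab[1][(c >> 16) & 0xff] ^ tab[0][c >> 24];
        s += 4;
        n -= 4;
    }
    while (n--)  /* leftover tail bytes, one at a time */
        c = tab[0][(c ^ *s++) & 0xff] ^ (c >> 8);
    return c ^ 0xFFFFFFFFu;
}

/* Sanity check against the standard CRC-32 check value for "123456789". */
void crc32_selfcheck(void)
{
    const unsigned char msg[] = "123456789";
    assert(crc32_bytewise(0, msg, 9) == 0xCBF43926u);
    assert(crc32_slice4(0, msg, 9) == crc32_bytewise(0, msg, 9));
}
```

Both functions take a running CRC (start with 0) and can be called repeatedly on consecutive chunks. The sliced version does the same amount of table work per byte, but its four lookups per iteration do not depend on each other, which is the kind of property that lets a CPU overlap them instead of waiting on one load per byte.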

If this is indeed the case, I wonder why the two can't share the same crc32 implementation, which would speed up GNU gzip as well. Given that virtually all Linux distros ship with GNU gzip, bringing its performance on par with zlib would be hugely beneficial.

If not, I am curious why the two have such a large discrepancy in performance.

-- edit --

I failed to mention that flush_window() spends most of its time in updcrc():

Percent│      test    %ebp,%ebp
       │      je      78
       │    updcrc():
       │    c = crc;
       │      mov     %ebp,%r8d
       │      lea     window,%rdx
       │      mov     crc,%rax
       │    if (n) do {
  0.01 │      lea     crc_32_tab,%rsi
       │      lea     (%rdx,%r8,1),%rdi
       │      xchg    %ax,%ax
       │    c = crc_32_tab[((int)c ^ (*s++)) & 0xff] ^ (c >> 8);
 10.37 │30:   movzbl  (%rdx),%ecx
 10.55 │      add     $0x1,%rdx
 17.32 │      xor     %eax,%ecx
 18.65 │      shr     $0x8,%rax
 12.89 │      movzbl  %cl,%ecx
 19.73 │      xor     (%rsi,%rcx,8),%rax
       │    } while (--n);
 10.45 │      cmp     %rdi,%rdx
       │      jne     30
       │    crc = c;
  0.02 │      mov     %rax,crc

Below is the flamegraph from GNU gzip: [flamegraph image]

Below is the flamegraph from a thin wrapper around zlib: [flamegraph image]

Tested on a system with an AMD 7735HS running Ubuntu 22.04, compiled with -O3.


Solution

  • The slowdown is indeed due to the crc32 computation.

    Reference: https://lists.gnu.org/r/bug-gzip/2023-11/msg00000.html