
Multiprocessor support for `xz`?


Is there a way to spread xz compression work across multiple CPUs? I realize that this doesn't appear to be possible with xz itself, but are there other utilities that implement the same compression algorithm and would allow more efficient processor utilization? I will be running this in scripts and utility apps on systems with 16+ processors, and it would be useful to use at least 4-8 of them to speed up compression.


Solution

  • Enabling multi-threading

    To enable it, add the -T option followed by the number of worker threads to spawn, or use -T0 to spawn as many threads as the OS reports CPUs:

    xz -T0 big.tar
    xz -T4 bigish.tar
    

    The default single-threaded operation is equivalent to -T1.

    I have found that using a couple of threads fewer than the total number of hardware threads (hyper-threads) on my CPU gives a good balance between system responsiveness and compression speed.†

    † So -T10 on my 6-core, 12-thread workstation.
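    The rule of thumb above can be scripted. This is a minimal POSIX shell sketch under a few assumptions: xz_threads is a hypothetical helper, not part of xz; the reserve of two threads is just the default taken from the tip above; and the total is read with getconf _NPROCESSORS_ONLN (online processors as reported by the OS). The result never drops below one thread.

```shell
# Hypothetical helper (not part of xz): print a -T value that leaves a few
# hardware threads free for the rest of the system.
#   $1: total hardware threads (default: online processors reported by the OS)
#   $2: threads to keep free (default: 2)
xz_threads() {
    xz_total=${1:-$(getconf _NPROCESSORS_ONLN)}
    xz_reserve=${2:-2}
    xz_n=$((xz_total - xz_reserve))
    # Never drop below a single worker thread.
    if [ "$xz_n" -lt 1 ]; then
        xz_n=1
    fi
    echo "$xz_n"
}
```

    For example, on a machine where the OS reports 12 hardware threads, xz -T"$(xz_threads)" big.tar runs xz with -T10.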

    As scai and Dzenly said in the comments:

    If you want to use this in combination with tar, just run export XZ_DEFAULTS="-T 0" beforehand.

    or use something like: XZ_OPT="-2 -T0"
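    When tar drives the compression, it starts xz itself, so extra options have to reach xz through the environment. Below is a minimal POSIX shell sketch of that pattern, assuming a tar that supports -J (GNU tar and bsdtar do); make_txz is a hypothetical helper name, and the "-2 -T0" setting from the comment above is only used when XZ_OPT is not already set.

```shell
# Hypothetical helper (not part of tar or xz): make_txz ARCHIVE PATH...
# Creates an .tar.xz archive. tar runs xz as a child process, and xz reads
# extra options from the XZ_OPT environment variable, so -T0 is passed there.
# "-2 -T0" (preset 2, one thread per reported CPU) applies only when the
# caller has not already set XZ_OPT.
make_txz() {
    archive=$1
    shift
    XZ_OPT=${XZ_OPT:-"-2 -T0"} tar -cJf "$archive" "$@"
}
```

    Usage: make_txz backup.tar.xz project/. Setting XZ_OPT only for the tar command keeps the threading choice local to this call, whereas exporting XZ_DEFAULTS affects every later xz run in that shell.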

    Compression

    Multiprocessor (multithreading) compression support was added to xz in version 5.2, in December 2014.

    Decompression

    Multiprocessor (multithreading) decompression support was added to xz in version 5.4.0, in December 2022.

    It can only use multiple threads with .xz files that have multiple Blocks with size information in their Block Headers. Files created by the single-threaded xz compressor don't have such Blocks, so a file compressed with a single thread can only be decompressed with a single thread.

    Files created by the multi-threaded xz compressor always have these blocks, so if the file was compressed with multiple threads it can be decompressed with multiple threads too.
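    Whether a given file can benefit from threaded decompression can be checked before decompressing it. This is a minimal sketch assuming xz's machine-readable --robot --list output, where the line starting with "file" carries per-file totals and its third tab-separated field is the Block count; xz_block_count is a hypothetical helper, not an xz command. A count of 1 means only one thread can do the work whatever -T says, which also happens with output from the multi-threaded compressor when the input was smaller than a single Block.

```shell
# Hypothetical helper (not part of xz): print the number of Blocks in an
# .xz file, taken from the third tab-separated field of the "file" line
# in xz --robot --list output.
xz_block_count() {
    xz --robot --list -- "$1" | awk -F '\t' '$1 == "file" { print $3 }'
}
```

    For example, if xz_block_count big.tar.xz prints a number greater than 1 for a file made by the multi-threaded compressor, xz -d -T0 big.tar.xz (xz 5.4.0 or later) can spread the decompression across several threads.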