Tags: python, python-3.x, compression, lzma, zstandard

How to decompress lzma2 (.xz) and zstd (.zst) files into a folder using Python 3?


I have been working for a long time with .bz2 files. To unpack/decompress .bz2 files into a specific folder I have been using the following function:

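import bz2
import os

destination_folder = 'unpacked/'
def decompress_bz2_to_folder(input_file):
    output_name = os.path.splitext(os.path.basename(input_file))[0]  # drop .bz2
    with bz2.BZ2File(input_file) as unpackedfile:
        data = unpackedfile.read()  # loads the whole file into memory
    with open(os.path.join(destination_folder, output_name), 'wb') as output:
        output.write(data)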

Recently I obtained a list of files with the .xz (not .tar.xz) and .zst extensions. My poor research skills told me that the former is lzma2 compression and the latter is Zstandard.

However, I couldn't find an easy way to unpack the contents of these archives into a folder (like I do with the .bz2 files).

How can I:

  1. Unpack the contents of an .xz (lzma2) file into a folder using Python 3?
  2. Unpack the contents of a .zst (Zstandard) file into a folder using Python 3?

Important Note: I'm unpacking very large files, so it would be great if the solution takes into consideration any potential Memory Errors.


Solution

  • LZMA data can be decompressed with the lzma module. Open the file with lzma.open(), then use shutil.copyfileobj() to copy the decompressed data to an output file in chunks, so the whole file never has to fit in memory:

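    import lzma
    import pathlib
    import shutil
    
    destination_dir = 'unpacked/'  # must already exist
    
    def decompress_lzma_to_folder(input_file):
        input_file = pathlib.Path(input_file)
        with lzma.open(input_file) as compressed:
            # .stem drops the final suffix: data.json.xz -> data.json
            output_path = pathlib.Path(destination_dir) / input_file.stem
            with open(output_path, 'wb') as destination:
                # copies in fixed-size chunks rather than reading everything at once
                shutil.copyfileobj(compressed, destination)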
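
To make the memory behaviour more concrete, here is a minimal sketch of roughly what a chunked copy does: read a bounded amount of decompressed data, write it out, repeat. The function name decompress_xz_in_chunks, the destination_dir parameter and the 1 MiB CHUNK_SIZE are assumptions made for illustration, not part of the original answer, and the walrus operator requires Python 3.8 or later.

```python
import lzma
import pathlib

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary choice for this sketch


def decompress_xz_in_chunks(input_file, destination_dir):
    """Decompress a .xz file into destination_dir one chunk at a time."""
    input_file = pathlib.Path(input_file)
    # .stem drops the final suffix, so "data.json.xz" becomes "data.json"
    output_path = pathlib.Path(destination_dir) / input_file.stem
    with lzma.open(input_file) as compressed, open(output_path, 'wb') as destination:
        # read() returns b'' at the end of the stream, which ends the loop
        while chunk := compressed.read(CHUNK_SIZE):
            destination.write(chunk)
    return output_path
```

At most one chunk of decompressed data is held at a time, so peak memory use should stay close to CHUNK_SIZE regardless of how large the file is.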
            
    

  • At the time of writing, the Python standard library had no support for Zstandard compression (a compression.zstd module was later added in Python 3.14). On earlier versions you can use either the zstandard package (by IndyGreg from Mozilla and the Mercurial project) or zstd. The latter is perhaps too basic for your needs, while zstandard offers a streaming API specifically suited for reading files.

    I'm using the zstandard library here to benefit from the copying API it implements, which lets you decompress and copy at the same time, similar to how shutil.copyfileobj() works:

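    import zstandard
    import pathlib
    
    destination_dir = 'unpacked/'  # must already exist
    
    def decompress_zstandard_to_folder(input_file):
        input_file = pathlib.Path(input_file)
        with open(input_file, 'rb') as compressed:
            decomp = zstandard.ZstdDecompressor()
            # .stem drops the final suffix: data.json.zst -> data.json
            output_path = pathlib.Path(destination_dir) / input_file.stem
            with open(output_path, 'wb') as destination:
                # decompresses and writes in chunks, so memory use stays bounded
                decomp.copy_stream(compressed, destination)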
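
Since the question involves a mixed list of .bz2, .xz and .zst files, the functions above could be combined into a single helper that picks the decompressor from the file extension. The following is only a sketch under a few assumptions: the name decompress_to_folder and its destination_dir parameter are hypothetical, the destination folder is created if missing, and zstandard is imported lazily so that the standard-library formats still work when that third-party package is not installed.

```python
import bz2
import lzma
import pathlib
import shutil


def decompress_to_folder(input_file, destination_dir):
    """Decompress a .bz2, .xz or .zst file into destination_dir, streaming."""
    input_file = pathlib.Path(input_file)
    destination_dir = pathlib.Path(destination_dir)
    destination_dir.mkdir(parents=True, exist_ok=True)
    output_path = destination_dir / input_file.stem  # drops the last suffix only

    suffix = input_file.suffix.lower()
    if suffix == '.zst':
        # Imported here so .xz and .bz2 work without the third-party package
        import zstandard
        with open(input_file, 'rb') as compressed, open(output_path, 'wb') as destination:
            zstandard.ZstdDecompressor().copy_stream(compressed, destination)
    elif suffix in ('.xz', '.bz2'):
        opener = lzma.open if suffix == '.xz' else bz2.open
        with opener(input_file, 'rb') as compressed, open(output_path, 'wb') as destination:
            shutil.copyfileobj(compressed, destination)
    else:
        raise ValueError(f'unsupported extension: {suffix!r}')
    return output_path
```

Usage would then be a plain loop, for example `for name in files: decompress_to_folder(name, 'unpacked/')`, where files is your own list of paths.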