python compression zarr

Transform zarr directory storage to zip storage


Code:

import zarr
store = zarr.ZipStore("/mnt/test.zip", "r")

Problem description: Hi, sorry to bother you. I found this statement in the official Zarr documentation about ZipStore: "Alternatively, use a DirectoryStore when writing the data, then manually Zip the directory and use the Zip file for subsequent reads."

I am trying to convert a Zarr dataset stored as a DirectoryStore into a ZipStore. I used the zip command provided on Linux: zip -r test.zip test.zarr, where test.zarr is a directory-store dataset containing three groups. However, when I try to open the result with the code above, I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/eddie/miniconda3/envs/train/lib/python3.8/site-packages/zarr/storage.py", line 1445, in __init__
    self.zf = zipfile.ZipFile(path, mode=mode, compression=compression,
  File "/home/eddie/miniconda3/envs/train/lib/python3.8/zipfile.py", line 1190, in __init__
    _check_compression(compression)
  File "/home/eddie/miniconda3/envs/train/lib/python3.8/zipfile.py", line 686, in _check_compression
    raise NotImplementedError("That compression method is not supported")
NotImplementedError: That compression method is not supported

I wonder whether my compression method is wrong, and whether there is a workaround for converting directory storage to zip storage or to some other database format. As the number of groups grows, the directory store ends up with a very large number of files, which makes it inconvenient to transfer. Thanks in advance.

Version and installation information
Value of zarr.__version__: 2.8.1
Value of numcodecs.__version__: 0.7.3
Version of Python interpreter: 3.8.0
Operating system (Linux/Windows/Mac): Linux (Ubuntu 18.04)
How Zarr was installed: pip

Solution

  • Because zarr already uses compression, there is no need to compress again when creating the zip archive. I.e., you can use zip -r -0 to store the files in the zip archive as-is, without compression.

    Also, you might need to be careful about the paths that get stored within the zip archive. E.g., if I have a zarr hierarchy in some directory "/path/to/foo" and I want to store this into a zip file at "/path/to/bar.zip" I would do:

    cd /path/to/foo
    zip -r0 /path/to/bar.zip .

    This ensures that the paths that get stored within the zip archive are relative to the original root directory.
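If you would rather do this from Python than from the shell, the same two points (no compression, paths relative to the root of the hierarchy) can be sketched with the standard library alone. This is only a minimal sketch: the function name zip_directory_stored is made up for illustration, and it assumes the output zip file lives outside the directory being packed.

```python
import os
import zipfile


def zip_directory_stored(src_dir, zip_path):
    """Pack src_dir into zip_path without compression (hypothetical helper).

    Entry names are relative to src_dir, mirroring `cd src_dir; zip -r0 zip_path .`.
    Assumes zip_path is NOT located inside src_dir.
    """
    with zipfile.ZipFile(zip_path, mode="w",
                         compression=zipfile.ZIP_STORED,
                         allowZip64=True) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in sorted(files):
                full_path = os.path.join(root, name)
                # Store "group/array/.zarray", not "foo/group/array/.zarray".
                arcname = os.path.relpath(full_path, src_dir).replace(os.sep, "/")
                zf.write(full_path, arcname)
```

You would call it as zip_directory_stored("/path/to/foo", "/path/to/bar.zip"). When opening the result, it is probably worth passing the mode by keyword, i.e. zarr.ZipStore("/path/to/bar.zip", mode="r"): as far as I know, in zarr 2.x the second positional parameter of ZipStore is compression rather than mode, which would also be consistent with the traceback in the question, where the error is raised by _check_compression(compression) in the constructor.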