I would like to produce a zarr array pointing to part of a zarr array on disk, similar to how sliced = np_arr[5] gives me a view into np_arr, such that modifying the data in sliced modifies the data in np_arr. Example code:
import matplotlib.pyplot as plt
import numpy as np
import zarr
arr = zarr.open(
    'temp.zarr',
    mode='a',
    shape=(4, 32, 32),
    chunks=(1, 16, 16),
    dtype=np.float32,
)
arr[:] = np.random.random((4, 32, 32))
fig, ax = plt.subplots(1, 2)
arr[2, ...] = 0 # works fine, "wipes" slice 2
ax[0].imshow(arr[2]) # all 0s
arr_slice = arr[1] # returns a NumPy array — loses ties to zarr on disk
arr_slice[:] = 0
ax[1].imshow(arr[1]) # no surprises — shows original random data
plt.show()
Is there anything I can write instead of arr_slice = arr[1] that will make arr_slice be a (writeable) view into the arr array on disk?
The TensorStore library is specifically designed for this; all of its indexing operations produce lazy views:
import tensorstore as ts
import numpy as np
arr = ts.open({
    'driver': 'zarr',
    'kvstore': {
        'driver': 'file',
        'path': '.',
    },
    'path': 'temp.zarr',
    'metadata': {
        'dtype': '<f4',
        'shape': [4, 32, 32],
        'chunks': [1, 16, 16],
        'order': 'C',
        'compressor': None,
        'filters': None,
        'fill_value': None,
    },
}, create=True).result()
arr[1] = 42 # Overwrites, just like numpy/zarr library
view = arr[1] # Returns a lazy view, no I/O performed
np.array(view) # Reads from the view
# Returns JSON spec that can be passed to `ts.open` to reopen the view.
view.spec().to_json()
You can read more about the "index transform" mechanism that underlies these lazy views here: https://google.github.io/tensorstore/index_space.html#index-transform and about the Python indexing API here: https://google.github.io/tensorstore/python/indexing.html
Disclaimer: I'm an author of TensorStore.
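If adding a dependency is not an option, a similar effect can be approximated with a small wrapper that remembers the leading index and forwards every read and write to the parent array. This is only a sketch: LeadingIndexView is a hypothetical helper, not part of zarr, and it handles just a single fixed leading index. A NumPy array stands in for the zarr array here so the snippet runs on its own. The assumption is that the zarr array from the question accepts the same tuple indexing.

```python
import numpy as np


class LeadingIndexView:
    """Hypothetical helper: forwards reads and writes to parent[index, ...]."""

    def __init__(self, parent, index):
        self.parent = parent
        self.index = index

    def _full_key(self, key):
        # Normalize the key to a tuple and prepend the fixed leading index.
        if not isinstance(key, tuple):
            key = (key,)
        return (self.index,) + key

    def __getitem__(self, key):
        return self.parent[self._full_key(key)]

    def __setitem__(self, key, value):
        self.parent[self._full_key(key)] = value

    @property
    def shape(self):
        # Shape of the view: the parent's shape without the leading axis.
        return self.parent.shape[1:]


# NumPy stand-in for the zarr array opened in the question.
parent = np.ones((4, 32, 32), dtype=np.float32)

arr_slice = LeadingIndexView(parent, 1)
arr_slice[:] = 0  # forwarded as parent[1, :] = 0
```

Reading still goes through indexing, so plotting would need ax[1].imshow(arr_slice[:]) rather than passing the wrapper itself.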