How do I add a new DataArray to an existing Dataset without overwriting the whole thing? The new DataArray shares some coordinates with the existing one, but also has new ones. In my current implementation, the Dataset gets completely overwritten, instead of just adding the new stuff.
The existing DataArray is a chunked zarr-backed DirectoryStore (though I have the same problem for an S3 store).
import numpy as np
import xarray as xr
import zarr
arr1 = xr.DataArray(np.random.randn(2, 3),
                    [('x', ['a', 'b']), ('y', [10, 20, 30])],
                    name='arr1')
ds = arr1.chunk({'x': 1, 'y': 3}).to_dataset()
ds looks like this:
<xarray.Dataset>
Dimensions: (x: 2, y: 3)
Coordinates:
* x (x) <U1 'a' 'b'
* y (y) int64 10 20 30
Data variables:
arr1 (x, y) float64 dask.array<shape=(2, 3), chunksize=(1, 3)>
I write it to a directory store:
store = zarr.DirectoryStore('test.zarr')
z = ds.to_zarr(store, group='arr', mode='w')
It looks good:
$ ls -l test.zarr/arr
total 0
drwxr-xr-x 6 myuser mygroup 204 Sep 21 11:03 arr1
drwxr-xr-x 5 myuser mygroup 170 Sep 21 11:03 x
drwxr-xr-x 5 myuser mygroup 170 Sep 21 11:03 y
I create a new DataArray that shares some coordinates with the existing one, and add it to the existing Dataset. I'll read the existing Dataset first, since that's what I'm doing in practice.
ds2 = xr.open_zarr(store, group='arr')
arr2 = xr.DataArray(np.random.randn(2, 3),
                    [('x', arr1.x), ('z', [1, 2, 3])],
                    name='arr2')
ds2['arr2'] = arr2
The updated Dataset looks fine:
<xarray.Dataset>
Dimensions: (x: 2, y: 3, z: 3)
Coordinates:
* x (x) <U1 'a' 'b'
* y (y) int64 10 20 30
* z (z) int64 1 2 3
Data variables:
arr1 (x, y) float64 dask.array<shape=(2, 3), chunksize=(1, 3)>
arr2 (x, z) float64 0.4728 1.118 0.7275 0.4971 -0.3398 -0.3846
...but I can't write to it without a complete overwrite.
# I think I'm "appending" to the group `arr`
z2 = ds2.to_zarr(store, group='arr', mode='a')
This gives me a ValueError: The only supported options for mode are 'w' and 'w-'.
# I think I'm "creating" the new arr2 array in the arr group
z2 = ds2.to_zarr(store, group='arr', mode='w-')
This gives me a ValueError: path 'arr' contains a group.
The only thing that worked is z2 = ds2.to_zarr(store, group='arr', mode='w'), but this completely overwrites the group.
The original DataArray is actually quite large in my problem, so I really don't want to re-write it. Is there a way to only write the new DataArray?
Thank you!
The existing answers are out of date: mode="a" is now supported in xarray. See the documentation:
Xarray supports several ways of incrementally writing variables to a Zarr store. These options are useful for scenarios when it is infeasible or undesirable to write your entire dataset at once.
- Use mode='a' to add or overwrite entire variables,
- Use append_dim to resize and append to existing variables, and
- Use region to write to limited regions of existing arrays.