
xarray: rolling mean of dask array conflicting sizes for data and coordinate in rolling operation


I am trying to compute a rolling mean of a dask array within xarray. My issue may lie in the rechunking before the rolling mean. I am getting a ValueError about conflicting sizes between data and coordinates. However, the error arises inside the rolling operation, and I don't think there are any conflicts between the data and coords of the array before it goes into the rolling operation.

Apologies for not creating test data, but my project data is quick to play with:

import xarray as xr

remote_data = xr.open_dataarray('http://iridl.ldeo.columbia.edu/SOURCES/.Models'
                                '/.SubX/.RSMAS/.CCSM4/.hindcast/.zg/dods',
                                chunks={'L': 1, 'S': 1})
da = remote_data.isel(P=0,L=0,M=0,X=0,Y=0)
da_day_clim = da.groupby('S.dayofyear').mean('S')
print(da_day_clim)
#<xarray.DataArray 'zg' (dayofyear: 366)>
#dask.array<shape=(366,), dtype=float32, chunksize=(1,)>
#Coordinates:
#    L          timedelta64[ns] 12:00:00
#    Y          float32 -90.0
#    M          float32 1.0
#    X          float32 0.0
#    P          int32 500
#  * dayofyear  (dayofyear) int64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ...

# Do a 31-day rolling mean
# da_day_clim.rolling(dayofyear=31, center=True).mean()
# This brings up:
#ValueError: The overlapping depth 30 is larger than your
#smallest chunk size 1. Rechunk your array
#with a larger chunk size or a chunk size that
#more evenly divides the shape of your array.

# Read http://xarray.pydata.org/en/stable/dask.html
# and found http://xarray.pydata.org/en/stable/generated/xarray.Dataset.chunk.html#xarray.Dataset.chunk
# I could make a little PR to add a suggestion to use .chunk() to the ValueError message. Thoughts?

# Rechunk. I played around with a few values but settled on
# the full length of dayofyear (a single chunk)
da_day_clim2 = da_day_clim.chunk({'dayofyear': 366})
print(da_day_clim2)
#<xarray.DataArray 'zg' (dayofyear: 366)>
#dask.array<shape=(366,), dtype=float32, chunksize=(366,)>
#Coordinates:
#    L          timedelta64[ns] 12:00:00
#    Y          float32 -90.0
#    M          float32 1.0
#    X          float32 0.0
#    P          int32 500
#  * dayofyear  (dayofyear) int64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ...

# Rolling mean on this
da_day_clim_smooth = da_day_clim2.rolling(dayofyear=31, center=True).mean()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-57-6acf382cdd3d> in <module>()
      4 da_day_clim = da.groupby('S.dayofyear').mean('S')
      5 da_day_clim2 = da_day_clim.chunk({'dayofyear': 366})
----> 6 da_day_clim_smooth = da_day_clim2.rolling(dayofyear=31, center=True).mean()

~/anaconda/envs/SubXNAO/lib/python3.6/site-packages/xarray/core/rolling.py in wrapped_func(self, **kwargs)
    307             if self.center:
    308                 values = values[valid]
--> 309             result = DataArray(values, self.obj.coords)
    310 
    311             return result

~/anaconda/envs/SubXNAO/lib/python3.6/site-packages/xarray/core/dataarray.py in __init__(self, data, coords, dims, name, attrs, encoding, fastpath)
    224 
    225             data = as_compatible_data(data)
--> 226             coords, dims = _infer_coords_and_dims(data.shape, coords, dims)
    227             variable = Variable(dims, data, attrs, encoding, fastpath=True)
    228 

~/anaconda/envs/SubXNAO/lib/python3.6/site-packages/xarray/core/dataarray.py in _infer_coords_and_dims(shape, coords, dims)
     79                 raise ValueError('conflicting sizes for dimension %r: '
     80                                  'length %s on the data but length %s on '
---> 81                                  'coordinate %r' % (d, sizes[d], s, k))
     82 
     83         if k in sizes and v.shape != (sizes[k],):

ValueError: conflicting sizes for dimension 'dayofyear': length 351 on the data but length 366 on coordinate 'dayofyear'

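The data length 351 is 15 short of the coordinate length 366, and 15 is half the window (31 // 2), so the centered rolling result seems to drop half a window of values that the dayofyear coordinate still has.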
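A pure-Python sketch of how a 15-value shortfall can arise. It assumes, as the values = values[valid] line in the traceback suggests, that the centered mean is built from a trailing window and then shifted left by window // 2. If the input is first padded with window // 2 missing values, the shifted result keeps the full length; if that padding is skipped, the shift drops 15 points. The function names are hypothetical and this is a simplified model, not xarray's actual code.

```python
import math


def trailing_rolling_mean(values, window):
    """Trailing mean: out[i] averages values[i - window + 1 : i + 1].

    Positions without a full window (or with a NaN inside it) are NaN.
    """
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(math.nan)
        else:
            out.append(sum(values[i - window + 1 : i + 1]) / window)
    return out


def centered_rolling_mean(values, window, pad=True):
    """Centered mean built by shifting a trailing mean left by window // 2.

    pad=True appends window // 2 NaNs before computing, so the shifted
    result has the same length as the input. pad=False skips that step
    and reproduces the 366 vs 351 mismatch (hypothetical model of the bug).
    """
    offset = window // 2
    data = list(values)
    if pad:
        data = data + [math.nan] * offset
    trailing = trailing_rolling_mean(data, window)
    return trailing[offset:]


days = [float(d) for d in range(1, 367)]  # stand-in for 366 dayofyear values
print(len(centered_rolling_mean(days, 31)))             # 366
print(len(centered_rolling_mean(days, 31, pad=False)))  # 351
```

With padding, the first and last 15 entries are NaN (incomplete windows) and the length matches the coordinate; without it, the result no longer lines up with dayofyear, which is the same kind of mismatch reported in the ValueError.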


Solution

  • This turned out to be a bug in xarray and was fixed in https://github.com/pydata/xarray/pull/2122

    The fix will be in xarray 0.10.4, which is slated for imminent release.
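The first error in the question (overlapping depth 30 versus a smallest chunk size of 1) is a separate dask requirement rather than part of the bug: the 30 matches window - 1 for the 31-day window, and every chunk along dayofyear has to be at least that long, at least in the versions discussed here (newer dask releases may handle this differently). A stdlib sketch of that check follows; the helper names are hypothetical, and regular_chunks is assumed to match how dask lays out an integer chunk size (full chunks first, remainder last).

```python
def regular_chunks(length, size):
    """Split `length` into chunks of `size`, with any remainder last."""
    full, rem = divmod(length, size)
    return (size,) * full + ((rem,) if rem else ())


def overlap_fits(chunks, window):
    """Hypothetical check mirroring the error message: the overlap depth
    (window - 1) must not exceed the smallest chunk."""
    depth = window - 1
    return depth <= min(chunks)


for size in (1, 30, 61, 366):
    chunks = regular_chunks(366, size)
    print(size, min(chunks), overlap_fits(chunks, 31))
# 1 1 False
# 30 6 False
# 61 61 True
# 366 366 True
```

This illustrates the "chunk size that more evenly divides the shape" part of the message: chunks of 30 still fail because 366 leaves a final chunk of 6, while 61 divides 366 evenly and a single 366-element chunk (as in the question) also passes.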