I have a zarr store of weather data at a 1-hour time interval for the year 2022, so 8760 chunks. But there is data only for random days. How do I check for which hours in 0 to 8760 data is available? The store is also defined with "fill_value": "NaN".
I am iterating over each hour and checking for all NaN as below (using xarray) to identify whether there is data or not. But it's a very time-consuming process.
import numpy as np

hours = 8760
for hour in range(hours):
    # load one time step and report it if any value is not NaN
    if not np.isnan(np.array(xarrds['temperature'][hour])).all():
        print(f"data available in hour: {hour}")
Is there a better way to check the data availability?
Don't use an explicit Python loop. Express the check as a single reduction over the non-time dimensions and let dask execute it in parallel:
# assuming your data is already chunked along time, i.e. .chunk({'time': 1})
da = xarrds['temperature']
# get the names of non-time dims to reduce over
non_time_dims = [d for d in da.dims if d != 'time']
# create boolean DataArray indexed by time giving where array is all NaN
all_null_by_hour = da.isnull().all(dim=non_time_dims)
# compute the array
all_null_by_hour = all_null_by_hour.compute()
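To turn that boolean result into the actual list of hours, something like the sketch below should work. It is only an illustration: `available_hours` is a hypothetical helper name, and the small in-memory array stands in for `xarrds['temperature']`, assuming the time dimension is called `time` and that chunks never written to the store come back as NaN because of the fill value.

```python
import numpy as np
import xarray as xr


def available_hours(da, time_dim="time"):
    """Return integer positions along `time_dim` that hold at least one non-NaN value."""
    # reduce over every dimension except time
    non_time_dims = [d for d in da.dims if d != time_dim]
    # True where the hour has any real data; compute() also works on non-dask arrays
    has_data = da.notnull().any(dim=non_time_dims).compute()
    # convert the boolean mask to integer hour indices
    return np.flatnonzero(has_data.values)


# Synthetic stand-in for the store (assumption): 48 hours on a 3x4 grid, all NaN
data = np.full((48, 3, 4), np.nan)
data[5] = 1.0          # hour 5 is fully populated
data[30, 0, 0] = 2.0   # hour 30 has a single valid cell
demo = xr.DataArray(data, dims=("time", "lat", "lon"), name="temperature")

hours = available_hours(demo)
print(hours)
```

With the real store you would call `available_hours(xarrds['temperature'])`. Note that an hour counts as available as soon as a single cell is non-NaN, which matches the `not ... .all()` test in the question.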