Tags: r, large-data, cdo-climate, r-stars

R stars error: dims do not match the length of object when evaluating proxy


I am working on a large dataset from a global climate model that can't fit into RStudio's memory, hence I've chosen to analyze it by loading it as a stars proxy object and evaluating it chunk by chunk.

Taking spatial chunks over the full time period works fine, e.g.:

library(stars)
library(dplyr)  # provides the %>% pipe

full_dataset <- read_stars(data_path, proxy = TRUE)
data_chunk <- full_dataset[,1,1,] %>%
   st_as_stars()
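
The spatial chunking above could be turned into a loop along the following lines. This is only a sketch: `chunk_ranges` and `process_x_chunks` are hypothetical helper names, not part of stars, and the code assumes that the first dimension of the proxy is x and that there are three dimensions in total, as in this dataset.

```r
# Split 1..n into consecutive index ranges of at most `size` elements.
chunk_ranges <- function(n, size) {
  starts <- seq(1, n, by = size)
  data.frame(from = starts, to = pmin(starts + size - 1, n))
}

# Hypothetical helper: evaluate a stars proxy one block of x columns at a
# time and apply `fun` to each in-memory chunk.
# Assumes dimension 1 is x and the object has three dimensions (x, y, time).
process_x_chunks <- function(proxy, size, fun) {
  ranges <- chunk_ranges(dim(proxy)[[1]], size)
  lapply(seq_len(nrow(ranges)), function(i) {
    cols <- ranges$from[i]:ranges$to[i]
    chunk <- stars::st_as_stars(proxy[, cols, , ])  # reads only this block
    fun(chunk)
  })
}
```

A call such as `process_x_chunks(full_dataset, 50, function(ch) dim(ch))` would then walk over the grid 50 columns at a time, keeping only one chunk in memory at once.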

However, taking a temporal chunk (regardless of whether it's on the full area or not) returns an error at the evaluation step. For example:

data_chunk <- full_dataset[,,,1] %>%
   st_as_stars()

returns the error:

Error in dim(data) <- c(dim(data)[1:2], newdims) : 
dims do not match the length of object

Before the pipe, the subsetted proxy object looks like this:

> full_dataset[,,,1]
 stars_proxy object with 1 attribute in 1 file(s):
 $SB_HW_01deg_crop.nc
 [1] "[...]/SB_HW_01deg_crop.nc"

 dimension(s):
              from  to offset delta  refsys point                          values x/y
 x               1 328 -19.95   0.1      NA    NA                            NULL [x]
 y               1 249   64.9  -0.1      NA    NA                            NULL [y]
 time_counter    1   1     NA    NA udunits    NA [687268800,687355200) [(seconds since 1950-01-01 00:00:00)]  

I get the same error with plot() instead of st_as_stars(), and also when subsetting with the RasterIO argument instead of using [,,,1].

This dataset comes from a NetCDF file originally defined in curvilinear coordinates. I first interpolated it onto a regular grid using CDO's remapbil operator outside of RStudio, because I found out that stars does not support proxies for curvilinear grids. The full dataset has over 45000 time steps. When I inspect the file with ncview or ncdump directly in the terminal, it seems fine.
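
For reference, a RasterIO-based read of a single time step might look roughly like the following. This is a sketch, not the exact call used in the question: `time_step_rasterio` and `read_time_step` are hypothetical helpers, the default grid size (328 x 249) is taken from the printed dimensions, and it assumes that GDAL exposes each time step of the NetCDF variable as one band.

```r
# Build a RasterIO list that selects the full grid for one band (time step).
# The element names follow the RasterIO argument of stars::read_stars().
time_step_rasterio <- function(nx, ny, step) {
  list(nXOff = 1, nYOff = 1, nXSize = nx, nYSize = ny, bands = step)
}

# Hypothetical wrapper: read one time step directly into memory.
# Assumes one band per time step; nx and ny default to this dataset's grid.
read_time_step <- function(path, step, nx = 328, ny = 249) {
  stars::read_stars(path,
                    RasterIO = time_step_rasterio(nx, ny, step),
                    proxy = FALSE)
}
```

Under those assumptions, `read_time_step(data_path, 1)` would return the first time step as an ordinary stars object.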

I imagine I'm using stars wrong here somehow but I just can't figure out what the problem is. I don't really understand the underlying process and what the error message is referring to. The dataset is very big but I can provide it if needed.

EDIT: I managed to increase my memory allowance and load the full dataset into memory without going through the proxy phase. Everything works fine that way. It seems the problem comes from the proxy evaluation. Still no clue what it is, though, and would love to be able to work with a proxy instead.


Solution

  • The package maintainer has now fixed this issue; see https://github.com/r-spatial/stars/issues/561.
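
If the installed version of stars predates the fix, one way to pick it up is to install the development version from GitHub. This is a sketch under the assumption that the fix is not yet in the release you have; a current CRAN release may already include it, in which case a plain `install.packages("stars")` would be enough.

```r
# Assumption: the fix is only in the development version at this point.
# Requires the remotes package: install.packages("remotes")
remotes::install_github("r-spatial/stars")

# Restart R afterwards, then check which version is loaded.
packageVersion("stars")
```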