r geospatial netcdf ncdf4

# How to apply a monthly mean to daily data in a 3d array, storing this to a 4d array?

I have daily geospatial data covering 2 years. The data is in two NetCDF files, each containing 365 days of data. I want to apply a calculation to the data and then compute the monthly mean of the result for each of the 12 months of each year, for every grid box.

``````
lons <- 584
lats <- 712
ts <- dim(P)[[3]]  ## number of time steps, taken from the precipitation array P
## Dimensions of P: [584L, 712L, 365L]
MyValues_ym <- array(0., c(lons, lats, 2, 12))  ## [lon, lat, year, month]
MyCalculation <- array(0., c(lons, lats, ts))  ## set up data array to fit daily calculations
MyMonths <- c('01','02','03','04','05','06','07','08','09','10','11','12')
``````
``````
## calculate MeanMonthly  # calculated monthly q0
## Loop over months
for (m in 1:12) {
  MyValues_ym[, , y - 1982, m] <- apply(MyCalculation, c(2), mean)
}
``````

This is giving the error:

``````
Error in `[<-`(`*tmp*`, , , y - 1982, m, value = c(NA_real_, NA_real_,  :
  subscript out of bounds
``````

Any help would be greatly appreciated.

I have tried to use `rowMeans`, but it did not calculate the mean over the time dimension of my array. I am not sure how to calculate the mean over one particular dimension.

This is what I tried with `rowMeans`:

``````
# Loop over months
for (m in 1:12) {
  my_months_yearly <- format(daterange, '%m') == MyMonths[m]
  MyValues_ym[, , y - 1982, m] <- rowMeans(Q0d[, , my_months_yearly], na.rm = TRUE, dims = 2)
}
``````

EDIT

I have now made a minimal reproducible example (MRE):

``````
lons <- 10
lats <- 10

MyYearlyMonthlyData <- array(0., c(lons, lats, 1, 12)) ## set up array for monthly calculation
MyCalculationYr1 <- array(runif(100), c(lons, lats, 365)) ## daily data array: 100 random values, recycled to fill it

## set up the date range and months for the loop
start <- '1985-01-01'
end <- '1985-12-31'
daterange <- seq(as.Date(start), as.Date(end), "days")
MyMonths <- c('01','02','03','04','05','06','07','08','09','10','11','12')
y <- 1985

## the loop to generate the monthly mean of the daily values for each month of the year

for (m in 1:12) {
  blMM <- format(daterange, '%m') == MyMonths[m]
  MyYearlyMonthlyData[, , y - 1984, m] <- rowMeans(MyCalculationYr1[, , blMM], na.rm = TRUE, dims = 2)
}
``````

Solution

The precipitation data is from the TAMSAT archive (as per your comment under your post). I quickly downloaded a file and it conforms to the CF Metadata Conventions, as is common for meteorological data distributed as NetCDF files. You can verify this by checking the "Conventions" global attribute:

``````
library(ncdf4)

nc <- nc_open("./rfe_file.nc")
ncatt_get(nc, "")$Conventions
``````

These files typically contain 3-d arrays of data, as is the case here. Dimensions are `lon`, `lat` and `time`. You can easily work with the `time` dimension using the CFtime package:

``````
library(CFtime)

# Create a CFtime instance from the "time" dimension in your file
cf <- CFtime(nc$dim$time$units, nc$dim$time$calendar, nc$dim$time$vals)

# Create a factor over the months in your data file
mon <- CFfactor(cf, "month")

# Extract the data from the file (the gap-filled rainfall estimate variable)
data <- ncvar_get(nc, "rfe_filled")

# Aggregate to monthly mean (mm/day)
pr <- apply(data, 1:2, tapply, mon, mean)

# Re-arrange the dimensions: apply() puts the month dimension first
pr <- aperm(pr, c(2, 3, 1))

# Label the dimensions
dimnames(pr) <- list(nc$dim$lon$vals, nc$dim$lat$vals, levels(mon))
nc_close(nc)
``````

Note that `pr` is now a 3-d array whose third dimension has 24 elements, labelled "2021-01", "2021-02" ... "2022-12". `apply()` calls `tapply()` for every grid cell (the first two dimensions), and `tapply()` applies `mean()` within each level of the factor `mon`, so every grid cell ends up with its monthly mean values for the two years.
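To try the same `apply()`/`tapply()` pattern without a NetCDF file, here is a minimal sketch on synthetic data. It uses a base R year-month factor in place of `CFfactor()`. The grid size, the dates, and the names `daily`, `ym`, `monthly` and `monthly_4d` are assumptions for illustration only. The last step shows one possible way to get the 4-d [lon, lat, year, month] layout from the question; it assumes every year has all 12 months.

```r
# Synthetic stand-in for two years of daily data on a small grid
# (hypothetical sizes; real data would come from ncvar_get()).
set.seed(1)
lons <- 4
lats <- 3
dates <- seq(as.Date("2021-01-01"), as.Date("2022-12-31"), by = "day")
daily <- array(runif(lons * lats * length(dates)),
               c(lons, lats, length(dates)))

# Factor with one level per year-month: "2021-01" ... "2022-12"
ym <- factor(format(dates, "%Y-%m"))

# Monthly mean per grid cell; apply() puts the month dimension first
monthly <- apply(daily, 1:2, function(x) tapply(x, ym, mean, na.rm = TRUE))
monthly <- aperm(monthly, c(2, 3, 1))  # back to [lon, lat, year-month]

# The levels sort chronologically, so months vary fastest within each year.
# Split the 24 slices into [lon, lat, month, year], then permute to
# [lon, lat, year, month].
n_years <- length(unique(format(dates, "%Y")))
monthly_4d <- aperm(array(monthly, c(lons, lats, 12, n_years)),
                    c(1, 2, 4, 3))

dim(monthly_4d)
```

With this layout, `monthly_4d[, , 2, 5]` holds the May mean of the second year for every grid cell.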