python r arrays netcdf

How to create an array of length n in R where each element has two elements (example given)


I would like to create an array in R that looks like the following array made in Python. This may be a very simple question, but it's giving me trouble!

array([[19358, 19388],
       [19389, 19416],
       [19417, 19447],
       [19448, 19477],
       [19478, 19508],
       [19509, 19538],
       [19539, 19569],
       [19570, 19600],
       [19601, 19630],
       [19631, 19661],
       [19662, 19691],
       [19692, 19722]])

EDIT: To add some context, I am trying to put a variable of depths (the lower and upper values of layers in the atmosphere) into a netCDF dimension using the ncdf4 package in R. It seems that an array of this shape is needed in order to do this.

https://nordatanet.github.io/NetCDF_in_Python_from_beginner_to_pro/09_cells_and_cell_methods.html


Solution

  • You are working with netCDF data, so you should indeed generate a matrix of data to put into the bounds of the dimension variable. However, the example data that you give is monthly time data, in which case you should use the CFtime package:

    library(CFtime)
    
    # day1 is the first day of every month for the year 2023 + 2024-01-01
    day1 <- CFtime("days since 1970-01-01",
                   "standard",
                   as.character(seq.Date(as.Date("2023-01-01"), as.Date("2024-01-01"), "1 month")))
    
    # Get the offsets from the epoch
    offsets <- day1$offsets
    
    # Turn that into a bounds matrix
    time_bnds <- rbind(offsets[1:12], offsets[2:13])
    time_bnds
    #>      [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10] [,11] [,12]
    #> [1,] 19358 19389 19417 19448 19478 19509 19539 19570 19601 19631 19662 19692
    #> [2,] 19389 19417 19448 19478 19509 19539 19570 19601 19631 19662 19692 19723
    
    # The corresponding CFtime object, should you need it:
    months <- CFtime("days since 1970-01-01",
                     "standard",
                     offsets[1:12] + diff(offsets) * 0.5)
    bounds(months) <- TRUE
    months
    #> CF calendar:
    #>   Origin  : 1970-01-01 00:00:00
    #>   Units   : days
    #>   Type    : standard
    #> Time series:
    #>   Elements: [2023-01-16 12:00:00 .. 2023-12-16 12:00:00] (average of 30.363636 days between 12 elements)
    #>   Bounds  : regular and consecutive
    
    # The offsets below are the midpoints between your bounds from above
    months$offsets
    #> [ 1] 19373.5 19403.0 19432.5 19463.0 19493.5 19524.0 19554.5 19585.5 19616.0
    #> [10] 19646.5 19677.0 19707.5
    

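    This is a lot more complicated than some other answers that you may get, but it works in all conditions (such as calendars with different month lengths, as often happens with netCDF data). Note that the bounds here are contiguous: the upper bound of each month equals the lower bound of the next, whereas the array in your question ends each month one day earlier.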

    Also interesting to note: this matrix is actually the way it should be in R, with 2 rows and as many columns as there are data points along the dimension. The difference from Python arises because R uses column-major ordering of matrices and arrays, as opposed to NumPy's default row-major ordering.
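
    To make the layout point concrete, here is a minimal base R sketch. It uses only the first three columns of the time_bnds matrix shown above, and the names r_layout and py_layout are made up for illustration:

    ```r
    # Bounds in the layout R wants: 2 rows, one column per dimension element.
    # The values are the first three columns of time_bnds from above.
    r_layout <- rbind(c(19358, 19389, 19417),
                      c(19389, 19417, 19448))
    dim(r_layout)
    #> [1] 2 3

    # Transposing gives the n x 2 shape that NumPy prints.
    py_layout <- t(r_layout)
    dim(py_layout)
    #> [1] 3 2

    # R stores the matrix column by column, so the lower and upper bound of
    # each element sit next to each other in memory.
    as.vector(r_layout)
    #> [1] 19358 19389 19389 19417 19417 19448
    ```

    The flattened order is the same as that of a row-major n x 2 array, which is why the 2 x n matrix in R and the n x 2 array in Python describe the same data.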

    Atmosphere levels

    But you were asking about atmosphere levels. If you have a list of levels then you can create the bnds matrix quite easily, assuming that the levels form the bounds between atmospheric layers:

    levels <- c(1000, 950, 900, 800, 700, 500, 300, 100, 50)
    len <- length(levels)
    bnds <- rbind(levels[-len], levels[-1])
    bnds
    #>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
    #> [1,] 1000  950  900  800  700  500  300  100
    #> [2,]  950  900  800  700  500  300  100   50
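
    Since the goal is to get these bounds into a file with ncdf4, the following is a rough sketch of how that could look. The names "plev", "bnds" and "plev_bnds", the "hPa" units, the temporary file path, and the use of the layer centre as the dimension value are all assumptions for illustration; adapt them to your data:

    ```r
    library(ncdf4)

    # Bounds matrix built as above: 2 rows, one column per layer
    levels <- c(1000, 950, 900, 800, 700, 500, 300, 100, 50)
    len    <- length(levels)
    bnds   <- rbind(levels[-len], levels[-1])

    # Assumption: use the centre of each layer as the dimension value
    plev <- colMeans(bnds)

    # The level dimension, plus a length-2 dimension without a dimension variable
    dim_lev  <- ncdim_def("plev", "hPa", plev)
    dim_bnds <- ncdim_def("bnds", "", 1:2, create_dimvar = FALSE)

    # Dimensions are listed in R order, so the variable is 2 x n like bnds
    var_bnds <- ncvar_def("plev_bnds", "hPa", list(dim_bnds, dim_lev),
                          prec = "double")

    # Hypothetical output location; replace with your own file name
    path <- tempfile(fileext = ".nc")
    nc <- nc_create(path, list(var_bnds))
    ncvar_put(nc, var_bnds, bnds)

    # Point the dimension variable at its bounds variable (CF "bounds" attribute)
    ncatt_put(nc, "plev", "bounds", "plev_bnds")
    nc_close(nc)
    ```

    A real file would define its data variables in the same nc_create() call; only the bounds variable is shown here to keep the sketch short.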
    

    If, on the other hand, you want to use your known levels as "mid-points", then you should define the lower and upper pressure levels to use as boundary values yourself, because pressure levels are typically not evenly spaced. Once you have done that, proceed as above.
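
    A minimal sketch of the mid-point case, with made-up numbers: both the mids and the edges vectors below are hypothetical values chosen by hand, not taken from any dataset.

    ```r
    # Hypothetical mid-point pressure levels (hPa)
    mids  <- c(1000, 925, 850, 700, 500, 300)

    # Hypothetical layer interfaces chosen by hand: one more than there are mids
    edges <- c(1013, 960, 890, 775, 600, 400, 200)
    stopifnot(length(edges) == length(mids) + 1)

    # Same construction as before: drop the last edge for the first row,
    # drop the first edge for the second row
    bnds <- rbind(edges[-length(edges)], edges[-1])

    # Sanity check: every mid-point lies inside its own layer
    stopifnot(all(mids <= bnds[1, ] & mids >= bnds[2, ]))
    ```

    The result is again a matrix with 2 rows and one column per level, so it can be used in the same way as the bnds matrix above.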