Tags: r, memory, netcdf

Reading netCDF file with data not on a grid


I am trying to convert portions of a netCDF (.nc) file into a .csv, and I am having issues when I try to combine the portions I separated out into one matrix. This is the first time I have ever worked with a data file of this type, and I suspect I may be handling something incorrectly. The portion I am combining first is not all of what I need, which is a bit worrying: I still need to add two more variables after this combination. The file can be found here: https://www.ncei.noaa.gov/thredds-ocean/catalog/aoml/tsg/2018/RT_QC_RAW_nc/catalog.html?dataset=aoml/tsg/2018/RT_QC_RAW_nc/WTDO_2018_08_04.nc

The downloaded file is 2178 KB. The separated variables each have 24905 rows (one column each before combining) and are each 199456 bytes. I have four columns like this in total; only three are being combined in the as.matrix command. Even when I clear all the unused data and values and close everything else on the computer, I still cannot free enough memory. The memory usage report says the objects are using 227 MiB, the session is using 433 MiB, my whole system is using ~10000 MiB, and I have 5975 MiB of free memory (about the same as when everything except R is closed). The computer has 16 GB of RAM and I am running 64-bit R.

Here is the code I am using to separate out the pieces I need (I found it in a tutorial). I'm using the ncdf4 package, and the lubridate package to convert the date.

library(ncdf4)
library(lubridate)

# open the downloaded netCDF file
nc_ds <- nc_open("WTDO_2018_08_04.nc")

#grab some data
dim_lon <- ncvar_get(nc_ds, "LON")
dim_lat <- ncvar_get(nc_ds, "LAT")
dim_time <- ncvar_get(nc_ds, "Time")

#convert time 
t_units <- ncatt_get(nc_ds, "Time", "units")
t_ustr <- strsplit(t_units$value, " ")
t_dstr <- strsplit(unlist(t_ustr)[3], "-")
date <- ymd(t_dstr) + dseconds(dim_time)
date

#make coordinate matrix
coords <- as.matrix(expand.grid(dim_lon, dim_lat, date))

Making the coordinate matrix gives me this error:

Error: cannot allocate vector of size 57546.6 Gb

I have also run the following (code and output shown), based on answers to other forum questions and the function's help page. I'm very confused about what it means and how to solve this issue; any help would be appreciated.

if(.Platform$OS.type == "windows") withAutoprint({
    memory.size()
    memory.size(TRUE)
    memory.limit(size = 500000)
})

memory.size()
[1] 342.65
memory.size(TRUE)
[1] 2541.38
memory.limit(size = 5e+05)
[1] 1e+09

I have also tried increasing the memory limit based on forum answers (the first thing I tried), and I believe it worked, but it has not solved my issue. When I repeat that command it seems I may have already reached the maximum, since it gives me this warning:

Warning message:
In memory.limit(size = 5e+05) : cannot decrease memory limit: ignored

I think this is a fairly simple task (combining columns of data), even though the columns are large, so I don't quite understand why it needs so much memory.

I hope that's enough info, but please let me know if more is needed.

Thanks, Megan


Solution

  • Your file contains thermosalinograph data from a moving ship, in this case the "Oregon II". These data are trajectories, not grids, so the grid-oriented tutorial you found does not apply: expand.grid() builds the full Cartesian product of longitude, latitude and time, which is why the memory demand explodes.
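    To see the scale of the problem, here is a back-of-the-envelope sketch, using the 24905 rows reported in the question and dummy vectors standing in for dim_lon, dim_lat and dim_time, showing why the Cartesian product is impossible and what the trajectory-style combination looks like instead:

    ```r
    # Size of the Cartesian product that expand.grid() tries to build
    n <- 24905                     # rows per variable, from the question
    rows <- n^3                    # every lon x lat x time combination
    rows * 8 / 2^30                # GiB needed for ONE double column

    # Trajectory data need no grid: the vectors already line up
    # row-by-row, so bind them as columns instead.
    dim_lon  <- runif(5, -90, -80) # dummy stand-ins for the real data
    dim_lat  <- runif(5,  25,  30)
    dim_time <- 1:5
    coords <- cbind(lon = dim_lon, lat = dim_lat, time = dim_time)
    nrow(coords)                   # one row per record, not n^3
    ```

    With the real data this produces only 24905 rows, which fits in memory with ease.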

    With packages ncdfCF and CFtime I get the following:

    library(ncdfCF)
    library(CFtime)
    
    url <- "https://www.ncei.noaa.gov/thredds-ocean/dodsC/aoml/tsg/2018/RT_QC_RAW_nc/WTDO_2018_08_04.nc"
    ds <- open_ncdf(url)
    
    # List the data variables in the netCDF file
    (vars <- names(ds))
    #>  [1] "Time" "LAT"  "LON"  "INT"  "SAL"  "COND" "EXT"  "SST"  "A"    "B"   
    #> [11] "C"    "D"    "E"    "F"    "G"    "H"    "I"    "J"    "K"    "L"
    
    # Let's have a look at the "Time" data variable:
    ds[["Time"]]
    #> <Variable> Time 
    #> Long name: time 
    #> 
    #> Axes:
    #>  id name    length values       
    #>  0  records 10409  [1 ... 10409]
    #> 
    #> Attributes:
    #>  id name       type    length value                            
    #>  0  units      NC_CHAR 33     seconds since 1950-01-01 00:00:00
    #>  1  instrument NC_CHAR  3     GPS                              
    #>  2  long_name  NC_CHAR  4     time
    

    A couple of things to note in the above: the variable has a single "records" axis of length 10409 rather than separate longitude, latitude and time dimensions, which confirms that these are trajectory data, and the "units" attribute ("seconds since 1950-01-01 00:00:00") tells you how to decode the time values from offsets.

    To get all the data into a data.frame, which is easily exported to a CSV file, you need a little more code:

    # Loop over the data variables
    data <- lapply(vars, function(v) {
      # Get the data variable
      dv <- ds[[v]]
      # The actual data
      values <- dv$data()$raw()
      # Use the "units" attribute to convert any time coordinates
      # from offsets to a `POSIXct`. If that fails, just return the values
      units <- dv$attribute("units")
      if (is.na(units) || inherits(t <- try(CFtime::CFtime(units, "standard", values), silent = TRUE), "try-error"))
        values
      else
        t$as_timestamp(asPOSIX = TRUE)
    })
    
    # Convert into a data.frame
    data <- as.data.frame(data, col.names = vars)
    head(data)
    #>                  Time LAT LON     INT    SAL COND EXT    SST A B C D E F G H I
    #> 1 2018-08-22 00:00:09 NaN NaN 303.407 33.705   NA NaN 304.05 1 1 0 0 0 1 1 1 1
    #> 2 2018-08-22 00:00:39 NaN NaN 303.389 33.704   NA NaN 304.05 1 1 0 0 0 1 1 1 1
    #> 3 2018-08-22 00:01:09 NaN NaN 303.387 33.701   NA NaN 304.05 1 1 0 0 0 1 1 1 1
    #> 4 2018-08-22 00:01:39 NaN NaN 303.375 33.706   NA NaN 304.05 1 1 0 0 0 1 1 1 1
    #> 5 2018-08-22 00:02:09 NaN NaN 303.351 33.712   NA NaN 303.95 1 1 0 0 0 1 1 1 1
    #> 6 2018-08-22 00:02:39 NaN NaN 303.335 33.710   NA NaN 303.95 1 1 0 0 0 1 1 1 1
    #>   J K L
    #> 1 1 0 0
    #> 2 1 0 0
    #> 3 1 0 0
    #> 4 1 0 0
    #> 5 1 0 0
    #> 6 1 0 0
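
    To finish the conversion to CSV (the original goal), write the data.frame out with write.csv(). A minimal sketch, with a small stand-in data.frame since the full `data` object is assembled above, and an example filename:

    ```r
    # `df` stands in for the `data` data.frame assembled above
    df <- data.frame(Time = as.POSIXct("2018-08-22 00:00:09", tz = "UTC"),
                     SAL  = 33.705,
                     SST  = 304.05)
    out <- "WTDO_2018_08_04.csv"   # example filename
    write.csv(df, out, row.names = FALSE)
    read.csv(out)                  # round-trip check
    ```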