I am trying to convert portions of a netCDF (.nc) file into a .csv, and I am having issues when I try to combine the portions I separated out into one matrix. This is the first time I have ever worked with a data file of this type, and I feel I may be handling something incorrectly. The portion I am combining first is not even all of what I need, which is a bit worrying: I still need to add two more variables after this combination. The file can be found here: https://www.ncei.noaa.gov/thredds-ocean/catalog/aoml/tsg/2018/RT_QC_RAW_nc/catalog.html?dataset=aoml/tsg/2018/RT_QC_RAW_nc/WTDO_2018_08_04.nc
The downloaded file is 2178 KB. The separated variables each have 24905 rows (one column each before combining) and are each 199456 bytes. I have a total of 4 columns like this; only three are being combined in the as.matrix command. Even when I clear all unused data and values and close everything else on the computer, I still cannot free enough memory. The memory usage report says the objects are using 227 MiB, the session is using 433 MiB, and my whole system is using ~10000 MiB, with 5975 MiB free (about the same as when I have everything except R closed). The computer has 16 GB of RAM and I am running 64-bit R.
Here is the code I am using to separate the pieces I need (I found it in a tutorial). I'm using the ncdf4 package, plus the lubridate package to convert the date.
library(ncdf4)
library(lubridate)
nc_ds <- nc_open("WTDO_2018_08_04.nc")  # the downloaded file
#grab some data
dim_lon <- ncvar_get(nc_ds, "LON")
dim_lat <- ncvar_get(nc_ds, "LAT")
dim_time <- ncvar_get(nc_ds, "Time")
#convert time
t_units <- ncatt_get(nc_ds, "Time", "units")
t_ustr <- strsplit(t_units$value, " ")
t_dstr <- strsplit(unlist(t_ustr)[3], "-")
date <- ymd(t_dstr) + dseconds(dim_time)
date
#make coordinate matrix
coords <- as.matrix(expand.grid(dim_lon, dim_lat, date))
The coordinate-matrix step gives me this error:
Error: cannot allocate vector of size 57546.6 Gb
I have also run the following (code and output), based on other forum answers and the help page for these functions. I'm very confused about what it means and how to solve this issue; any help would be appreciated.
if (.Platform$OS.type == "windows") withAutoprint({
  memory.size()
  memory.size(TRUE)
  memory.limit(size = 500000)
})
#> memory.size()
#> [1] 342.65
#> memory.size(TRUE)
#> [1] 2541.38
#> memory.limit(size = 5e+05)
#> [1] 1e+09
I had also tried increasing the memory limit (the first thing I tried, based on other forum answers), and I believe it worked, but it has not solved my issue. When I repeat that command it seems I have already reached the maximum limit, since it gives me this warning:
Warning message:
In memory.limit(size = 5e+05) : cannot decrease memory limit: ignored
I would think this is a fairly simple operation (combining columns of data), even though the columns are large; I don't quite understand why it needs so much memory.
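The only arithmetic I can find that produces a number like the one in the error is expand.grid building every combination of the three 24905-value vectors (whether it is a single internal integer index that needs exactly that allocation is my guess, but the scale seems right):

```r
# Back-of-envelope size of expand.grid(dim_lon, dim_lat, date) when each
# input vector has 24905 values: it builds every combination of the three.
n <- 24905
rows <- n^3                      # 15,447,551,017,625 combinations
idx_gib <- rows * 4 / 2^30       # a 4-byte integer vector of that length
cat(round(idx_gib, 1), "GiB\n")  # -> 57546.6, the size in the error
```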
I hope that's enough info, but please let me know if more is needed.
Thanks, Megan
Your file contains thermosalinograph data from a moving ship, in this case the "Oregon II". These data are trajectories, not grids, so the tutorial you found does not apply: expand.grid builds every combination of lon, lat and time, which is exactly the enormous allocation in your error.
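The difference matters for memory: in a trajectory, observation i pairs lon[i] with lat[i] and time[i], so the vectors should be combined row-wise rather than expanded. A quick sketch with toy values:

```r
# Trajectory coordinates are parallel vectors: combine them row-wise,
# not as a grid of all combinations. Toy values stand in for the real data.
lon  <- c(-88.1, -88.2, -88.3)
lat  <- c(29.5, 29.6, 29.7)
time <- as.POSIXct("2018-08-22", tz = "UTC") + c(9, 39, 69)
coords <- data.frame(lon, lat, time)  # one row per observation
nrow(coords)                          # -> 3, not 3^3 = 27
```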
With the packages ncdfCF and CFtime I get the following:
library(ncdfCF)
library(CFtime)
url <- "https://www.ncei.noaa.gov/thredds-ocean/dodsC/aoml/tsg/2018/RT_QC_RAW_nc/WTDO_2018_08_04.nc"
ds <- open_ncdf(url)
# List the data variables in the netCDF file
(vars <- names(ds))
#> [1] "Time" "LAT" "LON" "INT" "SAL" "COND" "EXT" "SST" "A" "B"
#> [11] "C" "D" "E" "F" "G" "H" "I" "J" "K" "L"
# Let's have a look at the "Time" data variable:
ds[["Time"]]
#> <Variable> Time
#> Long name: time
#>
#> Axes:
#> id name length values
#> 0 records 10409 [1 ... 10409]
#>
#> Attributes:
#> id name type length value
#> 0 units NC_CHAR 33 seconds since 1950-01-01 00:00:00
#> 1 instrument NC_CHAR 3 GPS
#> 2 long_name NC_CHAR 4 time
A couple of things to note in the above: the "Time" variable stores its values as offsets with units "seconds since 1950-01-01 00:00:00", so you can use CFtime to turn them into POSIXct date-time objects. Interestingly, the attribute "instrument" has a value of "GPS", so is this GPS time (currently 18 seconds ahead of UTC) or regular (UTC?) time derived from a GPS instrument? That is not clear from the data.

To get all the data into a data.frame, which is easily exported to a CSV file, you need a little more code:
# Loop over the data variables
data <- lapply(vars, function(v) {
# Get the data variable
dv <- ds[[v]]
# The actual data
values <- dv$data()$raw()
# Use the "units" attribute to convert any time coordinates from offsets to a `POSIXct`. If that fails, just return the values
units <- dv$attribute("units")
if (is.na(units) || inherits(t <- try(CFtime::CFtime(units, "standard", values), silent = TRUE), "try-error"))
values
else
t$as_timestamp(asPOSIX = TRUE)
})
# Convert into a data.frame
data <- as.data.frame(data, col.names = vars)
head(data)
#> Time LAT LON INT SAL COND EXT SST A B C D E F G H I
#> 1 2018-08-22 00:00:09 NaN NaN 303.407 33.705 NA NaN 304.05 1 1 0 0 0 1 1 1 1
#> 2 2018-08-22 00:00:39 NaN NaN 303.389 33.704 NA NaN 304.05 1 1 0 0 0 1 1 1 1
#> 3 2018-08-22 00:01:09 NaN NaN 303.387 33.701 NA NaN 304.05 1 1 0 0 0 1 1 1 1
#> 4 2018-08-22 00:01:39 NaN NaN 303.375 33.706 NA NaN 304.05 1 1 0 0 0 1 1 1 1
#> 5 2018-08-22 00:02:09 NaN NaN 303.351 33.712 NA NaN 303.95 1 1 0 0 0 1 1 1 1
#> 6 2018-08-22 00:02:39 NaN NaN 303.335 33.710 NA NaN 303.95 1 1 0 0 0 1 1 1 1
#> J K L
#> 1 1 0 0
#> 2 1 0 0
#> 3 1 0 0
#> 4 1 0 0
#> 5 1 0 0
#> 6 1 0 0
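From there, exporting to CSV is one call to base R's write.csv; `row.names = FALSE` keeps R's automatic row numbers out of the file. A stand-in data.frame is shown here so the snippet runs on its own; with the code above you would pass `data` directly (the file name is just a suggestion):

```r
# Stand-in for the data.frame assembled from the netCDF file above.
data <- data.frame(Time = as.POSIXct("2018-08-22 00:00:09", tz = "UTC"),
                   SST = 304.05, SAL = 33.705)
# Write it out; row.names = FALSE drops the automatic row numbers.
write.csv(data, "WTDO_2018_08_04.csv", row.names = FALSE)
```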