I'm trying to load some climate data from PAVICS with thredds in R.
library(ncdf4)
library(thredds)
url = "https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/catalog/datasets/simulations/bias_adjusted/cmip6/ouranos/ESPO-G/ESPO-G6-R2v1.0.0/derived/ensemble_members/catalog.xml"
# Create Catalog
cat = thredds::get_catalog(url) # Seems to be the same thredds::CatalogNode$new(url)
# Check the catalog
cat$browse()
# Get the datasets
catds = cat$get_datasets()
# Verify one ncml file
data.test = catds$`annual_ssp585_ESPO-G6-R2v1.0.0._climindices_ensemble_members.ncml`
# Try to open the file
ncdf4::nc_open(paste0("https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/catalog/", data.test$url))
But I get a file not found error
Error in R_nc4_open: NetCDF: file not found
Error in ncdf4::nc_open(paste0("https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/catalog/", :
Error in nc_open trying to open file https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/catalog/datasets/simulations/bias_adjusted/cmip6/ouranos/ESPO-G/ESPO-G6-R2v1.0.0/derived/ensemble_members/annual_ssp585_ESPO-G6-R2v1.0.0._climindices_ensemble_members.ncml (return_on_error= FALSE )
The base URL for the file is not correct. You have taken the URL from the catalog path but the actual data (for OPENDaP access) is under the dodsC
branch. So you should do this:
base_dodsC <- "https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/"
nc <- ncdf4::nc_open(paste0(base_dodsC, data.test$url))
There are actually multiple paths to the data, depending on what tool you use to access the data. All of these have a specific base URL. Have a look here.
Unfortunately, the thredds
library does not standardise how to construct URLs so it's a matter of trial-and-error.
Note also that you are accessing an NcML
file, which describes a data set but which does not contain the data. In effect, the NcML file contains the metadata that you would read when you ncdf4::nc_open()
a file. The NcML file itself is an XML document that contains references to the underlying data, possibly distributed over multiple files. You can use the usual ncdf4
syntax to discover file contents and access the data:
names(nc$var)
# [1] "rotated_pole" "lat"
# [3] "lon" "dlyfrzthw"
# [5] "growing_degree_days_gt_4" "heat_wave_frequency_class1"
# [7] "heat_wave_frequency_class2" "heat_wave_frequency_class3"
# [9] "heat_wave_total_length_class1" "heat_wave_total_length_class2"
# [11] "heat_wave_total_length_class3" "liquidprcptot"
# [13] "prcptot" "rx5day"
# [15] "solidprcptot" "tg_mean"
# [17] "tn_days_above_20" "tn_days_above_22"
# [19] "tn_mean" "tx_days_above_30"
# [21] "tx_days_above_32" "tx_mean"
# [23] "lakeFrac" "sftlf"
# [25] "sftof"
lat <- ncvar_get(nc, "lat")
str(lat)
# num [1:706, 1:800] 5.76 5.79 5.82 5.85 5.88 ...
The regular variables I looked at are each about 8.9GB in size so you'll want to subset before you download. That is 800 latitude x 706 longitude in a rotated pole grid, for 151 time steps (1950 - 2100), by 26 realizations (= GCM model runs). These realizations are:
> pavics[["realization"]]$values
[1] "TaiESM1:r1i1p1f1" "BCC-CSM2-MR:r1i1p1f1" "FGOALS-g3:r1i1p1f1"
[4] "CanESM5:r1i1p1f1" "CMCC-ESM2:r1i1p1f1" "CNRM-CM6-1:r1i1p1f2"
[7] "CNRM-ESM2-1:r1i1p1f2" "ACCESS-ESM1-5:r1i1p1f1" "ACCESS-CM2:r1i1p1f1"
[10] "MPI-ESM1-2-HR:r1i1p1f1" "EC-Earth3:r1i1p1f1" "EC-Earth3-CC:r1i1p1f1"
[13] "EC-Earth3-Veg:r1i1p1f1" "INM-CM4-8:r1i1p1f1" "INM-CM5-0:r1i1p1f1"
[16] "IPSL-CM6A-LR:r1i1p1f1" "MIROC-ES2L:r1i1p1f2" "MIROC6:r1i1p1f1"
[19] "UKESM1-0-LL:r1i1p1f2" "MPI-ESM1-2-LR:r1i1p1f1" "MRI-ESM2-0:r1i1p1f1"
[22] "NorESM2-LM:r1i1p1f1" "NorESM2-MM:r1i1p1f1" "KACE-1-0-G:r1i1p1f1"
[25] "GFDL-ESM4:r1i1p1f1" "NESM3:r1i1p1f1"
As per your comment, you want to extract data for Québec province in Canada. There are a few obstacles you have to overcome, after selecting which variable and model/realization you want to use. Let's first see what's in the netCDF resource. For this I am using package ncdfCF
, in its development version on GitHub.
pavics <- ncdf_open(paste0(base_dodsC, data.test$url))
tg <- pavics[["tg_mean"]]
tg
# <Variable> tg_mean
# Long name: Mean daily mean temperature
#
# Axes:
# id axis name long_name length values unit
# 3 X rlon longitude in rotated pole grid 706 [324.60278 ... 388.0528] degrees
# 2 Y rlat latitude in rotated pole grid 800 [-46.17 ... 25.74] degrees
# 4 T time 151 [1950-01-01 ... 2100-01-01] days since 1950-01-01
# 1 realization 26 [TaiESM1:r1i1p1f1, BCC-CSM2-MR:r1i1p1f1, FGOALS...
# rotated_pole coordinates of the rotated North Pole 1 [9.96920996838687e+36]
#
# Auxiliary longitude-latitude grid:
# orientation axis name extent unit
# longitude X lon [-179.992 ... 179.973] degrees_east
# latitude Y lat [5.756 ... 83.982] degrees_north
#
# Attributes:
# (omitted for brevity)
The principal problem is that the data is in a so-called rotated_pole
grid. While the axes are in units of "degrees", this is on an angle to the standard latitude-longitude grid and the values are off, as you have already observed. On top of that, the data is organised differently from what R expects, so if you quickly plot some data (using the terra
package here) you'll get some surprising results:
data <- tg[,,1,1] # Full geographical extent, first time slice, first realization
r <- terra::rast(data)
plot(r)
Apart from North America lying on its side, the distortion in Nunavut should be immediately clear to you. You need to correct for this problem with the rotation and the distortion, which you can do with ncdfCF
while subsetting the data to your area of interest:
# Subset data for Québec, all time steps, one realization
data <- tg$subset(list(X = -79:-56, Y = 44:63, realization = "TaiESM1:r1i1p1f1"))
r <- data$terra() # Convert the ncdfCF object to a terra:SpatRaster
plot(r)
And there you are. You can also do data$array()
to get the values as an array, properly oriented and with dimension names.
ncdfCF
being a very new package, you might run into trouble. If so, open an issue.