I have two netCDF files:
WATCH data:
> watch_Tair:
2 variables (excluding dimension variables):
int timestp[tstep]
title: time steps
units: time steps (days) since 2018-01-01 00:00:00
long_name: time steps (days) since start of month
float Tair[lon,lat,tstep]
title: Tair average
units: K
long_name: Average Near surface air temperature
at 2 m at time stamp
actual_max: 314.173614501953
actual_min: 212.381393432617
_FillValue: 1.00000002004088e+20
4 dimensions:
lon Size:720
title: Longitude
units: grid box centre degrees_east
actual_max: 179.75
actual_min: -179.75
lat Size:360
title: Latitude
units: grid box centre degrees_north
actual_max: 89.75
actual_min: -89.75
tstep Size:31 *** is unlimited *** (no dimvar)
day Size:31
title: day
units: integer
long_name: day of the month
ERA5 data:
> era5_Tair:
1 variables (excluding dimension variables):
short t2m[longitude,latitude,time]
scale_factor: 0.00169511169020999
add_offset: 265.715689919741
_FillValue: -32767
missing_value: -32767
units: K
long_name: 2 metre temperature
3 dimensions:
longitude Size:3600
units: degrees_east
long_name: longitude
latitude Size:1801
units: degrees_north
long_name: latitude
time Size:744
units: hours since 1900-01-01 00:00:00.0
long_name: time
calendar: gregorian
The aim is for ERA5 to match WATCH data structure. I do some pre-processing on the ERA5 file to get the same spatial and temporal aggregation as for WATCH data:
#*Daily aggregation*
era5_Tair_daily <- tapp(era5_Tair, "days", mean)
#`Spatial aggregation`
era5_temp_resampled <- terra::resample(era5_Tair_daily, watch_Tair, method = "bilinear")
Then, I save it back as a netcd file:
writeCDF(era5_temp_resampled, filename = "./era5_temp.nc", varname = "Tair", unit="K", zname="time", overwrite = TRUE)
> era5_temp_resample:
2 variables (excluding dimension variables):
float Tair[longitude,latitude,time] (Contiguous storage)
units: K
_FillValue: -1.17549402418441e+38
long_name: Average Near surface air temperature at 2 m at time stamp
grid_mapping: crs
int crs[] (Contiguous storage)
crs_wkt: GEOGCRS["unknown", DATUM["World Geodetic System 1984", ELLIPSOID["WGS 84",6378137,298.257223563, LENGTHUNIT["metre",1]], ID["EPSG",6326]], PRIMEM["Greenwich",0, ANGLEUNIT["degree",0.0174532925199433], ID["EPSG",8901]], CS[ellipsoidal,2], AXIS["longitude",east, ORDER[1], ANGLEUNIT["degree",0.0174532925199433, ID["EPSG",9122]]], AXIS["latitude",north, ORDER[2], ANGLEUNIT["degree",0.0174532925199433, ID["EPSG",9122]]]]
spatial_ref: GEOGCRS["unknown", DATUM["World Geodetic System 1984", ELLIPSOID["WGS 84",6378137,298.257223563, LENGTHUNIT["metre",1]], ID["EPSG",6326]], PRIMEM["Greenwich",0, ANGLEUNIT["degree",0.0174532925199433], ID["EPSG",8901]], CS[ellipsoidal,2], AXIS["longitude",east, ORDER[1], ANGLEUNIT["degree",0.0174532925199433, ID["EPSG",9122]]], AXIS["latitude",north, ORDER[2], ANGLEUNIT["degree",0.0174532925199433, ID["EPSG",9122]]]]
proj4: +proj=longlat +datum=WGS84 +no_defs
geotransform: -80 0.5 0 15 0 -0.5
3 dimensions:
longitude Size:100
units: degrees_east
long_name: longitude
latitude Size:70
units: degrees_north
long_name: latitude
time Size:31
units: days since 1970-1-1
long_name: time
calendar: standard
However, in the watch data, the nc file has as variables Tair + timestp and ERA5 (Tair + crs). How can I get the exact same structure?
The short answer is: you can't.
At least not with the terra
package. Insofar as I know, terra
will always add a crs
variable (a so-called grid mapping variable) because georeferencing data is its bread-and-butter. In this specific case the crs
variable is not required by the CF Metadata Conventions (that all climate and weather datasets should reasonably adhere too) because the horizontal coordinates are in longitude and latitude. The question, however, is: why bother? It certainly doesn't impact the Tair
variable, even though it now has a (required) attribute grid_mapping: crs
.
terra
also has no way of "knowing" that you want variable timestp
in your output file and quite possibly no way of doing that at all (since it is not spatial data). If you want to copy this variable over then you should use a low-level netCDF package like RNetCDF
or ncdf4
to lift the variable out of your WATCH data and then copy it into the new file.
In general, your WATCH data has multiple deficiencies from the viewpoint of the CF Metadata Conventions. These conventions were designed exactly to support the kind of things you want to do here: multiple inputs, combine, process, store. You should inform the WATCH data provider that they should make their data compliant with the conventions. It will make your life a lot easier because most software that can process netCDF data assumes compliance with the CF or some other set of conventions.