python gdal netcdf netcdf4

How to Specify Dimension Values when Creating NetCDF File in Python?


I am creating a NetCDF4 file which currently has four variables:

1) Land Surface Temperature (3D array - time, latitude, longitude)

2) Longitude (1D - coordinate of each pixel centre)

3) Latitude (1D - coordinate of each pixel centre)

4) Time (time of image acquisition in hours since 1900-01-01 00:00:00)

I am currently using the following code to do this:

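    import netCDF4 as nc
    import numpy as np
    from datetime import datetime

    # export_filename, filenames, ny, nx, LSTd, LONd, LATd and time are defined earlier in the script.

    #==========================WRITE THE NETCDF FILE==========================#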

    newfile = nc.Dataset(export_filename, 'w', format = 'NETCDF4_CLASSIC')


    #==========================SET FILE DIMENSIONS============================#

    newfile.createDimension('lat', ny)
    newfile.createDimension('lon', nx)
    newfile.createDimension('time', len(filenames))


    #==========================SET GLOBAL ATTRIBUTES==========================#

    newfile.title = 'Title'
    newfile.history = "File created on " + datetime.today().strftime("%c")
    newfile.Conventions = 'CF-1.6'



    #==========================CREATE DATA VARIABLES==========================#

    #--------------------------LST VARIABLE-----------------------------------#

    LSTs = newfile.createVariable('LST', np.int16, ('time', 'lat', 'lon'), fill_value = -8000)
    LSTs.units = 'Degrees C'
    LSTs.add_offset = 273.15
    LSTs.scale_factor = 0.01
    LSTs.standard_name = 'LST'
    LSTs.long_name = 'Land Surface Temperature'
    LSTs.grid_mapping = 'latitude_longitude'
    LSTs.coordinates = 'lon lat'

    LSTs[:] = LSTd[:]


    #--------------------------LON AND LAT AND TIME--------------------------#

    LONGITUDEs = newfile.createVariable('LONGITUDE', np.float64, ('lon',))
    LONGITUDEs.units = 'Decimal Degrees East'
    LONGITUDEs.standard_name = 'Longitude'
    LONGITUDEs.long_name = 'Longitude'
    LONGITUDEs[:] = LONd[:]

    LATITUDEs = newfile.createVariable('LATITUDE', np.float64, ('lat',))
    LATITUDEs.units = 'Decimal Degrees North'
    LATITUDEs.standard_name = 'Latitude'
    LATITUDEs.long_name = 'Latitude'
    LATITUDEs[:] = LATd[:]

    TIMEs = newfile.createVariable('TIME', np.int32, ('time',))
    TIMEs.units = 'hours since 1900-01-01 00:00:00'
    TIMEs.standard_name = 'Time'
    TIMEs.long_name = 'Time of Image Acquisition'
    TIMEs.axis = 'T'
    TIMEs.calendar = 'gregorian'
    TIMEs[:] = time[:]

    #--------------------------SAVE THE FILE---------------------------------#

    newfile.close()

This code produces a netCDF file in which the land surface temperature variable has 24 bands (one for each hour of the day). It works as I intended, apart from one small problem that I would like to fix. When I run gdalinfo on the LST variable, I get (abridged output):

    Band 1.....
    ...
    NETCDF_DIM_TIME = 1
    ...

Instead of 1, I want this value to be the corresponding value of the 'time' variable (something like 1081451 hours since 1900-01-01 00:00:00), which I have included in the code above. How can this be set for each band in the file?

UPDATE TO QUESTION: When I run gdalinfo on the file, I get (again, abridged):

    NETCDF_DIM_EXTRA={time}
    NETCDF_DIM_time_DEF={24,3}

However, the 'NETCDF_DIM_time_VALUES' entry is missing. I believe I need to set it to the values of the time variable for this to work. How do I do this?

At present, the per-band value is just the band number, but I want it to hold the band's hour of acquisition.

UPDATE 1:

I tried setting

    LSTs.NETCDF_DIM_Time = time

when creating the netCDF file. This assigned all of the values in time to NETCDF_DIM_TIME in GDAL, so each band now has 24 time values rather than just one.

UPDATE 2:

With some further digging I think it is the NETCDF_DIM_time_VALUES metadata which needs to be set to the 'time' variable. I have updated my question to ask how to do this.


Solution

  • The variables associated with the dimensions should have the same names as the dimensions. So in your code above, replace the createVariable line for time with:

    TIMEs = newfile.createVariable('time', np.int32, ('time',))
    

    Now gdalinfo knows where to find the data. I ran your code using dummy times [1000000, 1000024], and gdalinfo returns:

    Band1...
    ...
    NETCDF_DIM_time=1000000    
    ...
    Band2...
    ...
    NETCDF_DIM_time=1000024
    ...
    

    To answer your title question: you can't assign values to a dimension, but you can have a variable with the same name as the dimension (a coordinate variable) that holds the values associated with it. Readers of netCDF files, such as GDAL, look for conventions like this to interpret the data. See, for example, the 'Coordinate Systems' section of Unidata's 'Writing NetCDF Files: Best Practices'.