python python-xarray netcdf netcdf4

How to restructure this netCDF?


UPDATE: I uploaded the file to Dropbox; it can be downloaded via this link (I hope this works, I don't use Dropbox often): https://www.dropbox.com/scl/fi/vd0s9g080m8h9fxh7rn9l/IASISND02_20240702161759Z_20240702175655Z_epct_d9f95b34_F.nc?rlkey=isrwelpr9abbqswr91unrhpkp&st=6hkb5u2l&dl=0


I have downloaded a netCDF dataset, which I read in with xarray, but it has a relatively complex structure and I can't work out how to bring it into the usual (time, lat, lon) format so that I can plot the values of the relevant variable on a map.

I will post a short version first. In the lower section I add more information on the variables (essentially the printed output of each one) that I think are relevant for bringing the data into the usual (time, lat, lon) structure.

The short version:

I open the netCDF file with xarray:

import xarray as xr

netcdf = xr.open_dataset(
    'path/tofile/IASISND02_20240702161759Z_20240702175655Z_epct_d9f95b34_F.nc'
)

This file has 77 variables, and I'm interested in the values of the variable integrated_co2:

print(netcdf.integrated_co2)


The coordinates are not actual coordinates but seem to be indices into the coordinates, which are stored in two variables in the dataset named 'lat' and 'lon'. There are also two time variables. The shapes of the variables that I think are relevant are:

# Relevant variables in the netcdf-file:
var_names = [
    "lat",
    "lon",
    "record_start_time",
    "record_stop_time",
    "across_track",
    "along_track",
    "integrated_co2",
]

for var in var_names:
    print(f"{var}: {netcdf[var].shape} ")


The lat and lon variables contain coordinates in degrees, along_track contains numbers from 1 to 742, across_track contains numbers from 1 to 120, and the time variables contain numbers in the form 7.732523e+08.

I think everything is there (the coordinates, the timestamps, the CO2 values), but it is confusing that the coordinates, in their current form, are indices into the actual coordinates. At least, that is what it seems to be here.

How can I bring these variables into the usual (time, lat, lon) structure, so that I can plot the 'integrated_co2' data on a map?

If someone knows how to restructure this data, I would be very happy. I'm a beginner and have already read quite a bit about how to use such data, but I haven't found an example like this one.


Here is some additional information on the relevant variables:

print(netcdf.along_track)


print(netcdf.across_track)


print(netcdf.lat)


print(netcdf.lon)


print(netcdf.record_start_time)


print(netcdf.record_stop_time)



Solution

  • This is a tricky one because the data is from a satellite. The data, from EUMETSAT IASI 2b, the Infrared Atmospheric Sounding Interferometer (see the attributes of the netCDF file), is organized along the path of the satellite rather than on a regular latitude/longitude grid.

    The along-track and across-track dimensions describe the location relative to the satellite (see a cool illustration here).
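
    To make that geometry concrete, here is a minimal sketch on a small synthetic dataset (not the real file). It assumes, as the question suggests, that lat, lon and integrated_co2 all share the two dimensions (along_track, across_track); the sizes, values and the name demo are made up for illustration. Every pixel is addressed by its two track indices and carries its own latitude and longitude.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the swath (hypothetical sizes; the real file is far larger).
n_along, n_across = 6, 4
dims = ("along_track", "across_track")
rng = np.random.default_rng(0)

# 2-D latitude/longitude: one value per (along_track, across_track) pixel.
lat2d = np.linspace(-10.0, 10.0, n_along)[:, None] + np.zeros((1, n_across))
lon2d = np.linspace(100.0, 103.0, n_across)[None, :] + np.zeros((n_along, 1))

demo = xr.Dataset(
    data_vars={
        "integrated_co2": (dims, rng.random((n_along, n_across))),
        "lat": (dims, lat2d),
        "lon": (dims, lon2d),
    }
)

# Promote the 2-D lat/lon variables to coordinates so they travel with the data.
demo = demo.set_coords(["lat", "lon"])
co2 = demo["integrated_co2"]

# Select a latitude band by value; pixels outside the band become NaN.
band = co2.where((co2["lat"] >= -5) & (co2["lat"] <= 5))
print(band.notnull().sum().item(), "pixels inside the band")
```

    If the real dataset is laid out the same way, attaching lat and lon with set_coords should let xarray's own plotting use them by name, for example co2.plot.pcolormesh(x="lon", y="lat").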

    You can plot the data without reformatting it. You may use matplotlib.pyplot.pcolormesh, matplotlib.pyplot.contour, or matplotlib.pyplot.contourf to plot this data.

    Here is an example using matplotlib.pyplot.pcolormesh.

    import xarray as xr
    import matplotlib.pyplot as plt
    import cartopy.crs as ccrs
    
    ds = xr.open_dataset('path/tofile/IASISND02_20240702161759Z_20240702175655Z_epct_d9f95b34_F.nc')
    
    fig = plt.figure()
    ax = plt.axes(projection=ccrs.PlateCarree(central_longitude=0))
    ax.set_global()
    
    # Note: I adjusted central_longitude in the transform to avoid strange plotting.
    # lon and lat are 2-D arrays, which pcolormesh accepts directly.
    ax.pcolormesh(ds.lon, ds.lat, ds.integrated_co2,
                  transform=ccrs.PlateCarree(central_longitude=90))
    ax.coastlines(transform=ccrs.PlateCarree(central_longitude=90))
    
    plt.show()
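
    If a flat list of points is more convenient than the swath arrays (for a scatter plot or an export, say), the 2-D layout can be unrolled into one row per pixel. The sketch below uses a tiny synthetic dataset with made-up values; the names demo and table are hypothetical, and it assumes lat and lon are attached as 2-D coordinates of integrated_co2.

```python
import numpy as np
import xarray as xr

# Tiny synthetic swath: 3 scan lines (along_track) x 2 pixels (across_track).
dims = ("along_track", "across_track")
demo = xr.Dataset(
    data_vars={"integrated_co2": (dims, np.arange(6.0).reshape(3, 2))},
    coords={
        "lat": (dims, np.array([[0.0, 0.1], [1.0, 1.1], [2.0, 2.1]])),
        "lon": (dims, np.array([[10.0, 11.0], [10.2, 11.2], [10.4, 11.4]])),
    },
)

# One row per pixel; the track indices become ordinary columns next to
# lat, lon and the data values.
table = demo["integrated_co2"].to_dataframe().reset_index()
print(table)
```

    This keeps every measurement at its original location. Putting the values on a regular (lat, lon) grid would additionally require interpolation or binning, which this sketch does not attempt.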
    
    


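    You can convert record_start_time etc. to datetime objects:

    from datetime import datetime
    
    # This reference date (30 years expressed in seconds) shifts the values so that
    # the result matches the ds.start_sensing_time attribute.
    # Note: datetime.fromtimestamp returns local time, so the result depends on the
    # time zone of the machine running the code.
    reference_date = 30 * 365.25 * 24 * 60 * 60
    record_start_times = [
        datetime.fromtimestamp(reference_date + time)
        for time in ds.record_start_time.values
    ]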
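
    A variant that does not depend on the local time zone is sketched below. It assumes the values are seconds since 2000-01-01 00:00:00 UTC. That epoch is an assumption, not something read from the file, so check ds.record_start_time.attrs for the actual units. The names ASSUMED_EPOCH and to_utc_datetime are made up for illustration. For comparison, the offset of 30 * 365.25 days used above is 12 hours longer than the exact span from 1970-01-01 to 2000-01-01.

```python
from datetime import datetime, timedelta, timezone

# ASSUMPTION: the time variables count seconds since 2000-01-01 00:00:00 UTC.
# Verify this against the units/attributes of the variable in the real file.
ASSUMED_EPOCH = datetime(2000, 1, 1, tzinfo=timezone.utc)


def to_utc_datetime(seconds_since_epoch, epoch=ASSUMED_EPOCH):
    """Convert seconds since `epoch` to a timezone-aware UTC datetime."""
    return epoch + timedelta(seconds=float(seconds_since_epoch))


# The rounded value quoted in the question.
example = to_utc_datetime(7.732523e+08)
print(example.isoformat())  # 2024-07-02T16:18:20+00:00
```

    Under this assumption the rounded example value falls inside the sensing window encoded in the file name (20240702161759Z to 20240702175655Z), which makes the assumed epoch plausible but does not prove it. With the real data, the same function could be applied to each element of ds.record_start_time.values.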