pythonmasknetcdfcdo-climatencl

extract data from netcdf file contained within a shapefile's boundaries


I have the following shapefile and netcdf file.

I would like to extract data from the netcdf file that are contained within the boundaries of the shapefile.

Do you have any suggestion on how I can achieve this?

The shapefile corresponds to SREX region 11 North Europe (NEU) and the netcdf file is an example of CMIP6 climate model data output (ua variable). My desired output has to be in netcdf format.


Update

So far I tried to create a netcdf mask using NCL and CDO, and apply this mask to the original netcdf dataset. Here below the steps (and NCL scripts):

#################
## remove plev dimension from netcdf file
cdo --reduce_dim -copy nc_file.nc nc_file2.nc

## convert longitude to -180, 180
cdo sellonlatbox,-180,180,-90,90 nc_file2.nc nc_file3.nc

## create mask 
ncl create_nc_mask.ncl

## apply mask
cdo div nc_file3.nc shape1_mask.nc nc_file4.nc 
#################

The output is almost correct. See picture below. But the southern boundaries of the shapefile (SREX 11, NEU) are not captured correctly. So I suppose there is something wrong in the NCL script that generates the netcdf mask.

enter image description here


Solution

  • Re-using some old scripts/code, I quickly came up with this for a Python solution. It basically just loops over all grid points, and checks whether each grid point is inside or outside the polygon from the shape file. The result is the variable mask (array with True/False), which can be used to mask your NetCDF variables.

    Note: this uses Numba (all the @jit lines) to accelerate the code, although that is not really necessary in this case. You can just comment them out if you don't have Numba.

    import matplotlib.pyplot as pl
    import netCDF4 as nc4
    import numpy as np
    import fiona
    from numba import jit
    
    @jit(nopython=True, nogil=True)
    def distance(x1, y1, x2, y2):
        """
        Calculate distance from (x1,y1) to (x2,y2)
        """
        return ((x1-x2)**2 + (y1-y2)**2)**0.5
    
    @jit(nopython=True, nogil=True)
    def point_is_on_line(x, y, x1, y1, x2, y2):
        """
        Check whether point (x,y) is on line (x1,y1) to (x2,y2)
        """
    
        d1 = distance(x,  y,  x1, y1)
        d2 = distance(x,  y,  x2, y2)
        d3 = distance(x1, y1, x2, y2)
    
        eps = 1e-12
        return np.abs((d1+d2)-d3) < eps
    
    @jit(nopython=True, nogil=True)
    def is_left(xp, yp, x0, y0, x1, y1):
        """
        Check whether point (xp,yp) is left of line segment ((x0,y0) to (x1,y1))
        returns:  >0 if left of line, 0 if on line, <0 if right of line
        """
    
        return (x1-x0) * (yp-y0) - (xp-x0) * (y1-y0)
    
    @jit(nopython=True, nogil=True)
    def is_inside(xp, yp, x_set, y_set, size):
        """
        Given location (xp,yp) and set of line segments (x_set, y_set), determine
        whether (xp,yp) is inside polygon.
        """
    
        # First simple check on bounds
        if (xp < x_set.min() or xp > x_set.max() or yp < y_set.min() or yp > y_set.max()):
            return False
    
        wn = 0
        for i in range(size-1):
    
            # Second check: see if point exactly on line segment:
            if point_is_on_line(xp, yp, x_set[i], y_set[i], x_set[i+1], y_set[i+1]):
                return False
    
            # Calculate winding number
            if (y_set[i] <= yp):
                if (y_set[i+1] > yp):
                    if (is_left(xp, yp, x_set[i], y_set[i], x_set[i+1], y_set[i+1]) > 0):
                        wn += 1
            else:
                if (y_set[i+1] <= yp):
                    if (is_left(xp, yp, x_set[i], y_set[i], x_set[i+1], y_set[i+1]) < 0):
                        wn -= 1
    
        if wn == 0:
            return False
        else:
            return True
    
    @jit(nopython=True, nogil=True)
    def calc_mask(mask, lon, lat, shp_lon, shp_lat):
        """
        Calculate mask of grid points which are inside `shp_lon, shp_lat`
        """
    
        for j in range(lat.size):    
            for i in range(lon.size):
                if is_inside(lon[i], lat[j], shp_lon, shp_lat, shp_lon.size):
                    mask[j,i] = True
    
    
    if __name__ == '__main__':
    
        # Selection of time and level:
        time = 0
        plev = 0
    
        # Read NetCDF variables, shifting the longitudes
        # from 0-360 to -180,180, like the shape file:
        nc = nc4.Dataset('nc_file.nc')
        nc_lon = nc.variables['lon'][:]-180.
        nc_lat = nc.variables['lat'][:]
        nc_ua  = nc.variables['ua'][time,plev,:,:]
    
        # Read shapefile and first feature
        fc = fiona.open("shape1.shp")
        feature = next(iter(fc))
    
        # Extract array of lat/lon coordinates:
        coords = feature['geometry']['coordinates'][0]
        shp_lon = np.array(coords)[:,0]
        shp_lat = np.array(coords)[:,1]
    
        # Calculate mask
        mask = np.zeros_like(nc_ua, dtype=bool)
        calc_mask(mask, nc_lon, nc_lat, shp_lon, shp_lat)
    
        # Mask the data array
        nc_ua_masked = np.ma.masked_where(~mask, nc_ua)
    
        # Plot!
        pl.figure(figsize=(8,4))
        pl.subplot(121)
        pl.pcolormesh(nc_lon, nc_lat, nc_ua, vmin=-40, vmax=105)
        pl.xlim(-20, 50)
        pl.ylim(40, 80)
    
        pl.subplot(122)
        pl.pcolormesh(nc_lon, nc_lat, nc_ua_masked, vmin=-40, vmax=105)
        pl.xlim(-20, 50)
        pl.ylim(40, 80)
    
        pl.tight_layout()
    

    enter image description here

    EDIT

    To write the mask to NetCDF, something like this can be used:

    nc_out = nc4.Dataset('mask.nc', 'w')
    nc_out.createDimension('lon', nc_lon.size)
    nc_out.createDimension('lat', nc_lat.size)
    
    nc_mask_out = nc_out.createVariable('mask', 'i2', ('lat','lon'))
    nc_lon_out = nc_out.createVariable('lon', 'f8', ('lon'))
    nc_lat_out = nc_out.createVariable('lat', 'f8', ('lat'))
    
    nc_mask_out[:,:] = mask[:,:]  # Or ~mask to reverse it
    nc_lon_out[:] = nc_lon[:]     # With +180 if needed
    nc_lat_out[:] = nc_lat[:]
    
    nc_out.close()