pythonscipyinterpolationcubic

How can I interpolate 2-D gridded data onto an array of "unstructured" 2-D coordinates with method='cubic'?


Context:

As interpolation-data (points and values) I have a 2-D 1x1 degree lat-,lon- grid, with a scalar-variable on each gridpoint. Large areas of the grid consists of NaNs (~40%). As coordinates (xi), I have an array of "unstructured" coordinates, all of which are inside of the grids bounds, and very few coordinates (<0.1%) are located near NaNs.

Usually, when linearly interpolating, I use scipy.interpolate.RegularGridInterpolator(points , values, method='linear') when I interpolate gridded data to unstructured coordinates. Followed by a subsequent nearest neighbor interpolation for the remaining 0.1% where linear interpolation fails near NaNs.

Tried:
fc = RegularGridInterpolator(points, values, method='cubic', bounds_error=True)
interped_cubic = fc(xi)

which raises "ValueError: Array must not contain infs or nans."

I have also tried interp2d:

fc = interp2d(lon0, lat0, data, kind='cubic', bounds_error=True)
interped_cub = fc(lon, lat)

which returns interpolated values on a grid of shape (lon.size, lat.size), instead of interpolating onto the coordinates (lon, lat). (see Reproducible example for definition of lon/lat and lon0/lat0)

I have also tried removing the NaNs before interpolating:

from scipy.interpolate import interpn
import numpy.ma as ma
mask = np.isnan(data_nan)
lonG,latG = np.meshgrid(lon0,lat0)
lonM = ma.MaskedArray(lonG, mask).compressed()
latM = ma.MaskedArray(latG, mask).compressed()
interped_cub = interpn( (latM, lonM), data_nan[~mask], (lat, lon), method='cubic' )

which reduces the grid to a 1-D array with no NaNs. The interpolation returns ValueError: There are 2 point arrays, but values has 1 dimensions.

Expected:

I expected f=RegularGridInterpolator(points, values, method='cubic');f(xi) to return interpolated values on coordinates in non-NaN-areas, and NaN-values on coordinates near or on NaN-areas - The same as what the function returns when using method='linear', only with a different interpolation-method.

How can I complete the interpolation?:

Since I am obviously not using the function correctly, can someone explain where my mistake is, and hopefully suggest what I can try, to complete the cubic interpolation onto the coordinates?

Reproducible example:
import numpy as np, matplotlib.pyplot as plt
from scipy.interpolate import RegularGridInterpolator, interp2d

### THE FOLLOWING CODEBLOCK RUNS WITHOUT ERROR:
lat0, lon0 = np.linspace(-60, -40, 20), np.linspace(-50, 0.0, 50)
lonG, latG = np.meshgrid(lon0, lat0)
data = np.sin(latG*lonG) # Data has no NaN- areas
lat = np.random.uniform(low=-60, high=-40, size=2000)
lon = np.random.uniform(low=-50, high=0.0, size=2000)
plt.title('2D data, 1x1 grid');plt.pcolormesh(lon0, lat0, data);plt.show()
fl = RegularGridInterpolator((lat0, lon0), data, method='linear', bounds_error=True)
interped_lin = fl((lat, lon))
plt.title('linear interpolation');plt.scatter(lon, lat,c=interped_lin);plt.show()
fc = RegularGridInterpolator((lat0, lon0), data, method='cubic', bounds_error=True)
interped_cub = fc((lat, lon))
plt.title('cubic interpolation');plt.scatter(lon, lat,c=interped_cub);plt.show()


### THE FOLLOWING CODE RAISES ERROR
# Substitute some of the data with NaNs:
data_nan = data.copy()
data_nan[1:8,  5:12] = np.nan
data_nan[2:6, 12:20] = np.nan
plt.title('2D data w/ NaNs, 1x1 grid');plt.pcolormesh(lon0, lat0, data_nan);plt.show()
fl = RegularGridInterpolator((lat0, lon0), data_nan, method='linear', bounds_error=True)
interped_lin = fl((lat, lon))
plt.title('linear interpolation');plt.scatter(lon, lat,c=interped_lin);plt.show()
fc = RegularGridInterpolator((lat0, lon0), data_nan, method='cubic', bounds_error=True)
interped_cub = fc((lat, lon))
plt.title('cubic interpolation');plt.scatter(lon, lat,c=interped_cub);plt.show()


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[3], line 27
     25 plt.title('linear interpolation');plt.scatter(lon, lat,c=interped_lin);plt.show()
     26 fc = RegularGridInterpolator((lat0, lon0), data_nan, method='cubic', bounds_error=True)
---> 27 interped_cub = fc((lat, lon))
     28 plt.title('cubic interpolation');plt.scatter(lon, lat,c=interped_cub);plt.show()

File /srv/conda/envs/notebook/lib/python3.10/site-packages/scipy/interpolate/_rgi.py:335, in RegularGridInterpolator.__call__(self, xi, method)
    333     if is_method_changed:
    334         self._validate_grid_dimensions(self.grid, method)
--> 335     result = self._evaluate_spline(self.values.T, xi,
    336                                    self._SPLINE_DEGREE_MAP[method])
    338 if not self.bounds_error and self.fill_value is not None:
    339     result[out_of_bounds] = self.fill_value

File /srv/conda/envs/notebook/lib/python3.10/site-packages/scipy/interpolate/_rgi.py:388, in RegularGridInterpolator._evaluate_spline(self, values, xi, spline_degree)
    381 # Non-stationary procedure: difficult to vectorize this part entirely
    382 # into numpy-level operations. Unfortunately this requires explicit
    383 # looping over each point in xi.
    384 
    385 # can at least vectorize the first pass across all points in the
    386 # last variable of xi.
    387 last_dim = n - 1
--> 388 first_values = self._do_spline_fit(self.grid[last_dim],
    389                                    values,
    390                                    xi[:, last_dim],
    391                                    spline_degree)
    393 # the rest of the dimensions have to be on a per point-in-xi basis
    394 result = np.empty(m, dtype=self.values.dtype)

File /srv/conda/envs/notebook/lib/python3.10/site-packages/scipy/interpolate/_rgi.py:414, in RegularGridInterpolator._do_spline_fit(x, y, pt, k)
    412 @staticmethod
    413 def _do_spline_fit(x, y, pt, k):
--> 414     local_interp = make_interp_spline(x, y, k=k, axis=0)
    415     values = local_interp(pt)
    416     return values

File /srv/conda/envs/notebook/lib/python3.10/site-packages/scipy/interpolate/_bsplines.py:1240, in make_interp_spline(x, y, k, t, bc_type, axis, check_finite)
   1237 axis = normalize_axis_index(axis, y.ndim)
   1239 x = _as_float_array(x, check_finite)
-> 1240 y = _as_float_array(y, check_finite)
   1242 y = np.moveaxis(y, axis, 0)    # now internally interp axis is zero
   1244 # sanity check the input

File /srv/conda/envs/notebook/lib/python3.10/site-packages/scipy/interpolate/_bsplines.py:36, in _as_float_array(x, check_finite)
     34 x = x.astype(dtyp, copy=False)
     35 if check_finite and not np.isfinite(x).all():
---> 36     raise ValueError("Array must not contain infs or nans.")
     37 return x

ValueError: Array must not contain infs or nans.

Thanks! And let me know if I can improve my example.


Solution

  • Scipy's RegularGridInterpolator expects the user to provide data for every point in a regular grid, so leaving out the NaN points will result in an error. To interpolate over the NaNs you can use an unstructured interpolator, such as scipy.interpolate.griddata, which allows the user to pass only some of the points (i.e. you can remove the NaNs). See some example code below.

    import numpy as np
    from scipy.interpolate import griddata
    
    nx = 100
    ny = 100
    x = np.linspace(0, 10, nx)
    y = np.linspace(0, 10, ny)
    X, Y = np.meshgrid(x, y)
    
    rng = np.random.default_rng(42)
    n_nan = 1000
    non_nan_indices = rng.choice(np.arange(nx*ny), nx*ny - n_nan)
    
    X_nan = X.flatten()[non_nan_indices]
    Y_nan = Y.flatten()[non_nan_indices]
    Z_nan = np.cos(X_nan)*np.sin(Y_nan)
    
    Z_interp = griddata((X_nan, Y_nan), Z_nan, (X.flatten(), Y.flatten()), method="cubic")
    

    Z_interp can be reshaped to (nx, ny).