Tags: python, scipy, interpolation

Efficient repeated interpolation onto a large number of points (~150k) with an RBF built from a smaller number of control points (~200)


I am trying to solve a least squares problem where my independent variables are a handful of values at each of a few hundred control points. These are interpolated to a huge number of other points via an RBFInterpolator (with neighbors <= 10), and the interpolated values are used to compute the residuals: each point has some target values, the interpolated variables are used to generate predictions, and the residuals follow from the difference. The points are just coordinates in space; they are not on a regular grid or anything like that. The control points are a subset of the points.

The residual function used for the least squares optimizer is generated by a factory function and looks something like this (simplified):

from typing import Callable, Tuple

import numpy as np
from scipy.interpolate import RBFInterpolator


def create_global_residual_function(
    model: Callable[..., np.ndarray],
    control_points: np.ndarray,
    points: np.ndarray,
    y: np.ndarray,
    x: np.ndarray,
    thin_plate_spline_smoothing: float,
    thin_plate_spline_neighbors: int,
    thin_plate_spline_degree: int,
    residual_boost_factor: float = 2.0,
) -> Callable[[Tuple[float, ...]], np.ndarray]:
    
    num_control_points = control_points.shape[0]
    num_points = points.shape[0]

    x = x.reshape(1, len(x))

    tps = RBFInterpolator(
        control_points,
        np.zeros((num_control_points, 5), dtype=float),
        smoothing = thin_plate_spline_smoothing,
        neighbors = thin_plate_spline_neighbors,
        kernel = "thin_plate_spline",
        degree = thin_plate_spline_degree
    )

    def global_residual_function(params: Tuple[float, ...]) -> np.ndarray:
        tps.d = np.array(params).reshape(num_control_points, <number of parameters>)
        model_parameters = tps(points)
        x0 = model_parameters[:, 0].reshape(num_points, 1)
        x1 = model_parameters[:, 1].reshape(num_points, 1)
        <and so on ...>
        y_hat = model(x, x0, x1, <and so on...>)
        return (y_hat - y).flatten()

    return global_residual_function
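
For context, the factory output is handed to the optimizer roughly like this (a minimal sketch; initial_params and num_model_parameters are stand-ins for illustration, not part of the real code):

from scipy.optimize import least_squares

residual_fn = create_global_residual_function(
    model=model,
    control_points=control_points,
    points=points,
    y=y,
    x=x,
    thin_plate_spline_smoothing=0.0,
    thin_plate_spline_neighbors=10,
    thin_plate_spline_degree=1,
)

# One optimization variable per (control point, model parameter) pair,
# flattened into a single vector. num_model_parameters is hypothetical.
initial_params = np.zeros(num_control_points * num_model_parameters)
result = least_squares(residual_fn, initial_params)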

In each iteration, the only thing I change is the d attribute of the RBFInterpolator. The points I interpolate onto are the same every time, and the interpolation has to happen on the order of 100s to 1000s of times before the model converges. It seems silly to invoke RBFInterpolator.__call__ repeatedly when the input points are unchanging: I'm sure a lot of computation is being repeated unnecessarily to find the nearest control points to each point, calculate the distances, feed them into the kernels, and so on. In theory there should be a single (sparse) matrix that could be computed once and then reused to map the values of a scalar array on the control points to its values on the points.

My solution works - I've tested it on small cases (thousands of points with ~20 control points) and gotten correct results from the optimization. The problem is that it is very inefficient and doesn't scale to larger problems.

Is there a more efficient solution for this in scipy? Perhaps some way to extract the (sparse) matrix I referred to above? Or a different interpolator class I should be using for this application?


Solution

  • Yes, given these constraints, it is possible to rewrite RBFInterpolator to get about a 50% speedup.

    Before you write any code, it's a good idea to check whether you're optimizing the right thing. The first step I took was to use line_profiler to measure how much time __call__() spends on work that could be avoided by caching the nearest points. Although RBFInterpolator uses a KD-tree to speed up neighbor searches, it still spends approximately 50% of its time finding nearby points, sorting them, and de-duplicating evaluation points that share the same neighbors.
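
    As a rough illustration, such a profiling session might look like this (a minimal sketch; line_profiler is a separate package, and the sizes are chosen to match the question):

    import numpy as np
    from scipy.interpolate import RBFInterpolator
    from line_profiler import LineProfiler

    rng = np.random.default_rng(0)
    y = rng.uniform(-1, 1, size=(200, 2))      # control points
    d = rng.uniform(-1, 1, size=200)           # values at the control points
    x = rng.uniform(-1, 1, size=(150_000, 2))  # evaluation points

    interp = RBFInterpolator(y, d, neighbors=10)

    profiler = LineProfiler()
    profiled_call = profiler(RBFInterpolator.__call__)
    profiled_call(interp, x)  # one evaluation under the profiler
    profiler.print_stats()    # per-line timings for __call__()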

    I began by sub-classing RBFInterpolator and overriding two methods, __init__() and __call__(), which lets us re-use the rest of the class. I started from a copy of SciPy's implementation.

    Next, I changed the way that X, Y, and D are provided. (Note: I am using the terms X, Y, and D in the same way the SciPy documentation does.) Normally, you provide Y and D when you create the RBFInterpolator, and you get back an interpolator that can be evaluated at any X. I changed this so that X and Y are fixed and D can be changed on each call.
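
    Concretely, the call pattern changes like this (y, d, and x as in the profiling sketch above):

    # Standard SciPy usage: data fixed at construction, evaluation points per call.
    interp = RBFInterpolator(y, d, neighbors=10)
    values = interp(x)

    # Modified usage (class below): evaluation points fixed at construction,
    # data per call.
    interp2 = RBFInterpolatorWithDynamicD(y, x, d_shape=d.shape, neighbors=10)
    values2 = interp2(d)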

    Next, I looked for computations that could be skipped when X and Y are fixed. I found that I could avoid recomputing xindices, yindices, and inv on each call.

    import numpy as np
    from scipy.interpolate import RBFInterpolator, _rbfinterp
    import timeit
    
    
    class RBFInterpolatorWithDynamicD(RBFInterpolator):
        def __init__(self, y, x, d_shape, *args, **kwargs):
            # The evaluation points `x` are fixed up front; `d_shape` is the
            # full shape of the data array that will be passed to __call__().
            self.x = np.asarray(x, dtype=float, order="C")
            d = np.zeros(d_shape)  # placeholder values, replaced on each call
            ny, ndim = y.shape
            assert d.size % ny == 0, "d and y must have same first axis size"
            self.d_size_inner = d.size // ny
            # The parent __init__ validates the arguments, stores y, smoothing,
            # kernel, etc., and sets self.d_shape to the trailing shape d_shape[1:].
            super().__init__(y, d, *args, **kwargs)
            self.precalculate_derived_values()
    
    
        def precalculate_derived_values(self):
            # Get the indices of the k nearest observation points to each
            # evaluation point.
            _, yindices = self._tree.query(self.x, self.neighbors)
            if self.neighbors == 1:
                # `KDTree` squeezes the output when neighbors=1.
                yindices = yindices[:, None]
    
            # Multiple evaluation points may have the same neighborhood of
            # observation points. Make the neighborhoods unique so that we only
            # compute the interpolation coefficients once for each
            # neighborhood.
            yindices = np.sort(yindices, axis=1)
            yindices, inv = np.unique(yindices, return_inverse=True, axis=0)
            inv = np.reshape(inv, (-1,))  # flatten, we need 1-D indices
    
            # `inv` tells us which neighborhood will be used by each evaluation
            # point. Now we find which evaluation points will be using each
            # neighborhood.
            xindices = [[] for _ in range(len(yindices))]
            for i, j in enumerate(inv):
                xindices[j].append(i)
    
            self.yindices = yindices
            self.inv = inv
            self.xindices = xindices
    
        def __call__(self, d):
            """Evaluate the interpolant at `self.x`, with `d` values corresponding
            to original points at `self.y`."""
            ny, ndim = self.y.shape
            assert d.shape == (ny, *self.d_shape), f"Expecting D shape of {(ny, *self.d_shape)}"
            d = d.reshape((ny, -1))
            if self.x.ndim != 2:
                raise ValueError("`x` must be a 2-dimensional array.")
    
            nx, ndim = self.x.shape
            if ndim != self.y.shape[1]:
                raise ValueError("Expected the second axis of `x` to have length "
                                 f"{self.y.shape[1]}.")
    
            # Our memory budget for storing RBF coefficients is
            # based on how many floats in memory we already occupy
            # If this number is below 1e6 we just use 1e6
            # This memory budget is used to decide how we chunk
            # the inputs
            memory_budget = max(self.x.size + self.y.size + d.size, 1000000)
    
            if self.neighbors is None:
                raise NotImplementedError("self.neighbors is None is not implemented, use RBFInterpolator")
            else:
                yindices = self.yindices
                inv = self.inv
                xindices = self.xindices
    
                out = np.empty((nx, self.d_size_inner), dtype=float)
                for xidx, yidx in zip(xindices, yindices):
                    # `yidx` are the indices of the observations in this
                    # neighborhood. `xidx` are the indices of the evaluation points
                    # that are using this neighborhood.
                    xnbr = self.x[xidx]
                    ynbr = self.y[yidx]
                    dnbr = d[yidx]
                    snbr = self.smoothing[yidx]
                    shift, scale, coeffs = _rbfinterp._build_and_solve_system(
                        ynbr,
                        dnbr,
                        snbr,
                        self.kernel,
                        self.epsilon,
                        self.powers,
                    )
                    out[xidx] = self._chunk_evaluator(
                        xnbr,
                        ynbr,
                        shift,
                        scale,
                        coeffs,
                        memory_budget=memory_budget)
    
            out = out.astype(d.dtype)
            out = out.reshape((nx, ) + self.d_shape)
            return out
    
    
    def main():
        # Create example dataset with 200 initial points
        Y = np.random.uniform(-1, 1, size=(200, 2))
        D = np.sum(Y, axis=1)*np.exp(-6*np.sum(Y**2, axis=1))
        # Infer the output on 150000 points
        X = np.random.uniform(-1, 1, size=(150000, 2))
        interp = RBFInterpolator(Y, D, neighbors=10)
        interp2 = RBFInterpolatorWithDynamicD(Y, X, d_shape=D.shape, neighbors=10)
        print("equal?", np.allclose(interp(X), interp2(D)))
        num_runs = 10
        print("classic", timeit.timeit('interp(X)', number=num_runs, globals={'interp': interp, 'X': X}) / num_runs)
        print("optimized", timeit.timeit('interp2(D)', number=num_runs, globals={'interp2': interp2, 'D': D}) / num_runs)
    
    
    main()
    

    This program checks that the subclass produces the same results as the stock RBFInterpolator, and benchmarks both.

    Warning: this program uses private parts of the SciPy API (scipy.interpolate._rbfinterp), so future versions of SciPy may change the internals this answer relies on. That could be avoided by re-implementing the rest of RBFInterpolator.

    Perhaps some way to extract the (sparse) matrix I referred to above?

    I don't think you could extract a matrix to do the interpolation. There are two things that would stop you:

    1. The relationship between the coordinates and the output isn't linear: the kernel function is applied to the distance metric, and the kernel itself is nonlinear.
    2. Changing D requires the interpolator to re-solve the linear system that fits the kernel coefficients to D, which is what _build_and_solve_system() does. As a result, changing any single value in D changes the coefficients associated with the other data points (see the sketch below).
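
    A quick way to see this coupling (a minimal sketch with made-up points, using the global solver for simplicity):

    import numpy as np
    from scipy.interpolate import RBFInterpolator

    rng = np.random.default_rng(0)
    y = rng.uniform(-1, 1, size=(20, 2))   # observation points
    d = rng.uniform(-1, 1, size=20)        # data values
    x = rng.uniform(-1, 1, size=(5, 2))    # evaluation points

    base = RBFInterpolator(y, d)(x)

    d2 = d.copy()
    d2[0] += 1.0                           # perturb a single data value
    bumped = RBFInterpolator(y, d2)(x)

    # The change is not confined to the evaluation points nearest y[0]:
    # every output moves, because the fitted coefficients are coupled
    # through the linear system.
    print(bumped - base)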