pythonscipyinterpolationlinear-interpolation

How to set SciPy interpolator to preserve the data most accurately?


This is an (x,y) plot I have of a vehicle's position data every 0.1 seconds. The total set is around 500 points.

original data

I read other solutions about interpolating with SciPy (here and here), but it seems that SciPy interpolates at even intervals by default. Below is my current code:

def reduce_dataset(x_list, y_list, num_interpolation_points):
    points = np.array([x_list, y_list]).T 
    distance = np.cumsum( np.sqrt(np.sum( np.diff(points, axis=0)**2, axis=1 )) )
    distance = np.insert(distance, 0, 0)/distance[-1]
    interpolator =  interp1d(distance, points, kind='quadratic', axis=0)
    results = interpolator(np.linspace(0, 1, num_interpolation_points)).T.tolist()
    new_xs = results[0]
    new_ys = results[1]
    return new_xs, new_ys



xs, ys = reduce_dataset(xs,ys, 50)
colors = cm.rainbow(np.linspace(0, 1, len(ys)))
i = 0
for y, c in zip(ys, colors):
    plt.scatter(xs[i], y, color=c)
    i += 1

It produces this output:

interpolator

This is decent, but I want to set the interpolator to try and place more points in the places that are hardest to linearly interpolate, and place less points in areas that can be easily reconstructed with an interpolated line.

Notice how in the second image, the final point appears to suddenly "jump" from the previous one. And the middle section seems a bit redundant, since many of those points fall in a perfectly straight line. This is not the most efficient use of 50 points for something that is to be reconstructed as accurately as possible using linear interpolation.

I made this manually, but I am looking for something like this, where the algorithm is smart enough to place points very densely in places where the data changes non-linearly:

enter image description here

This way, the data can be interpolated with a higher degree of accuracy. The large gaps between points in this graph can be very accurately interpolated with a simple line, whereas the dense clusters require much more frequent sampling. I have read into the interpolator docs on SciPy, but can't seem to find any generator or setting that can do this.

I have tried using "slinear" and "cubic" interpolation as well, but it seems to still sample at even intervals rather than grouping points where they are needed most.

Is this something SciPy can do, or should I use something like an SKLearn ML algorithm for a job like this?


Solution

  • It seems to me that you are confused between the interpolator object that is constructed by interp1d, and the actual interpolated coordinates that are the final result you want.

    it seems that SciPy interpolates at even intervals by default

    interp1d returns an interpolator object that is built from the x and y coordinates you provide. Those do not have to be evenly spaced at all.

    Then, you provide to this interpolator xnew values that define where the interpolator will reconstruct your signal. This is where you have to specify if you want evenly spaced or not: results = interpolator(np.linspace(0, 1, num_interpolation_points)).T.tolist(). Notice the call to np.linspace, which literally means "linearly spaced values".

    Replace this by np.logspace() to have logarithmically spaced value, or by something else:

    import numpy as np
    from scipy.interpolate import interp1d
    
    import matplotlib.pyplot as plt
    
    # Generate fake data
    x = np.linspace(1, 3, 1000)
    y = (x - 2)**3
    
    # interpolation
    interpolator = interp1d(x, y)
    
    # different xnews
    N = 20
    xnew_linspace = np.linspace(x.min(), x.max(), N)  # linearly spaced
    xnew_logspace = np.logspace(np.log10(x.min()), np.log10(x.max()), N)  # log spaced
    
    # spacing based on curvature
    gradient = np.gradient(y, x)
    second_gradient = np.gradient(gradient, x)
    curvature = np.abs(second_gradient) / (1 + gradient**2)**(3 / 2)
    idx = np.round(np.linspace(0, len(curvature) - 1, N)).astype(int)
    epsilon = 1e-1
    
    a = (0.99 * x.max() - x.min()) / np.sum(1 / (curvature[idx] + epsilon))
    xnew_curvature = np.insert(x.min() + np.cumsum(a / (curvature[idx] + epsilon)), 0, x.min())
    
    fig, axarr = plt.subplots(2, 2, layout='constrained', sharex=True, sharey=True)
    
    axarr[0, 0].plot(x, y)
    for ax, xnew in zip(axarr.flatten()[1:], [xnew_linspace, xnew_logspace, xnew_curvature]):
        ax.plot(xnew, interpolator(xnew), '.--')
    
    axarr[0, 0].set_title('base signal')
    axarr[0, 1].set_title('linearly spaced')
    axarr[1, 0].set_title('log spaced')
    axarr[1, 1].set_title('curvature based spaced')
    
    plt.savefig('test_interp1d.png', dpi=400)
    

    enter image description here

    Note that I am not sure that scaling on the curvature as I did is the proper way to do it. But that gives you the idea about interp1d.