This is an (x,y) plot I have of a vehicle's position data every 0.1 seconds. The total set is around 500 points.
I read other solutions about interpolating with SciPy (here and here), but it seems that SciPy interpolates at even intervals by default. Below is my current code:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from scipy.interpolate import interp1d

def reduce_dataset(x_list, y_list, num_interpolation_points):
    points = np.array([x_list, y_list]).T
    # Cumulative arc length along the path, normalized to [0, 1]
    distance = np.cumsum(np.sqrt(np.sum(np.diff(points, axis=0)**2, axis=1)))
    distance = np.insert(distance, 0, 0) / distance[-1]
    interpolator = interp1d(distance, points, kind='quadratic', axis=0)
    # Evaluate at evenly spaced fractions of the total arc length
    results = interpolator(np.linspace(0, 1, num_interpolation_points)).T.tolist()
    new_xs = results[0]
    new_ys = results[1]
    return new_xs, new_ys

xs, ys = reduce_dataset(xs, ys, 50)
colors = cm.rainbow(np.linspace(0, 1, len(ys)))
for x, y, c in zip(xs, ys, colors):
    plt.scatter(x, y, color=c)
It produces this output:
This is decent, but I want the interpolator to place more points where the data is hardest to interpolate linearly, and fewer points in areas that can be easily reconstructed with a straight line.
Notice how in the second image, the final point appears to suddenly "jump" from the previous one. And the middle section seems a bit redundant, since many of those points fall in a perfectly straight line. This is not the most efficient use of 50 points for something that is to be reconstructed as accurately as possible using linear interpolation.
I made this manually, but I am looking for something like this, where the algorithm is smart enough to place points very densely in places where the data changes non-linearly:
This way, the data can be interpolated with a higher degree of accuracy. The large gaps between points in this graph can be very accurately interpolated with a simple line, whereas the dense clusters require much more frequent sampling. I have read into the interpolator docs on SciPy, but can't seem to find any generator or setting that can do this.
I have tried using "slinear" and "cubic" interpolation as well, but it seems to still sample at even intervals rather than grouping points where they are needed most.
Is this something SciPy can do, or should I use something like an SKLearn ML algorithm for a job like this?
It seems to me that you are confusing the interpolator object that is constructed by interp1d with the actual interpolated coordinates that are the final result you want.
"it seems that SciPy interpolates at even intervals by default"
interp1d returns an interpolator object that is built from the x and y coordinates you provide. Those do not have to be evenly spaced at all.
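To make that concrete, here is a minimal sketch with made-up, unevenly spaced sample values (x_known, y_known, and x_query are illustrative names, not from the question). It only shows that both the input coordinates and the evaluation points are chosen freely by the caller:

```python
import numpy as np
from scipy.interpolate import interp1d

# Unevenly spaced sample locations (made-up values for illustration)
x_known = np.array([0.0, 0.1, 0.15, 0.7, 2.0, 2.05, 3.0])
y_known = x_known**2

# interp1d defaults to linear interpolation between the known samples
interpolator = interp1d(x_known, y_known)

# The caller decides where to evaluate; these need not be evenly spaced either
x_query = np.array([0.05, 0.5, 1.0, 2.9])
y_query = interpolator(x_query)
```

The only requirement on x_query here is that every value lies inside the range covered by x_known, since interp1d raises an error for out-of-range values unless told otherwise.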
Then, you pass xnew values to this interpolator; they define where the interpolator will reconstruct your signal. This is where you specify whether you want even spacing or not: results = interpolator(np.linspace(0, 1, num_interpolation_points)).T.tolist(). Notice the call to np.linspace, which literally means "linearly spaced values". Replace it with np.logspace() to get logarithmically spaced values, or with something else:
import numpy as np
from scipy.interpolate import interp1d
import matplotlib.pyplot as plt
# Generate fake data
x = np.linspace(1, 3, 1000)
y = (x - 2)**3
# interpolation
interpolator = interp1d(x, y)
# different xnews
N = 20
xnew_linspace = np.linspace(x.min(), x.max(), N) # linearly spaced
xnew_logspace = np.logspace(np.log10(x.min()), np.log10(x.max()), N) # log spaced
# spacing based on curvature
gradient = np.gradient(y, x)
second_gradient = np.gradient(gradient, x)
curvature = np.abs(second_gradient) / (1 + gradient**2)**(3 / 2)
idx = np.round(np.linspace(0, len(curvature) - 1, N)).astype(int)
epsilon = 1e-1
# scale the steps so that they span 99% of the x range
a = 0.99 * (x.max() - x.min()) / np.sum(1 / (curvature[idx] + epsilon))
xnew_curvature = np.insert(x.min() + np.cumsum(a / (curvature[idx] + epsilon)), 0, x.min())
fig, axarr = plt.subplots(2, 2, layout='constrained', sharex=True, sharey=True)
axarr[0, 0].plot(x, y)
for ax, xnew in zip(axarr.flatten()[1:], [xnew_linspace, xnew_logspace, xnew_curvature]):
    ax.plot(xnew, interpolator(xnew), '.--')
axarr[0, 0].set_title('base signal')
axarr[0, 1].set_title('linearly spaced')
axarr[1, 0].set_title('log spaced')
axarr[1, 1].set_title('curvature based spaced')
plt.savefig('test_interp1d.png', dpi=400)
Note that I am not sure that scaling by the curvature as I did is the proper way to do it, but it should give you the idea of how interp1d works.
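For a 2-D trajectory like the one in the question, one possible variation on the curvature idea is to weight each original point by how much the path turns there and then pick indices at equal steps of the cumulative weight. The sketch below is only an illustration, not a tested solution for the vehicle data: the function name resample_by_turning and the epsilon default are hypothetical, it uses the turning angle between consecutive segments as a discrete stand-in for curvature, and it assumes reasonably smooth data with no repeated consecutive points.

```python
import numpy as np

def resample_by_turning(x, y, n_points, epsilon=0.5):
    """Return indices into (x, y), denser where the path turns more.

    Hypothetical helper. Assumes n_points >= 2, a path of nonzero length,
    and distinct consecutive points. May return fewer than n_points
    indices, because duplicate picks are merged.
    """
    pts = np.column_stack([x, y]).astype(float)
    seg = np.diff(pts, axis=0)                    # segment vectors
    seg_len = np.hypot(seg[:, 0], seg[:, 1])      # segment lengths
    heading = np.arctan2(seg[:, 1], seg[:, 0])    # segment directions

    # Absolute turning angle at each interior vertex, wrapped to [0, pi]
    turn = np.abs((np.diff(heading) + np.pi) % (2 * np.pi) - np.pi)
    vertex_turn = np.zeros(len(pts))
    vertex_turn[1:-1] = turn

    # Cumulative weight: accumulated turning plus a uniform arc-length term.
    # epsilon (an assumed tuning knob) keeps some points on straight stretches.
    cum_len = np.insert(np.cumsum(seg_len), 0, 0.0)
    cdf = np.cumsum(vertex_turn) + epsilon * cum_len / cum_len[-1]
    cdf /= cdf[-1]

    # Pick the first vertex at which the cumulative weight reaches each level
    targets = np.linspace(0.0, 1.0, n_points)
    idx = np.clip(np.searchsorted(cdf, targets), 0, len(pts) - 1)
    idx[0], idx[-1] = 0, len(pts) - 1             # always keep both endpoints
    return np.unique(idx)
```

With NumPy arrays xs and ys, keep = resample_by_turning(xs, ys, 50) followed by xs[keep], ys[keep] would give the reduced set. Because this selects existing samples rather than interpolating new ones, the kept points lie exactly on the recorded path. On noisy position data the turning angles are dominated by noise, so some smoothing beforehand would probably be needed.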