I have a sparsely populated array like the following. Is there an algorithm that can fill in all the blanks with values that make sense linearly, i.e. deduced from the surrounding original values?
I've looked at bilinear interpolation and bicubic interpolation, but are there any others?
     |  1 |  2 |  3 |  4 |  5 |  6 |  7
  ---+----+----+----+----+----+----+----
   1 |    |    |    |    |    |    |
   2 |    |    |    |    |    |    |
   3 |    | 55 |    |    |    |    |
   4 |    | 50 |    |    |    | 12 |  6
   5 |    | 45 |    |    |    |    | 19
   6 |    |    | xxx|    |    |    |
   7 |    | 35 | 45 | 50 | yyy|    |
   8 |    |    |    |    |    |    |
   9 |    |    |    |    |    |    |
  10 |    |    |    |    |    |    |
  11 |    |    |    |    |    |    |
  12 |    |    |    | zzz|    |    |
  13 |    |    |    |    |    |    |
  14 |    |    |    |    |    |    |
  15 |    |    |    |    |    |    |
For example, I would expect xxx to be in the vicinity of 40, and yyy to be in the vicinity of 50. zzz however might have a more random value. Note though: I'd like to populate every single blank space, not just xxx, yyy and zzz. And to be able to do so for any sparsely populated array.
Does such an algorithm exist?
A million such algorithms exist. First, put your known values into a dictionary keyed by (column, row), like this:
known_values = {
    (2, 3): 55.0,
    (2, 4): 50.0,
    (2, 5): 45.0,
    (2, 7): 35.0,
    (3, 7): 45.0,
    (4, 7): 50.0,
    (6, 4): 12.0,
    (7, 4): 6.0,
    (7, 5): 19.0,
}
The simplest approach is to say that the value at any point is a weighted average of all of the populated points, weighted by 1/distance squared. (This is known as inverse distance weighting, or Shepard's method.) So in your case above, you'd have code like this:
def interpolate(known_values, p):
    """Inverse-distance-weighted average of all known points."""
    total_weight = 0.0
    total_sum = 0.0
    for q, value in known_values.items():
        if p == q:
            # p is itself a known point; return its value exactly.
            return value
        d_square = (p[0] - q[0])**2 + (p[1] - q[1])**2
        total_weight += 1.0 / d_square
        total_sum += value / d_square
    return total_sum / total_weight
This solution will work as long as the matrix has ANY filled in data.
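To fill in every blank, just call that function once per cell. Here is a self-contained sketch (the function is repeated so the snippet runs on its own); the exact numbers it produces are of course specific to this weighting:

```python
# Inverse-distance-weighted fill of the whole 7 x 15 grid from the question.
known_values = {
    (2, 3): 55.0, (2, 4): 50.0, (2, 5): 45.0, (2, 7): 35.0,
    (3, 7): 45.0, (4, 7): 50.0, (6, 4): 12.0, (7, 4): 6.0,
    (7, 5): 19.0,
}

def interpolate(known_values, p):
    total_weight = 0.0
    total_sum = 0.0
    for q, value in known_values.items():
        if p == q:
            return value  # known point: return it exactly
        d_square = (p[0] - q[0])**2 + (p[1] - q[1])**2
        total_weight += 1.0 / d_square
        total_sum += value / d_square
    return total_sum / total_weight

# One value per (column, row) cell; known cells keep their original value.
grid = {(col, row): interpolate(known_values, (col, row))
        for col in range(1, 8) for row in range(1, 16)}

# The blank at (2, 6) lands between the 45 above it and the 35 below it.
print(round(grid[(2, 6)], 1))
```

Note that a weighted average can never leave the range of the known values, so this version will never extrapolate a trend beyond them.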
However, judging from how you asked the question, you might want a smooth interpolation which is approximately linear in any small region. One way to do that is to look for (a, b, c) such that the function a*x + b*y + c minimizes the weighted sum of the squares of the errors at the known points, with the weight being the inverse 4th power of the distance from your desired point to the known point. (Two of the powers compensate for the number of points growing with the square of the distance, i.e. with area; the other two weight nearby points more heavily.)
The reason to use least squares for the error here is that the math works out simply. You are at the minimum exactly when a small change in a, b, or c does not change the value, meaning that each partial derivative is 0. The three partial derivatives therefore give you 3 linear equations, and solving 3 equations in 3 variables is reasonably easy.
However the derivation is long and messy. If you want to try it, you should look at the usual least squares derivation and work through the details, then implement it. But only attempt it if you really want a linear projection to points away from where you have data.