I have a numpy array containing coordinates of points (in 3D, but I am have started off by trying the method in 1D and 2D first) that I would like to fit in a discrete grid. However, I do not want to just move the points to the grid pixel which they are closest to, but rather put on each pixel a weighted value which depends on how far from that pixel the point actually is. For example, if in 1D I have a point x = 15.4, in my discrete space this point would contribute 60% to pixel 15 and 40% to pixel 16. I have managed to do this in 1D and the code is as follows:
# Make test data
N = 1000
data = np.random.normal(
loc = 10,
scale = 2,
size = N
)
# set data range, binsize, and coordinates
xmin = 0
xmax = 20
xbinsize = 0.5
# define the bin centres
xbincenters = np.arange(
xmin+xbinsize*0.5,
xmax,
xbinsize
)
# transform data and floor it to get the closest pixel in the discrete grid
data_t = data/xbinsize - 0.5
bin_indices = np.floor(data_t).astype(int)
# calculate the difference between continuous coordinate and closest pixel
dx = data_t - bin_indices
# get data range in bin indices
index_min = np.ceil(xmin/xbinsize - 0.5).astype(int)
index_max = np.ceil(xmax/xbinsize - 0.5).astype(int)
# do the interpolation
minlength = index_max - index_min
output_array = np.bincount(
bin_indices,
weights = (1-dx),
minlength = minlength
) +\
np.bincount(
bin_indices+1,
weights = dx,
minlength = minlength
)
# finally plot a histogram of the continuous values against output_array
fig,ax = plt.subplots()
ax.hist(data, bins=np.arange(0,20,0.5))
ax.plot(
xbincenters,
output_array,
color = 'r'
)
plt.show()
I have been trying to generalise this to 2D but to no avail, since np.bincount()
only works on 1D arrays.
Up until trying to find the nearest pixels, I feel like the code should be similar, as all the operations above can be broadcasted to a 2D array. In this case, we need to distribute any coordinate to the nearest 4 pixels, depending on a difference dx
and now another difference dy
(which, if you used the above code for an array of shape (1000, 2)
would just be the 1st and second column of the array dx
respectively). I tried unraveling my bin_indices
and dx
arrays to use np.bincount()
on them, but at this point I am not sure if that is correct or what operations to do on the unraveled arrays. Would generalising this problem in 2D (and later in 3D) need a different kind of approach? Any help would be appreciated, thank you in advance!
So, if I get it correctly, you did "by hand" all the interpolation job (there are probably some code to do that somewhere, but can't think of any right now), and use bincount
just to increase the output array (because output_array[indices] += weight
wouldn't have worked, indeed, if indices contain repetitions)
Then, you could just modify indices to target yourself a 2D arrays. Using a width and a height, and using indicesy*width+indicesx
as indices. And then "unravel" the array once finished.
Like this
import numpy as np
import matplotlib.pyplot as plt
N = 1000
# Big array for performance test
datax = np.random.normal(
loc = 10,
scale = 2,
size = N
)
datay = np.random.normal(
loc = 10,
scale = 2,
size = N
)
# Or small one for result test (comment if not wanted)
datax = np.array([15.2, 1.5, 15.3])
datay = np.array([5.1, 10.5, 15.9])
# set data range, binsize, and coordinates
xmin = 0
xmax = 20
xbinsize = 0.5
ymin = 0
ymax = 20
ybinsize = 0.5
# define the bin centres
xbincenters = np.arange(
xmin+xbinsize*0.5,
xmax,
xbinsize
)
ybincenters = np.arange(
ymin+ybinsize*0.5,
ymax,
ybinsize
)
# transform data and floor it to get the closest pixel in the discrete grid
datax_t = datax/xbinsize - 0.5
datay_t = datay/ybinsize - 0.5
bin_indicesx = np.floor(datax_t).astype(int)
bin_indicesy = np.floor(datay_t).astype(int)
# calculate the difference between continuous coordinate and closest pixel
dx = datax_t - bin_indicesx
dy = datay_t - bin_indicesy
# get data range in bin indices
index_minx = np.ceil(xmin/xbinsize - 0.5).astype(int)
index_miny = np.ceil(ymin/ybinsize - 0.5).astype(int)
index_maxx = np.ceil(xmax/xbinsize - 0.5).astype(int)
index_maxy = np.ceil(ymax/ybinsize - 0.5).astype(int)
# do the interpolation
minlengthx = index_maxx - index_minx
minlengthy = index_maxy - index_miny
minlength=minlengthx*minlengthy
flat_output_array = np.bincount(
bin_indicesy*minlengthx+bin_indicesx,
weights = (1-dx)*(1-dy),
minlength = minlength
) +\
np.bincount(
bin_indicesy*minlengthx+bin_indicesx+1,
weights = dx*(1-dy),
minlength = minlength
) +\
np.bincount(
(bin_indicesy+1)*minlengthx+bin_indicesx,
weights = (1-dx)*dy,
minlength = minlength
) +\
np.bincount(
(bin_indicesy+1)*minlengthx+bin_indicesx+1,
weights = dx*dy,
minlength = minlength
)
output_array=flat_output_array.reshape(-1, minlengthx)
# finally plot a histogram of the continuous values against output_array
plt.imshow(output_array)
plt.show()