python, numpy, scipy, nearest-neighbor, spatial-interpolation

Interpolate unstructured X,Y,Z data on the best grid based on the nearest-neighbour distance for each point


This question was edited after the answers were posted, to show the final solution I used.

I have unstructured 2D datasets coming from different sources, for example: Example data 1: 3D measurement; Example data 2: 2D mesh nodes. Each dataset consists of three numpy.ndarray arrays (X, Y coordinates and Z values).

My final aim is to interpolate these data on a grid for conversion to an image/matrix. So I need to find the "best grid" on which to interpolate these data, and for that I need to find the best X and Y step between pixels of that grid.

Determine the step based on the Euclidean distance between points:

Use the mean of the Euclidean distances between each point and its nearest neighbour.



    # Generate KD-tree
    xy = np.c_[x, y]  # X,Y data stacked as (n, 2) for use with the KD-tree
    tree = scipy.spatial.cKDTree(xy)  # Create KD-tree for X,Y coordinates

    # Calculate step
    distances, _ = tree.query(xy, k=2)  # Distances to the 2 nearest neighbours (self included)
    distances = distances[:, 1:]  # Remove the k=1 column of zero self-distances
    step = np.mean(distances)  # Result
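As a sanity check, the step estimate can be verified on a point set whose nearest-neighbour spacing is known in advance. The regular test grid below is my own; the rest mirrors the snippet above:

```python
import numpy as np
import scipy.spatial

# Regular 10x10 grid with unit spacing: every point's nearest
# neighbour is exactly 1.0 away, so the estimated step must be 1.0.
x, y = map(np.ravel, np.meshgrid(np.arange(10.0), np.arange(10.0)))

xy = np.c_[x, y]
tree = scipy.spatial.cKDTree(xy)

distances, _ = tree.query(xy, k=2)  # k=2: self + nearest neighbour
step = np.mean(distances[:, 1])     # drop the zero self-distances

print(step)  # -> 1.0
```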


Interpolate points on grid:

Interpolate dataset points on grid using the calculated step.



    # Generate grid
    def interval(axe):
        '''Return the numpy.linspace interval for the specified axis'''
        cent = axe.min() + np.ptp(axe) / 2  # Interval center
        nbs = int(np.ceil(np.ptp(axe) / step))  # Number of steps in the interval
        hwid = nbs * step / 2  # Half interval width
        return np.linspace(cent - hwid, cent + hwid, nbs + 1)  # nbs steps -> nbs + 1 samples

    xg, yg = np.meshgrid(interval(x), interval(y))  # Generate grid

    # Interpolate X,Y,Z data on the grid
    zg = scipy.interpolate.griddata((x, y), z, (xg, yg))  # Default method='linear'

Set NaN if a pixel is too far from the initial points:

Set NaN for pixels of the grid that are too far (distance > step) from the points of the initial X,Y,Z data. The previously generated KD-tree is reused.



    # Calculate pixel to X,Y,Z data distances
    dist, _ = tree.query(np.c_[xg.ravel(), yg.ravel()])
    dist = dist.reshape(xg.shape)

    # Set NaN value for too far pixels
    zg[dist > step] = np.nan
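Putting the three stages together, a minimal end-to-end sketch (the synthetic sine/cosine data and the seeded RNG are my own; the rest mirrors the snippets above):

```python
import numpy as np
import scipy.interpolate
import scipy.spatial

# Synthetic scattered data (any unstructured X,Y,Z set would do)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 500)
y = rng.uniform(0, 10, 500)
z = np.sin(x) * np.cos(y)

# 1. Step = mean nearest-neighbour distance
xy = np.c_[x, y]
tree = scipy.spatial.cKDTree(xy)
distances, _ = tree.query(xy, k=2)
step = np.mean(distances[:, 1])

# 2. Grid covering the data extent with spacing `step`
def interval(axe):
    cent = axe.min() + np.ptp(axe) / 2      # interval center
    nbs = int(np.ceil(np.ptp(axe) / step))  # number of steps
    hwid = nbs * step / 2                   # half interval width
    return np.linspace(cent - hwid, cent + hwid, nbs + 1)

xg, yg = np.meshgrid(interval(x), interval(y))
zg = scipy.interpolate.griddata((x, y), z, (xg, yg))

# 3. Mask pixels farther than `step` from any data point
dist, _ = tree.query(np.c_[xg.ravel(), yg.ravel()])
zg[dist.reshape(xg.shape) > step] = np.nan
```

Linear interpolation never leaves the range of the input Z values, and pixels outside the convex hull of the data (or farther than one step from any point) end up as NaN.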


Solution

  • I suggest you go with KDTree.query.

    You are searching for a characteristic distance to scale your binning. I suggest you take only a random subset of your points, and use the Manhattan distance, because KDTree.query is very slow (even though it has O(n log n) complexity).

    Here is my code:

    # Create tree
    tree = scipy.spatial.KDTree(numpy.array(points))  # better give it a copy?
    # Create random subsample of points
    n_repr = 1000
    shuffled_points = numpy.array(points)
    numpy.random.shuffle(shuffled_points)
    shuffled_points = shuffled_points[:n_repr]
    # Query the tree (k=2: the first neighbour is the point itself, p=1: Manhattan)
    dists, indices = tree.query(shuffled_points, k=2, p=1)
    # Get _estimate_ of average distance (drop the zero self-distance column):
    avg_dists = numpy.average(dists[:, 1])
    print('average Manhattan distance to nearest neighbour is:', avg_dists)
    

    I suggest using the Manhattan distance ( https://en.wikipedia.org/wiki/Taxicab_geometry ) because it is faster to compute than the Euclidean distance. And since you only need an estimate of the average distance, it should be sufficient.
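A self-contained sketch of this subsampling idea, using cKDTree (the C implementation, typically much faster than the pure-Python KDTree) and a seeded synthetic point set of my own:

```python
import numpy as np
import scipy.spatial

# 100k random points in a 100x100 square (assumed test data)
rng = np.random.default_rng(42)
points = rng.uniform(0, 100, size=(100_000, 2))

tree = scipy.spatial.cKDTree(points)

# Query only a random subsample instead of every point
n_repr = 1000
idx = rng.choice(len(points), size=n_repr, replace=False)

# k=2 because each sampled point is its own nearest neighbour;
# p=1 selects the Manhattan (taxicab) metric
dists, _ = tree.query(points[idx], k=2, p=1)
avg_dist = np.mean(dists[:, 1])  # column 0 holds the zero self-distances

print('estimated average Manhattan NN distance:', avg_dist)
```

Note that column 0 of `dists` must be dropped before averaging, otherwise the zero self-distances roughly halve the estimate.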