[SOLVED] Spatial Accuracy Analysis in Python

I have a (hopefully) simple spatial analysis I'd like to perform in Python. However, I haven't fully figured out how to get Python to do what I want it to do.

I have a CSV file with 3 critical columns: Hit or Miss, X Location, Y Location. Each row is an instance of whether a hit or miss occurred, and what the x and y coordinates were for that hit or miss (e.g., "hit, 10, 58").

To give you a better picture, imagine there are hundreds of thousands of these points, and they all fall somewhere within a 100 x 100 grid (where 0,0 = bottom left corner and 100,100 = upper right corner). Hits and misses are distributed across the grid, where in some locations there are hits and misses overlapping, some locations with only hits or only misses, and some locations with neither.

My ultimate goal is to produce a heat-map that visualizes the relative accuracy (hits/(hits+misses)) across the grid.

The best idea I've been able to come up with is to plot the hits and misses on the same scatterplot, reduce the opacity of the points, and let the density of points dictate the hue, which would then represent relative accuracy.... but this looks awful....

My next idea is to make bins. So, the grid would be broken into a 50 x 50 array of 2x2 bins (2,500 bins in total), and I would have my program perform the accuracy analysis (hits/(hits+misses)) for each bin. But, alas, I have no idea how to do this.

Does anyone have any ideas?

Thanks!

Solution

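You could use the kernel density estimator (gaussian_kde) from scipy.stats. I think this gets you close to what you want. The KDE builds an estimate of the 2D probability density, so it shows where the data tends to cluster. You then evaluate the estimator over your 100x100 grid and plot the result with a contour plot. Note that a density on its own is not an accuracy: to get hits/(hits+misses), you would fit one estimator to the hits and one to the misses, then combine them, weighting each density by its number of points.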

Your code could look something like this:

import numpy as np
from scipy.stats import gaussian_kde
import matplotlib.pyplot as plt

ndiv = 101
xr = np.linspace(0.0, 100.0, ndiv)
yr = np.linspace(0.0, 100.0, ndiv)
x, y = np.meshgrid(xr, yr)

# points here would be your 'hits' or 'miss' subset,
# as an array of shape (2, N): row 0 = x, row 1 = y
estimator = gaussian_kde(points)

# this turns the grid into a (2, ndiv*ndiv) array of points
# that will be used by the KDE for evaluation
grid_coords = np.vstack([x.ravel(), y.ravel()])
z = estimator(grid_coords)
z = z.reshape(ndiv, ndiv)

# you can specify contour levels
lvls = np.array([.05, .5, .75, 1.0]) * z.max()
cfset = plt.contourf(x, y, z, cmap='jet', levels=lvls)
plt.colorbar(cfset)
plt.show()
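
To get from densities to the accuracy you asked about, here is a minimal sketch of two options. The first is the 2x2 binning idea from the question, done with np.histogram2d. The second is a smoothed version that combines a hits KDE and a misses KDE. The function names (binned_accuracy, kde_accuracy) and the (N, 2) layout of hit_xy and miss_xy are assumptions for illustration, not something from your data, so adapt them to however you load the CSV.

```python
import numpy as np
from scipy.stats import gaussian_kde


def binned_accuracy(hit_xy, miss_xy, bin_size=2.0, extent=100.0):
    """hits / (hits + misses) per square bin; NaN where a bin has no points.

    hit_xy and miss_xy are assumed to be (N, 2) arrays of x, y coordinates.
    """
    n_bins = int(round(extent / bin_size))
    edges = np.linspace(0.0, extent, n_bins + 1)
    hits, _, _ = np.histogram2d(hit_xy[:, 0], hit_xy[:, 1], bins=[edges, edges])
    misses, _, _ = np.histogram2d(miss_xy[:, 0], miss_xy[:, 1], bins=[edges, edges])
    total = hits + misses
    acc = np.full(total.shape, np.nan)
    # Divide only where the bin is non-empty; empty bins stay NaN.
    np.divide(hits, total, out=acc, where=total > 0)
    return acc, edges  # acc[i, j] is x-bin i, y-bin j


def kde_accuracy(hit_xy, miss_xy, ndiv=101, extent=100.0):
    """Smoothed accuracy surface built from two Gaussian KDEs."""
    axis = np.linspace(0.0, extent, ndiv)
    x, y = np.meshgrid(axis, axis)
    grid = np.vstack([x.ravel(), y.ravel()])
    # Each KDE integrates to 1, so scale by the point count before taking the ratio.
    hit_d = len(hit_xy) * gaussian_kde(hit_xy.T)(grid)
    miss_d = len(miss_xy) * gaussian_kde(miss_xy.T)(grid)
    total = hit_d + miss_d
    acc = np.full(total.shape, np.nan)
    np.divide(hit_d, total, out=acc, where=total > 0)
    return x, y, acc.reshape(ndiv, ndiv)
```

To draw the binned version, something like plt.pcolormesh(edges, edges, acc.T) should work; the transpose is needed because histogram2d puts x on the first axis. With hundreds of thousands of points the binned version will be much faster than the KDE, and the KDE ratio is not very meaningful in regions where there are few or no points.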