I am generating many, many contingency tables as part of a project that I'm writing.
The workflow is: take two arrays of integer bin indices (each value in [0, num_bins)) and count how often each (x, y) pair of bins occurs together.
Initially, I wrote this as:
import numpy as np

def make_table(x, y, num_bins):
    # Count how often each (x, y) pair of bin indices occurs together.
    ctable = np.zeros((num_bins, num_bins), dtype=np.dtype(int))
    for xn, yn in zip(x, y):
        ctable[xn, yn] += 1
    return ctable
This works fine, but is so slow that it's eating up like 90% of the runtime of the entire project.
The fastest python-only optimization I've been able to come up with is this:
def make_table(x, y, num_bins):
    ctable = np.zeros(num_bins ** 2, dtype=np.dtype(int))
    # Flatten each (x, y) pair into a single index: num_bins * x + y.
    reindex = np.dot(np.stack((x, y)).transpose(),
                     np.array([num_bins, 1]))
    # Count each distinct flattened index, then scatter the counts back.
    idx, count = np.unique(reindex, return_counts=True)
    for i, c in zip(idx, count):
        ctable[i] = c
    return ctable.reshape((num_bins, num_bins))
That's (somehow) a lot faster, but it's still pretty expensive for something that doesn't seem like it should be a bottleneck. Are there any efficient ways to do this that I'm just not seeing, or should I just give up and do this in cython?
Also, here's a benchmarking function.
import time

def timetable(func):
    size = 5000
    bins = 10
    repeat = 1000
    start = time.time()
    for i in range(repeat):
        # Fresh random bin indices on every repetition.
        x = np.random.randint(0, bins, size=size)
        y = np.random.randint(0, bins, size=size)
        func(x, y, bins)
    end = time.time()
    print("Func {na}: {ti} s".format(na=func.__name__, ti=(end - start)))
The clever trick of representing the (x, y) pairs in np.stack((x, y)) as single integers can be made faster:
In [92]: %timeit np.dot(np.stack((x, y)).transpose(), np.array([bins, 1]))
109 µs ± 6.55 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [94]: %timeit bins*x + y
12.1 µs ± 260 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
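As a quick sanity check (my own sketch, with x, y and bins generated the same way as in the benchmark above), the two encodings produce identical flat indices:

import numpy as np

bins = 10
x = np.random.randint(0, bins, size=5000)
y = np.random.randint(0, bins, size=5000)
# Both expressions compute bins * x + y element-wise.
assert np.array_equal(
    np.dot(np.stack((x, y)).transpose(), np.array([bins, 1])),
    bins * x + y)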
Moreover, the last part of your second solution can be simplified somewhat, by simply considering
np.unique(bins * x + y, return_counts=True)[1].reshape((bins, bins))
(note that this relies on every possible (x, y) combination occurring at least once, so that the counts array really has bins ** 2 entries).
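Wrapped up with the same signature as your functions (a sketch; make_table_unique is just a name I'm using here, and it inherits the caveat above):

def make_table_unique(x, y, num_bins):
    # Counts come back ordered by the flattened index num_bins * x + y,
    # so reshaping recovers the contingency table directly.
    # Assumes every (x, y) combination occurs at least once.
    _, counts = np.unique(num_bins * x + y, return_counts=True)
    return counts.reshape((num_bins, num_bins))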
What is more, since we are dealing with equally spaced non-negative integers, np.bincount will outperform np.unique; with that, the above boils down to
np.bincount(bins * x + y).reshape((bins, bins))
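Packaged as a drop-in replacement (again just a sketch; the function name is mine, and the minlength argument guards against the edge case where the largest flattened index never occurs, which would otherwise make the reshape fail):

def make_table_bincount(x, y, num_bins):
    # Flatten each (x, y) pair to a single index and count occurrences.
    # minlength pads the counts out to num_bins ** 2 so the reshape always
    # works, even if some combinations are missing from the data.
    counts = np.bincount(num_bins * x + y, minlength=num_bins ** 2)
    return counts.reshape((num_bins, num_bins))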
All in all, this gives you quite a performance boost over what you are currently doing:
In [78]: %timeit make_table(x, y, bins) # Your first solution
3.86 ms ± 159 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [79]: %timeit make_table2(x, y, bins) # Your second solution
443 µs ± 23.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [101]: %timeit np.unique(bins * x + y, return_counts=True)[1].reshape((bins, bins))
307 µs ± 25 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [118]: %timeit np.bincount(bins * x + y).reshape((10, 10))
30.3 µs ± 3.44 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
You may also want to be aware of np.histogramdd, which takes care of both rounding and binning at once, although it's likely going to be slower than rounding and using np.bincount.
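For reference, here's a sketch of how that would look for this problem (the function name is mine; it assumes x and y already hold integer bin indices in [0, num_bins)):

def make_table_histogramdd(x, y, num_bins):
    # histogramdd bins both coordinates in one call; with unit-width bins
    # over [0, num_bins), each integer value lands in its own bin.
    table, _ = np.histogramdd((x, y), bins=num_bins,
                              range=((0, num_bins), (0, num_bins)))
    # histogramdd returns float counts.
    return table.astype(int)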