Suppose we have a numpy array v
v=np.array([3, 5])
Now we use the below code to find a new vector say w
v1=np.array(range(v[0]+1))
v2=np.array(range(v[1]+1))
w=np.array(list(itertools.product(v1,v2)))
So w looks like this,
array([[0, 0],
[0, 1],
[0, 2],
[0, 3],
[0, 4],
[0, 5],
[1, 0],
[1, 1],
[1, 2],
[1, 3],
[1, 4],
[1, 5],
[2, 0],
[2, 1],
[2, 2],
[2, 3],
[2, 4],
[2, 5],
[3, 0],
[3, 1],
[3, 2],
[3, 3],
[3, 4],
[3, 5]])
Now, we need to find the probability vector corresponding to each pair in w knowing that the first element in each pair follows a Binomial distribution Bin(v[0], 0.1) and the second element of each pair follows a Binomial distribution Bin(v[1], 0.05). One way to do it is by this one liner
import scipy.stats as ss
prob_vector=np.array(list((ss.binom.pmf(i[0],v[0], 0.1) * ss.binom.pmf(i[1],v[1], 0.05)) for i in w))
output:
array([5.64086303e-01, 1.48443764e-01, 1.56256594e-02, 8.22403125e-04,
2.16421875e-05, 2.27812500e-07, 1.88028768e-01, 4.94812547e-02,
5.20855312e-03, 2.74134375e-04, 7.21406250e-06, 7.59375000e-08,
2.08920853e-02, 5.49791719e-03, 5.78728125e-04, 3.04593750e-05,
8.01562500e-07, 8.43750000e-09, 7.73780938e-04, 2.03626563e-04,
2.14343750e-05, 1.12812500e-06, 2.96875000e-08, 3.12500000e-10])
But it takes tooo much time to compute, especially since I am iterating over several v vectors!!
Is there an efficient way to compute prob_vector?
Thanks
You're redoing a lot of pmf calls, as well as doing a lot on the Python side instead of the numpy side. We can save those computations by computing on your v1 and v2 arrays, and then multiplying those instead.
import numpy as np
import scipy.stats as ss
import itertools
def orig(x, y):
v = np.array([x, y])
v1 =np.array(range(v[0]+1))
v2=np.array(range(v[1]+1))
w=np.array(list(itertools.product(v1,v2)))
prob_vector=np.array(list((ss.binom.pmf(i[0],v[0], 0.1) * ss.binom.pmf(i[1],v[1], 0.05)) for i in w))
return prob_vector
def faster(x, y):
b0 = ss.binom.pmf(np.arange(x+1), x, 0.1)
b1 = ss.binom.pmf(np.arange(y+1), y, 0.05)
prob_array = b0[:, None] * b1
prob_vector = prob_array.ravel()
return prob_vector
which gives me:
In [61]: %timeit orig(3, 5)
4.46 ms ± 82.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [62]: %timeit faster(3, 5)
192 µs ± 4.33 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [63]: %timeit orig(30, 50)
311 ms ± 24.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [64]: %timeit faster(30, 50)
209 µs ± 8.43 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [65]: (orig(30, 50) == faster(30, 50)).all()
Out[65]: True