I have a numpy array that looks like this:
array([[0.5, 0.2, 0.6],
[0.8, 0.1, 0.3],
[0.4, 0.5, 0.4],
[0.3, 0.2, 0.9]])
I want to change 50% of the values in the 3rd column into a random value. How can I do this efficiently? This operation will have to be performed hundreds of thousands of times, so efficiency is very important here. The output can look like this:
array([[0.5, 0.2, 0.2],
[0.8, 0.1, 0.3],
[0.4, 0.5, 0.4],
[0.3, 0.2, 0.1]])
I first thought about isolating this column, then replacing some of the values, and then moving this column back into the original matrix.
last_column = array[:,2]
last_column = change_values_randomly(last_column)
array = np.c_[array[:, :2], last_column]
How do I change 50% of these values randomly?
Since the height of the array is known, we can randomly pick half of the row indices (without replacement) and overwrite those entries of the last column in place using advanced indexing.
import numpy as np

arr = np.array([[0.5, 0.2, 0.6],
[0.8, 0.1, 0.3],
[0.4, 0.5, 0.4],
[0.3, 0.2, 0.9]])
# randomly select rows
length = arr.shape[0]
half = length // 2
rows = np.random.choice(length, size=half, replace=False)
# replace values in the last column with random values
arr[rows, 2] = np.random.rand(half)
arr
array([[0.5 , 0.2 , 0.81496687],
[0.8 , 0.1 , 0.3 ],
[0.4 , 0.5 , 0.18514918],
[0.3 , 0.2 , 0.9 ]])
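Using the Generator API (np.random.default_rng) to sample without replacement is noticeably faster than the legacy np.random.choice (roughly 2.3 ms versus 4.1 ms in the benchmark below):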
rows = np.random.default_rng().choice(length, size=half, replace=False)
arr[rows, 2] = np.random.rand(half)
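Because this will be run hundreds of thousands of times, it may also help to create the Generator once and reuse it rather than calling np.random.default_rng() on every iteration. Here is a minimal sketch of that idea; the helper name `randomize_half_of_column` and its signature are my own (hypothetical), and it assumes you are fine with modifying the array in place:

```python
import numpy as np

def randomize_half_of_column(arr, col, rng):
    """Overwrite half of the rows (rounded down) of column `col` in place.

    Hypothetical helper, not part of NumPy. New values are uniform in [0, 1).
    Returns the indices of the rows that were overwritten.
    """
    n_rows = arr.shape[0]
    half = n_rows // 2
    # Sample distinct row indices, so exactly `half` entries are replaced.
    rows = rng.choice(n_rows, size=half, replace=False)
    # Advanced indexing on the left-hand side writes into `arr` directly.
    arr[rows, col] = rng.random(half)
    return rows

# Create the generator once and pass it to every call.
rng = np.random.default_rng()

arr = np.array([[0.5, 0.2, 0.6],
                [0.8, 0.1, 0.3],
                [0.4, 0.5, 0.4],
                [0.3, 0.2, 0.9]])
changed = randomize_half_of_column(arr, 2, rng)
```

Note that this also removes the need to slice the column out and stitch it back with np.c_, since the assignment modifies `arr` itself.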
Benchmark:
arr = np.linspace(0,1,300000).reshape(-1,3)
length = len(arr)
half = length//2
%timeit np.random.default_rng().choice(length, size=half, replace=False)
# 2.31 ms ± 808 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit np.random.permutation(length)[:half]
# 4.14 ms ± 313 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit np.random.choice(length, size=half, replace=False)
# 4.14 ms ± 313 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit np.random.default_rng().permutation(length)[:half]
# 3.69 ms ± 173 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
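If you want to reproduce the comparison outside IPython (where %timeit is not available), something like the following sketch with the standard timeit module should work. The names `candidates` and `timings` are just illustrative, and unlike the benchmark above this version reuses a single Generator, so it does not include the cost of constructing it. Your numbers will differ depending on machine and NumPy version.

```python
import timeit
import numpy as np

length = 100_000          # same number of rows as the benchmark array above
half = length // 2
rng = np.random.default_rng()

# Four ways of drawing `half` distinct row indices.
candidates = {
    "Generator.choice": lambda: rng.choice(length, size=half, replace=False),
    "Generator.permutation": lambda: rng.permutation(length)[:half],
    "np.random.choice": lambda: np.random.choice(length, size=half, replace=False),
    "np.random.permutation": lambda: np.random.permutation(length)[:half],
}

timings = {}
for name, fn in candidates.items():
    # Best of 3 repeats, 10 calls per repeat, reported as seconds per call.
    timings[name] = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{name}: {timings[name] * 1e3:.3f} ms per call")
```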