
How to change random index positions in numpy array


I have a numpy array that looks like this:

array([[0.5, 0.2, 0.6],
       [0.8, 0.1, 0.3],
       [0.4, 0.5, 0.4],
       [0.3, 0.2, 0.9]])

I want to replace 50% of the values in the third column with random values. How can I do this efficiently? This operation will have to be performed hundreds of thousands of times, so efficiency is very important here. The output could look like this:

array([[0.5, 0.2, 0.2],
       [0.8, 0.1, 0.3],
       [0.4, 0.5, 0.4],
       [0.3, 0.2, 0.1]])

I first thought about isolating this column, then replacing some of the values, and then moving this column back into the original matrix.

last_column = array[:,2]
last_column = change_values_randomly(last_column)
np.c_[array[:,:2], last_column]

How do I change 50% of these values randomly?


Solution

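  • Since the number of rows is known, we can randomly pick half of the row indices (without replacement, so no row is picked twice) and overwrite those entries of the last column using advanced indexing. The assignment modifies arr in place, so there is no need to split off the column and stitch it back together with np.c_.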

    import numpy as np

    arr = np.array([[0.5, 0.2, 0.6],
                    [0.8, 0.1, 0.3],
                    [0.4, 0.5, 0.4],
                    [0.3, 0.2, 0.9]])
    # randomly select half of the rows, without repeats
    length = arr.shape[0]
    half = length // 2
    rows = np.random.choice(length, size=half, replace=False)
    # replace the last-column values of those rows with random values in [0, 1)
    arr[rows, 2] = np.random.rand(half)
    arr
    array([[0.5       , 0.2       , 0.81496687],
           [0.8       , 0.1       , 0.3       ],
           [0.4       , 0.5       , 0.18514918],
           [0.3       , 0.2       , 0.9       ]])
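The idea from the question of isolating the column also works without np.c_: basic slicing such as arr[:, 2] returns a view into arr, so assigning through the view updates the original array. A minimal sketch, assuming a seeded Generator (the seed 0 is arbitrary and only there for reproducibility):

```python
import numpy as np

arr = np.array([[0.5, 0.2, 0.6],
                [0.8, 0.1, 0.3],
                [0.4, 0.5, 0.4],
                [0.3, 0.2, 0.9]])
original = arr.copy()

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

last_column = arr[:, 2]  # basic slicing gives a view, not a copy
half = len(last_column) // 2
rows = rng.choice(len(last_column), size=half, replace=False)
last_column[rows] = rng.random(half)  # writes through the view into arr

print(np.shares_memory(last_column, arr))  # True
print(arr)
```

Because the view and arr share memory, nothing has to be moved back into the original matrix afterwards.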
    

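    Sampling the row indices with the Generator API (np.random.default_rng()) is faster than the legacy np.random.choice (about 2.3 ms versus 4.1 ms per call in the benchmark below):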

    rng = np.random.default_rng()  # create once and reuse
    rows = rng.choice(length, size=half, replace=False)
    arr[rows, 2] = rng.random(half)
    
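Since the operation runs hundreds of thousands of times, it is probably worth creating the Generator once and reusing it rather than calling np.random.default_rng() on every iteration (the timings below include that construction cost in each call). A sketch under that assumption; randomize_half_column is a hypothetical helper name, not a NumPy function, and the loop count of 100 is only a stand-in:

```python
import numpy as np


def randomize_half_column(arr, col, rng):
    """Hypothetical helper: overwrite a random half of arr[:, col] in place."""
    n = arr.shape[0]
    half = n // 2
    rows = rng.choice(n, size=half, replace=False)  # distinct row indices
    arr[rows, col] = rng.random(half)  # uniform values in [0, 1)
    return rows


rng = np.random.default_rng()  # created once, reused on every call

arr = np.linspace(0, 1, 300000).reshape(-1, 3)
for _ in range(100):  # stand-in for the real number of iterations
    randomize_half_column(arr, 2, rng)
```

Passing the Generator in as an argument also makes it easy to supply a seeded one when reproducible results are needed.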

    Benchmark (an array with 100,000 rows, timed with IPython's %timeit):

    arr = np.linspace(0, 1, 300000).reshape(-1, 3)
    length = len(arr)
    half = length // 2
    
    %timeit np.random.default_rng().choice(length, size=half, replace=False)
    # 2.31 ms ± 808 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    
    %timeit np.random.permutation(length)[:half]
    # 4.14 ms ± 313 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    
    %timeit np.random.choice(length, size=half, replace=False)
    # 4.14 ms ± 313 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    
    %timeit np.random.default_rng().permutation(length)[:half]
    # 3.69 ms ± 173 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
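If an exact 50% is not required and each entry may instead be replaced independently with probability 0.5, a boolean mask avoids sampling without replacement altogether. This is a different technique from the answer above: the number of replaced values varies around half from call to call. Whether it is faster on a given machine is an assumption worth checking with the same %timeit approach:

```python
import numpy as np

rng = np.random.default_rng()
arr = np.linspace(0, 1, 300000).reshape(-1, 3)

# Each row is selected independently with probability 0.5,
# so roughly (not exactly) half of the rows are chosen.
mask = rng.random(arr.shape[0]) < 0.5
arr[mask, 2] = rng.random(np.count_nonzero(mask))
```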