python, numpy, random

Randomly flip exactly n elements in multiple 2d matrices


I need to flip exactly n (n > 2) randomly chosen elements (uniformly distributed over all entries) in total across m 2d numpy matrices of varying sizes (see the initialization in my code), which I store in a dictionary. This happens on every step of my simulation (approx. 10^7-10^8 times in total).

Edit: An example: say I have m=3 arrays (of different sizes). What I want is to flip n numbers in total across those arrays, chosen at random. E.g. for n=10: 6 elements flipped in the first array, 3 in the second and 1 in the last (every element of every array must have the same probability of being flipped). Not n per array, and neither fewer nor more per run. Over the whole simulation, n itself is also drawn uniformly each time.
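To make the requirement concrete, here is a minimal sketch (the array shapes are illustrative, not the ones from my code): draw n distinct flat indices over the total element count, then count how many land in each array.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
shapes = [(3, 4), (2, 5), (4, 2)]      # illustrative sizes, not the real ones
counts = np.array([r * c for r, c in shapes])
total = counts.sum()                   # 30 elements across all arrays
n = 10

# n distinct positions, uniform over ALL elements of ALL arrays
idx = rng.choice(total, size=n, replace=False)

# how many of the n flips land in each array
edges = np.cumsum(counts)[:-1]         # boundaries between the arrays
per_array = np.bincount(np.searchsorted(edges, idx, side="right"),
                        minlength=len(shapes))
print(int(per_array.sum()))            # → 10, i.e. exactly n in total
```

The per-array counts vary from draw to draw, but their sum is always exactly n.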


The following code works; the loop over N is only there for timing. I would be delighted if someone could show me a faster way to do this. <3

Further notes: I also tried converting the indices of my temporary 1d matrix into 2d ones first (numpy.unravel_index) and then changing the entries directly, but that is even slower in high dimensions.
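For completeness, the unravel_index variant mentioned above looks roughly like this (a sketch; it gives the correct result, it was just slower for me in high dimensions):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
shapes = [(27, 700), (48, 27), (20, 48)]
counts = [r * c for r, c in shapes]
offsets = np.cumsum([0] + counts)      # start offset of each matrix
mats = [rng.choice([-1, 1], size=s) for s in shapes]

n = 10
idx = rng.choice(int(offsets[-1]), size=n, replace=False)
old = [m.copy() for m in mats]         # snapshot, only to verify the flips

for flat in idx:
    k = np.searchsorted(offsets, flat, side="right") - 1   # which matrix?
    r, c = np.unravel_index(flat - offsets[k], shapes[k])  # flat -> 2d index
    mats[k][r, c] *= -1                                    # flip one spin

print(sum(int(np.sum(m != o)) for m, o in zip(mats, old)))  # → 10
```

The Python-level loop over the n indices is what makes this slow once n gets large.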

# Initialization
import numpy as np
import time
rng = np.random.default_rng(seed=1)
sizes = [700, 27, 48, 20]  # code must work for any sizes, usually in the range of ~1-1000
num_layers = len(sizes)
params = np.sum([sizes[i]*sizes[i+1] for i in range(num_layers-1)])
w = dict((key, rng.choice([-1, 1], size=(y, x))) for key, (x, y) in enumerate(zip(sizes[:-1], sizes[1:])))# e.g. 27x700, 48x27, 20x48


# The problematic part...
N = 1001
start = time.time()
for j in range(N):  # just for testing the speed a little
    # 1) Generate n random numbers idx (1d) (replace is a must, since we need exactly n), then store the old matrices away for later usage.
    n = rng.integers(2, high=params+1)
    idx = rng.choice(params, size=n, replace=False)
    w_old = {key: arr.copy() for key, arr in w.items()}
    # 2) Initialize matrices with ones.
    w_flip = {i: np.ones(sizes[i]*sizes[i+1], dtype=np.int32) for i in range(num_layers-1)}
    # 3) Flip the spins of this temporary matrix and reshape it afterwards into a 2d one
    left = 0
    for i in range(num_layers-1):
        right = left + sizes[i]*sizes[i+1] 
        w_flip[i][idx[(idx >= left) & (idx < right)] - left] *= -1
        w_flip[i] = np.reshape(w_flip[i], (sizes[i+1], sizes[i]))
        left = right
    # 4) Flip the n entries of the m matrices
    for w_i, w_flip_i in zip(w.values(), w_flip.values()):
        w_i *= w_flip_i
    # ... here I do other stuff with the changed matrices, but that's irrelevant to the topic at hand.
end = time.time()
print(end-start)


# Test if algorithm works (just for me)
for w_i, w_flip_i in zip(w.values(), w_flip.values()):
    w_i *= w_flip_i
diff = 0
for i in range(len(w)):
    diff += np.sum(w[i] != w_old[i])
if diff != 0:
    print("Something wrong with the algorithm, diff = {0} != 0.".format(diff))

Solution

  • Update: I could speed the algorithm up by around 20% so far for high dimensions, but that's it. Inspired by James's suggestion to flatten my arrays, I came up with the following code.

    I will leave it at that, but if anyone knows further ways to improve the speed, I would love to hear them. <3

    import numpy as np
    import time

    rng = np.random.default_rng(seed=1)
    sizes = [700, 27, 48, 10]  # code must work for any sizes, usually in the range of ~1-1000
    num_layers = len(sizes)
    params_w = [sizes[i]*sizes[i+1] for i in range(num_layers-1)]
    params = np.sum(params_w)
    params_arr = np.arange(params)
    params_split = np.cumsum(params_w[:-1])
    w = {key: rng.choice([-1, 1], size=(y, x)) for key, (x, y) in enumerate(zip(sizes[:-1], sizes[1:]))}  # e.g. 27x700, 48x27, 10x48
    
    # The problematic part...
    N = 1001
    start = time.time()
    for j in range(N):# just for testing the speed a little
        # 1) Generate n random numbers idx (1d) (replace is a must, since we need exactly n), then store the old matrices away for later usage.
        n = rng.integers(2, high=params+1)
        idx = rng.choice(params_arr, size=(n,), replace=False)
        w_old = {key: arr.copy() for key, arr in w.items()}
        # 2) Initialize the 1d flip matrix
        w_flip = np.ones(params, dtype=np.int32)
        w_flip[idx] = -1
        # 3) Flatten the 2d matrices and concatenate them into one 1d array
        w_new = np.concatenate([arr.reshape(-1) for arr in w.values()])
        # 4) Flip the n entries of the m matrices 
        w_new *= w_flip
        # 5) Split them apart again and store in dictionary
        temp = np.split(w_new, params_split)
        for key in w.keys(): 
            w[key] = np.reshape(temp[key], (sizes[key+1], sizes[key]))
        # ... here I do other stuff with the changed matrices, but that's irrelevant to the topic at hand.
    end = time.time()
    print(end-start)
    
    # Just to check if n entries have been flipped (and one can easily verify that they have been uniformly flipped too).
    n_done = 0
    for arr, arr_old in zip(w.values(), w_old.values()):
        n_done += np.sum(arr != arr_old)
    print("Performed flips minus amount of flips we wanted:", n_done - n)
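One further idea I have not benchmarked: keep all weights in a single flat buffer and make each w[i] a reshaped view into that buffer. A run then needs no concatenate/split at all, just one fancy-indexed multiply (a sketch under that assumption; variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(seed=1)
sizes = [700, 27, 48, 10]
params_w = [sizes[i] * sizes[i+1] for i in range(len(sizes) - 1)]
offsets = np.concatenate(([0], np.cumsum(params_w)))
params = int(offsets[-1])

# one flat buffer; every w[i] is a 2d VIEW into it, not a copy
buf = rng.choice(np.array([-1, 1], dtype=np.int32), size=params)
w = {i: buf[offsets[i]:offsets[i+1]].reshape(sizes[i+1], sizes[i])
     for i in range(len(sizes) - 1)}

n = int(rng.integers(2, high=params + 1))
idx = rng.choice(params, size=n, replace=False)
buf_old = buf.copy()                     # plays the role of w_old
buf[idx] *= -1                           # flips show up in every w[i] view

print(int(np.sum(buf != buf_old)) == n)  # → True, exactly n spins flipped
```

The views stay valid for the whole simulation, so buf_old is the only copy made per iteration.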