pythonarraysnumpyparallel-processingindices

Transition array in parallel (Python 3)


I am working on large arrays, and trying to compute the "transition" array (please tell me if that is the right word for it). With a very simple example:

old_indices = [1, 2, 3, 0]
indices = [2, 3, 0, 1]

Since the element at index 0 (1) goes to 3, the first element of the transition is 3, the same for the element at index 1 (2) which goes to 0, so it would be 0, etc. The transition should then be:

transition = [3, 0, 1, 2]

Then, a "transition map" can be created, which is basically the opposite way. The new element at index 0 (2) comes from index 1 in the old list, so the first element of the transition map would be 1, the element at index 1 (3) comes from index 2 in the older list, etc.

transition_map = [1,2,3,0]

I implemented something like this to create these lists:

transition = np.empty(indices.shape, int)
transition_map = np.empty(indices.shape, int)
for i in range(len(old_indices)):
    for j in range(len(indices)):
        if old_indices[i] == indices[j]:
            transition[i] = j
            transition_map[transition[i]] = i

Which works properly to create these lists. However, if I have hundreds of thousands or millions of elements in each indices list (of the same size), it starts to take more time. I have 48 processors available for this calculation. I was wondering if parallelizing this and populating the array could be beneficial. However, I have no experience in parallelization with Python 3. For instance, I do not know which tool I should use, and if the fact that I populate the same array with multiple processors would make it possible (or even, if the communication time would be too high to synchronize the array every time).


Solution

  • Fast : O(n)

    Assuming that each element exists in both, this solution applies. To speed up the process, we can use a dictionary to keep track of the indices, where the key is the current value, which corresponds to the index. To make the naming easier to understand, I have called the variables before the transition, and after, so that we know how the transition maps it.

    # Setup test cases
    before = [1, 2, 3, 0]
    after = [2, 3, 0, 1]
    
    # Create the dictionary
    for i, v in enumerate(after):
        indices[v] = i
    
    # Find the indices for the transition array
    transitions = [indices[x] for x in before]
    

    To find your so-called transition_map, just reverse the names.

    for i, v in enumerate(before):
        indices[v] = i
    
    transition_map = [indices[x] for x in after]