python numpy

How can I replace duplicate elements with their first-occurrence index in NumPy?


For example, I have two arrays: 'x' holds the actual values and 'I' holds their indices in the array.

x = [1, 2, 3, 3, 2, 4]
I = [0, 1, 2, 3, 4, 5]

In 'x', the 4th value is a duplicate of the 3rd value, and the 5th value is a duplicate of the 2nd value.

Therefore, I want to generate the following array:

y = [0, 1, 2, 2, 1, 5]

(containing the first-occurrence indices of the original array values)

How can I do this efficiently using NumPy methods in Python?


Solution

  • You could do:

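    >>> import numpy as np
    >>> x = np.array([1, 2, 3, 3, 2, 4])
    >>> u, idx, inv = np.unique(x, return_index=True, return_inverse=True)
    >>> idx  # index of the first occurrence of each unique value in x
    array([0, 1, 2, 5], dtype=int64)
    >>> inv  # position of each element of x within the unique array u
    array([0, 1, 2, 2, 1, 3], dtype=int64)
    >>> idx[inv]  # first-occurrence index for every element of x
    array([0, 1, 2, 2, 1, 5], dtype=int64)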
    

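    To see why this works, read the docs of np.unique: return_index gives the indices of the first occurrences of the unique values in x, and return_inverse gives the indices into the unique array that reconstruct x.

    So you just take the indices of x that result in the unique array (idx) and use inv to reconstruct them instead of the values of x.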
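For reuse, the same idea can be wrapped in a small helper. This is only a minimal sketch, assuming a 1-D input; the function name first_occurrence_indices is made up for illustration and is not part of NumPy.

```python
import numpy as np

def first_occurrence_indices(x):
    """Return, for each element of a 1-D array, the index of its first occurrence.

    Hypothetical helper name; assumes x is one-dimensional.
    """
    x = np.asarray(x)
    # idx: index in x of the first occurrence of each unique value
    # inv: for each element of x, its position in the sorted unique array
    _, idx, inv = np.unique(x, return_index=True, return_inverse=True)
    # Look up the first-occurrence index for every element of x
    return idx[inv]

y = first_occurrence_indices([1, 2, 3, 3, 2, 4])
print(y)  # [0 1 2 2 1 5]
```

The input does not have to be sorted: np.unique sorts internally, and idx still refers to positions in the original array.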