[SOLVED] How can I replace duplicate elements with their first occurrence index in NumPy? (first occurrence indices)

For example, I have two arrays: 'x' for the actual values, 'I' for their indices in the array.

x = [1, 2, 3, 3, 2, 4]
I = [0, 1, 2, 3, 4, 5]

In 'x', the 4th value is a duplicate of the 3rd value, and the 5th value is a duplicate of the 2nd value.

Therefore, I want to generate the following array:

y = [0, 1, 2, 2, 1, 5]

(containing the first occurrence indices of the original array values)

How can I do this efficiently using Python NumPy methods?

Solution

You could do:

>>> import numpy as np
>>> x = np.array([1, 2, 3, 3, 2, 4])
>>> u, idx, inv = np.unique(x, return_index=True, return_inverse=True)
>>> idx
array([0, 1, 2, 5], dtype=int64)
>>> inv
array([0, 1, 2, 2, 1, 3], dtype=int64)
>>> idx[inv]
array([0, 1, 2, 2, 1, 5], dtype=int64)

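To see why this works, read the docs of np.unique:

inv are the indices of the unique array that can be used to reconstruct x (so u[inv] gives back x).
idx are the indices in x of the first occurrence of each unique value, i.e. the indices of x that result in the unique array.

So you just take the indices of x that result in the unique array (idx) and reconstruct with them instead of with the unique values: idx[inv] has the same length as x, and each position holds the index of the first occurrence of the value at that position.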
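
As a minimal sketch of the approach above, the lookup can be wrapped in a small helper. The function name first_occurrence_indices is hypothetical (it is not part of NumPy), and the sketch assumes a 1-D input. The two assertions inside only check the properties of idx and inv described above.

```python
import numpy as np

def first_occurrence_indices(x):
    """Hypothetical helper: map each element of a 1-D array to the index
    of the first occurrence of its value."""
    x = np.asarray(x)
    # np.unique always returns (unique, index, inverse) in this order,
    # regardless of the order of the keyword arguments.
    u, idx, inv = np.unique(x, return_index=True, return_inverse=True)
    # Properties described in the answer:
    assert np.array_equal(u[inv], x)  # inv reconstructs x from the unique array
    assert np.array_equal(x[idx], u)  # idx picks the first occurrence of each unique value
    return idx[inv]

x = [1, 2, 3, 3, 2, 4]
y = first_occurrence_indices(x)
print(y.tolist())  # [0, 1, 2, 2, 1, 5]
```

Note that np.unique sorts its input, so this runs in O(n log n) time. The result does not depend on the values in x already being in sorted order, because idx always refers to positions in the original array.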