I have a NumPy array and need to identify repeated elements, marking the second occurrence and beyond as True
, while keeping the first occurrence as False
.
For example, given the following array:
np.random.seed(100)
a = np.random.randint(0, 5, 10)
# Output: [0 0 3 0 2 4 2 2 2 2]
I want to get the following output:
[False True False True False False True True True True]
How can I achieve this using NumPy functions only, without using any loops or extra libraries?
I was able to get it working with a loop, but I wanted to solve it using only NumPy functions. I tried implementing np.cumsum
with masks, but I couldn’t make much progress.
Here's the solution I came up with using one loop:
np.random.seed(100)
a = np.random.randint(0, 5, 10)
print(a)
uniques, first_indices = np.unique(a, return_index=True)
all_occurrences = np.zeros_like(a, dtype=bool)
for i in range(len(a)):
all_occurrences[i] = np.any(a[:i] == a[i])
all_occurrences[first_indices] = False
print(all_occurrences)
To reveal that the problem is actually less complicated than it may seem at first glance, the question could be rephrased as follows: Mark all first occurrences of values with False
.
This leads to a bit of a simplified version of EuanG's answer¹:
def find_repeated(a):
mask = np.ones_like(a, dtype=bool)
mask[np.unique(a, return_index=True)[-1]] = False
return mask
Steps: (1) Initialize the result mask
as an all-True
array of appropriate shape. (2) Find the indices of the first occurrences in the given array a
. (3) Only set these indices to False
in the result mask
.
To make the code also work with n-dimensional arrays, we need to add an extra step of unraveling the result of np.unique()
, as it returns the indices into the flattened given array a
:
def find_repeated(a):
mask = np.ones_like(a, dtype=bool)
mask[np.unravel_index(np.unique(a, return_index=True)[-1], a.shape)] = False
return mask
In either case:
np.unique(…, return_index=True)[-1]
) for indexing the mask
array.¹) Yes, I find EuanG's answer perfectly acceptable as well. No, I did not downvote it.