Combine counts from multiple numpy.uniques


I have the results of several numpy.unique(a, return_counts=True) calls and unfortunately no longer have access to the original arrays. I want to combine these results into one array holding the unique values and one holding the respective counts. I do not want to rebuild the original arrays with np.repeat(), since the data is too big for my RAM.

I also found Python's collections.Counter, but since I'm using the results as numpy arrays, I would prefer to stay "within" numpy. (Unless you would advise me to use it?)

Is there an efficient way to solve this problem?

I want something like this, without using np.repeat():

import numpy as np

multi_unique_values = np.array([[1,2,3,4],[1,2,3,4],[1,2,3,4],[1,2,3,4]])
multi_unique_counts = np.array([[2,2,2,2],[1,2,3,1],[1,1,2,3],[1,2,2,1]])

values_ravel = multi_unique_values.ravel()
counts_ravel = multi_unique_counts.ravel()

np.unique(np.repeat(values_ravel,counts_ravel), return_counts=True)

> (array([1, 2, 3, 4]), array([5, 7, 9, 7]))

I can achieve my desired result using a for-loop, but I'm looking for a (much) faster way!

all_unique_values, indices_ = np.unique(values_ravel, return_inverse=True)

all_unique_counts = np.zeros(all_unique_values.shape)

for count_index, unique_index in enumerate(indices_):
    all_unique_counts[unique_index] += counts_ravel[count_index]
    
(all_unique_values, all_unique_counts)
> (array([1, 2, 3, 4]), array([5., 7., 9., 7.]))

Solution

  • Apply np.unique with return_inverse=True to all the values: this returns the sorted array of unique values together with, for each input item, its position in that sorted array. Then sum the counts at those positions with np.add.at to get the merged count for each unique value.

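    # ravel() keeps the inverse index 1-D (on NumPy >= 2.0 it otherwise has the input's 2-D shape)
    all_unique_values, index = np.unique(multi_unique_values.ravel(), return_inverse=True)
    all_unique_counts = np.zeros(all_unique_values.size, np.int64)
    np.add.at(all_unique_counts, index, multi_unique_counts.ravel())  # in place; repeated indices accumulate
    all_unique_counts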
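
In practice the separate np.unique results will often have different lengths, so they cannot be stacked into one 2-D array as in the question. The following is a minimal sketch of how the same idea could handle that case. The helper name merge_unique_counts and the sample data are hypothetical, and np.bincount with weights is swapped in for np.add.at as an alternative way to sum the counts; it is often reported to be faster, but that is worth measuring on your own data.

```python
import numpy as np

def merge_unique_counts(values_list, counts_list):
    """Merge several (values, counts) pairs as returned by np.unique.

    Hypothetical helper for illustration; not part of NumPy.
    """
    # Flatten everything into two 1-D arrays; the pairs may differ in length.
    values = np.concatenate([np.ravel(v) for v in values_list])
    counts = np.concatenate([np.ravel(c) for c in counts_list])

    # inverse[i] is the position of values[i] in the sorted unique array.
    merged_values, inverse = np.unique(values, return_inverse=True)

    # Sum the counts per position. With weights, np.bincount returns float64,
    # which is exact for totals below 2**53, so cast back to integers.
    merged_counts = np.bincount(
        inverse, weights=counts, minlength=merged_values.size
    ).astype(np.int64)
    return merged_values, merged_counts

# Two made-up partial results with different sets of unique values.
values_a, counts_a = np.unique(np.array([1, 1, 2, 5]), return_counts=True)
values_b, counts_b = np.unique(np.array([2, 3, 3, 3, 5]), return_counts=True)

merged_values, merged_counts = merge_unique_counts(
    [values_a, values_b], [counts_a, counts_b]
)
print(merged_values, merged_counts)
```

Only the concatenated unique values and counts are held in memory, never the repeated original data, so this keeps the property the question asks for.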