matlabvectorrepeatpruning

Finding removed elements in a vector in matlab


I have a process that is iteratively and randomly pruning a huge vector of integers and I want to find what elements are removed between each iteration. This vector has a lot of repetitions and using ismember() and setdiff() doesn't helped me much.

As an illustration if X = [1,10,8,5,10,3,5,2]:

step 0: X = 1,10,8,5,10,3,5,2
step 1: X = 1,10,8,10,3,5,2 (5 is removed)
step 2: X = 1,10,8,3,2 (10 and 5 are removed)
step 3: X = 10,8,3,2 (1 is removed)
step 4: X = 2 (10, 8 and 3 are removed)
step 5: X = [] (2 is finally removed)

I aim at finding the elements removed at each steps (ie. 5 then, 10 and 5 and so on). I could possibly find an overly complicated solution using hist(X, unique(X)) between steps, but I assume there exists a much more elegant (and cheaper!) solution in matlab.


Solution

    1. This approach is memory-intensive. It computes an intermediate matrix of size NxM where N is the number of elements of X and M is the number of unique elements of X, using implicit expansion. This may be feasible or not depending on your typical N and M.

      X = [1,10,8,5,10,3,5,2];
      Y = [8,10,2,1]; % removed 10, 5, 5, 3. Order in Y is arbitrary
      u = unique(X(:).');
      removed = repelem(u, sum(X(:)==u,1)-sum(Y(:)==u,1));
      

      gives

      removed =
           3     5     5    10
      

      For Matlab versions before R2016b, you need bsxfun instead of implicit expansion:

      removed = repelem(u, sum(bsxfun(@eq,X(:),u),1)-sum(bsxfun(@eq,Y(:),u),1));
      
    2. If the values in X are always positive integers, a more efficient approach can be used, employing sparse to compute the number of times each element appears:

      X = [1,10,8,5,10,3,5,2];
      Y = [8,10,2,1]; % removed 10, 5, 5, 3. Order in Y is arbitrary
      removed = repelem(1:max(X), sparse(1,X,1) - sparse(1,Y,1));