Say I have two matrices:
A = [1;
2;
3;
4;
5]
and
B = [1 1 2 1;
2 3 3 4;
5 5 5 5;
1 4 4 4;
5 5 1 2]
I need a resulting matrix showing how many times an item from A has appeared on the respective row in B, like this:
C = [3;
1;
0;
3;
2]
Currently, I am doing that using ismember inside of a for loop and adding the result per row. But it is taking forever because of the size of my matrices. Is there any way to do that without a loop?
Editing to show my current code:
for i=1:1:length(A)
C(i) = sum(ismember(B(i,:),A(i)),2);
end
But, in reality, my matrices have over 500k lines, and I would love to make this code more efficient by removing the loop.
Short Answer
You can just use ==
, which will use implicit expansion to compare each element of A
to the entire row of B
and output a logical array of matches, then sum across each row to count the matches
C = sum( A == B, 2 );
Depending slightly on how A
and B
are generated, it's often a good idea to avoid direct equality comparisons on floating point numbers (see here), so it might be more robust to make sure A
and B
are within some small tolerance, i.e.
tol = 1e-8;
C = sum( abs(A-B) < tol, 2 );
The subtraction here will use implicit expansion the same as ==
in the first example to propagate A
across the columns of B
.
Benchmarking
Out of interest I've done some benchmarking between four cases
C
so it grows with the loop.C
as an array of NaN
the correct size, this is typically a good way to speed up loops. Speed up vs (1) of ~18%.==
instead of ismember
, since you're comparing integers I would expect ==
to be faster than ismember
. Speed up vs (1) of ~62%.==
or abs(..)<tol
approaches are both similarly fast so I've only shown the latter. Speed up vs (1) of ~94%.Results for test case with length(A) = 1e6
and 100 columns in B
:
With loop and ismember: 0.65sec
With preallocated loop and ismember: 0.53sec
With preallocated loop and ==: 0.25sec
With no loop and ==: 0.04sec
Your mileage may vary on these time savings depending on input matrix sizes, expected number of matches, etc. Full code below for you to test other cases.
Code:
N = 1e6;
A = (1:N).';
B = randi([1,N],N,1e2);
t = [
timeit( @() withloop(A,B) )
timeit( @() withloop_prealloc(A,B) )
timeit( @() withloop_prealloceq(A,B) )
timeit( @() noloop(A,B) )
];
fprintf( 'With loop and ismember: %.2fsec\n', t(1) );
fprintf( 'With preallocated loop and ismember: %.2fsec\n', t(2) );
fprintf( 'With preallocated loop and ==: %.2fsec\n', t(3) );
fprintf( 'With no loop and ==: %.2fsec\n', t(4) );
function withloop(A,B)
for i=1:1:length(A)
C(i) = sum(ismember(B(i,:),A(i)),2);
end
end
function withloop_prealloc(A,B)
C = NaN(length(A),1);
for i=1:1:length(A)
C(i) = sum(ismember(B(i,:),A(i)),2);
end
end
function withloop_prealloceq(A,B)
C = NaN(length(A),1);
for i=1:1:length(A)
C(i) = sum(B(i,:) == A(i),2);
end
end
function noloop(A,B)
C = sum( abs(A-B)<1e-8, 2 );
end