I have an undirected graph of N nodes and L links, whose structure is typically described by its adjacency matrix A with N rows and N columns. In this adjacency matrix, the value '1' denotes a link between two nodes and, conversely, '0' denotes no links.
I work in Matlab and want to avoid all the sophistications provided by the built-in graph()
functions such as N = neighbors(G, NODEID)
where G
is G = graph(adjm);
and adjm
the adjacency matrix.
In this graph, I need to identify which unique nodes (neighbors) are connected to one or several starting nodes, labelled idx
.
I'm currently using the following line of code:
fidx = unique(ceil(find(adjm(idx,:)) / numel(idx)));
where fidx
contains the neighbors (connected nodes) of a set of idx
starting nodes.
This operation is repeated a large number of times for different nodes and after some profiling, I found that the instruction find(adjm(idx,:))
is really hindering the overall runtime. My first attempt was to try to avoid find()
. For example using:
A = adjm(idx,:);
A = A(:) == 1;
V = 1:size(A,1);
idx = unique(ceil(V(A) / numel(idx)));
However, this still remains too slow for my application.
What are you suggestions to reduce the runtime associated with this specific section of code?
Here is a minimal example:
%Minimal example:
f = @() get_neighb();
timeit(f)
function get_neighb()
%generate a graph with a large number of nodes N, in order to have
%computations times large enough:
N = 2e4; %number of nodes
G = round( rand(N) / 1.5 );
G = triu(G) + triu(G,1)';
adjm = G - diag(diag(G));
%for visualisation purposes:
%plot(graph(adjm));
s = size(adjm,1);
%list of some nodes to search for (randomly chosen):
n_nodes_test = 2000; %number of couples to test
%In this minimal example, we focus on searching for the neighbors of each couple (2-tuple) of
%nodes only. But 'snodes' can be single nodes or n-tuples, i.e. containing more
%than 2 nodes
snodes = [ round(linspace(1, s/2, n_nodes_test)); round(linspace(s/2, s, n_nodes_test))]';
%This section is the critical one:
for i=1:n_nodes_test
idx = snodes(i,:); %assign current couple of starting nodes to 'idx'.
%fidx = unique(ceil(find(adjm(idx,:)) / numel(idx)));
%can be replaced by the following:
A = adjm(idx,:);
A = A(:) == 1;
V = 1:size(A,1);
idxf = unique(ceil(V(A) / numel(idx)));
%which is similar to (using Matlab built-in 'graph' functions):
% G = graph(adjm);
% T = unique([neighbors(G, snodes(1,1)); neighbors(G, snodes(1,2))]);
% here idxf' == T
end
end
Here is a short benchmark of the proposed solutions. There was several suggestions on how accurately measuring the runtime using timeit()
, or profiler
. In this benchmark we stick to the profiler for measuring the runtime for each function find_neighb_v()
. In the minimal example posted above, about 80% of the runtime was dedicated to the construction of the graph adjm
. However, we aim at measuring the fastest way to retrieve node's neighbors, and we don't want to measure the time for the graph creation.
The Matlab built-in suite graph()
provides practical and efficient functions for returning the neighbors of a node. However, for each direct modification of the adjacency matrix adjm
, one need to update the graph() structure with a call to graph(G)
, which can be really time consumming, especially for large networks. It is possible to directly modify the underlying Node and Edge Lists of the graph() structure, but for people who directly work on the topology of the graph using the adjacency matrix properties -using spectral analyse for instance- this is not possible. Here is also presented the runtime of the graph() function for comparison purpose.
%Benchmarck using profiler:
%generate a graph with a large number of nodes N, in order to have
%computations times large enough:
N = 1.5e4; %number of nodes
G = round( rand(N) / 1.5 );
G = triu(G) + triu(G,1)';
adjm = G - diag(diag(G));
%for visualisation purposes:
%plot(graph(adjm));
s = size(adjm,1);
%list of some nodes to search for (randomly chosen):
n_nodes_test = 2000; %number of couples to test
%In this minimal example, we focus on searching for the neighbors of each couple (2-tuple) of
%nodes only. But 'snodes' can be single nodes or n-tuples, i.e. containing more
%than 2 nodes
snodes = [ round(linspace(1, s/2, n_nodes_test)); round(linspace(s/2, s, n_nodes_test))]';
G_test = graph(adjm);
profile on
%1st version, using find()
find_neighb_v1(adjm, snodes, n_nodes_test);
%2nd version, without find()
find_neighb_v2(adjm, snodes, n_nodes_test);
%3rd version, using any() and find()
find_neighb_v3(adjm, snodes, n_nodes_test);
%base version, using built-in graph() and neighbors-) functions
find_neighb_base(G_test, snodes, n_nodes_test);
profile viewer
function find_neighb_v1(adjm, snodes, n_nodes_test)
for i=1:n_nodes_test
idx = snodes(i,:); %assign current couple of starting nodes to 'idx'.
idxf = unique(ceil(find(adjm(idx,:)) / numel(idx)));
end
end
function find_neighb_v2(adjm, snodes, n_nodes_test)
for i=1:n_nodes_test
idx = snodes(i,:); %assign current couple of starting nodes to 'idx'.
A = adjm(idx,:);
A = A(:) == 1;
V = 1:size(A,1);
idxf = unique(ceil(V(A) / numel(idx)));
end
end
function find_neighb_v3(adjm, snodes, n_nodes_test)
for i=1:n_nodes_test
idx = snodes(i,:); %assign current couple of starting nodes to 'idx'.
idxf = find(any(adjm(idx, :)));
end
end
function find_neighb_base(G, snodes, n_nodes_test)
for i=1:n_nodes_test
idx = snodes(i,:); %assign current couple of starting nodes to 'idx'.
idxf = unique([neighbors(G, idx(1)); neighbors(G, idx(2))]);
end
end
find_neighb_v1 : 1.437s
find_neighb_v2 : 1.314s
find_neighb_v3 : 0.995s
find_neighb_base : 0.666s
and demonstrate that the proposed solution (v3) is more efficient, yet hindered by find()
.
You can use any
, so the code will be reduced to:
idxf = find(any(adjm(idx, :)));
Explanation:
Assume that you have two logical row vectors A
and B
and you want to find a vector C
that contains 1
where any of elements of A
or B
are 1
. In other words we want to find the unique indices of A
or B
where they that contain 1
. We simply can use logical OR
operator:
C = A | B;
Equivalently you can use any
function to compute C
:
C = any([A;B]);
When we use idx = find(C)
the indices will be unique and there is no need to use the unique
function.
A second solution
idxf = find(idx * adjm);
Here we can use vector by matrix multiplication idx * adjm
that is equivalent to sum(adjm(idx, :))
. However in this case the index vector idx
is assumed to be a logical vector that has the same length as the column of adjm
.