performancematlabgraph-theoryundirected-graph

Efficient method for finding unique neighbors of nodes in an undirected graph without using Matlab built-in functions


I have an undirected graph of N nodes and L links, whose structure is typically described by its adjacency matrix A with N rows and N columns. In this adjacency matrix, the value '1' denotes a link between two nodes and, conversely, '0' denotes no links.

I work in Matlab and want to avoid all the sophistications provided by the built-in graph() functions such as N = neighbors(G, NODEID) where G is G = graph(adjm); and adjm the adjacency matrix.

In this graph, I need to identify which unique nodes (neighbors) are connected to one or several starting nodes, labelled idx. I'm currently using the following line of code:

fidx = unique(ceil(find(adjm(idx,:)) / numel(idx)));

where fidx contains the neighbors (connected nodes) of a set of idx starting nodes. This operation is repeated a large number of times for different nodes and after some profiling, I found that the instruction find(adjm(idx,:)) is really hindering the overall runtime. My first attempt was to try to avoid find(). For example using:

A = adjm(idx,:);
A = A(:) == 1;
V = 1:size(A,1);
idx = unique(ceil(V(A) / numel(idx)));

However, this still remains too slow for my application.

What are you suggestions to reduce the runtime associated with this specific section of code?

Here is a minimal example:

%Minimal example:
f = @() get_neighb();
timeit(f)

function get_neighb()
%generate a graph with a large number of nodes N, in order to have
%computations times large enough:
N = 2e4; %number of nodes
G = round( rand(N) / 1.5 );
G = triu(G) + triu(G,1)';
adjm = G - diag(diag(G));

%for visualisation purposes:
%plot(graph(adjm));

s = size(adjm,1);

%list of some nodes to search for (randomly chosen):
n_nodes_test = 2000; %number of couples to test
%In this minimal example, we focus on searching for the neighbors of each couple (2-tuple) of
%nodes only. But 'snodes' can be single nodes or n-tuples, i.e. containing more
%than 2 nodes
snodes = [ round(linspace(1, s/2, n_nodes_test)); round(linspace(s/2, s, n_nodes_test))]';

%This section is the critical one:
for i=1:n_nodes_test
    
    idx = snodes(i,:); %assign current couple of starting nodes to 'idx'.
    
    %fidx = unique(ceil(find(adjm(idx,:)) / numel(idx)));
    %can be replaced by the following:
    A = adjm(idx,:);
    A = A(:) == 1;
    V = 1:size(A,1);
    idxf = unique(ceil(V(A) / numel(idx)));
    
    %which is similar to (using Matlab built-in 'graph' functions):
    % G = graph(adjm);
    % T = unique([neighbors(G, snodes(1,1)); neighbors(G, snodes(1,2))]);
    % here idxf' == T
end


end

Benchmark

Here is a short benchmark of the proposed solutions. There was several suggestions on how accurately measuring the runtime using timeit(), or profiler. In this benchmark we stick to the profiler for measuring the runtime for each function find_neighb_v(). In the minimal example posted above, about 80% of the runtime was dedicated to the construction of the graph adjm. However, we aim at measuring the fastest way to retrieve node's neighbors, and we don't want to measure the time for the graph creation.

A note on Graph(G)

The Matlab built-in suite graph() provides practical and efficient functions for returning the neighbors of a node. However, for each direct modification of the adjacency matrix adjm, one need to update the graph() structure with a call to graph(G), which can be really time consumming, especially for large networks. It is possible to directly modify the underlying Node and Edge Lists of the graph() structure, but for people who directly work on the topology of the graph using the adjacency matrix properties -using spectral analyse for instance- this is not possible. Here is also presented the runtime of the graph() function for comparison purpose.

The code

%Benchmarck using profiler:

%generate a graph with a large number of nodes N, in order to have
%computations times large enough:
N = 1.5e4; %number of nodes
G = round( rand(N) / 1.5 );
G = triu(G) + triu(G,1)';
adjm = G - diag(diag(G));

%for visualisation purposes:
%plot(graph(adjm));

s = size(adjm,1);

%list of some nodes to search for (randomly chosen):
n_nodes_test = 2000; %number of couples to test
%In this minimal example, we focus on searching for the neighbors of each couple (2-tuple) of
%nodes only. But 'snodes' can be single nodes or n-tuples, i.e. containing more
%than 2 nodes
snodes = [ round(linspace(1, s/2, n_nodes_test)); round(linspace(s/2, s, n_nodes_test))]';

G_test = graph(adjm);

profile on
%1st version, using find()
find_neighb_v1(adjm, snodes, n_nodes_test);
%2nd version, without find()
find_neighb_v2(adjm, snodes, n_nodes_test);
%3rd version, using any() and find()
find_neighb_v3(adjm, snodes, n_nodes_test);
%base version, using built-in graph() and neighbors-) functions
find_neighb_base(G_test, snodes, n_nodes_test);
profile viewer

function find_neighb_v1(adjm, snodes, n_nodes_test)
    for i=1:n_nodes_test
        idx = snodes(i,:); %assign current couple of starting nodes to 'idx'.
        
        idxf = unique(ceil(find(adjm(idx,:)) / numel(idx)));
    end
end

function find_neighb_v2(adjm, snodes, n_nodes_test)

    for i=1:n_nodes_test
        idx = snodes(i,:); %assign current couple of starting nodes to 'idx'.
    
        A = adjm(idx,:);
        A = A(:) == 1;
        V = 1:size(A,1);
        idxf = unique(ceil(V(A) / numel(idx)));
    end

end

function find_neighb_v3(adjm, snodes, n_nodes_test)

    for i=1:n_nodes_test
        idx = snodes(i,:); %assign current couple of starting nodes to 'idx'.
    
        idxf = find(any(adjm(idx, :)));
    end

end

function find_neighb_base(G, snodes, n_nodes_test)

    for i=1:n_nodes_test
        idx = snodes(i,:); %assign current couple of starting nodes to 'idx'.
        
        idxf = unique([neighbors(G, idx(1)); neighbors(G, idx(2))]);
    end

end

Results and conclusions

The results are obtained on a simple office computer:
find_neighb_v1 : 1.437s
find_neighb_v2 : 1.314s
find_neighb_v3 : 0.995s
find_neighb_base : 0.666s

and demonstrate that the proposed solution (v3) is more efficient, yet hindered by find().


Solution

  • You can use any, so the code will be reduced to:

    idxf = find(any(adjm(idx, :)));
    

    Explanation:
    Assume that you have two logical row vectors A and B and you want to find a vector C that contains 1 where any of elements of A or B are 1. In other words we want to find the unique indices of A or B where they that contain 1. We simply can use logical OR operator:

    C = A | B;
    

    Equivalently you can use any function to compute C:

    C = any([A;B]);
    

    When we use idx = find(C) the indices will be unique and there is no need to use the unique function.

    A second solution

    idxf = find(idx * adjm);
    

    Here we can use vector by matrix multiplication idx * adjm that is equivalent to sum(adjm(idx, :)). However in this case the index vector idx is assumed to be a logical vector that has the same length as the column of adjm.