I'm having some troubles on creating a relationship-based program, what I need is to create a function to aggregate related nodes together, like this, if I have those 4 nodes:
If A -> B, B -> C, C -> D, A -> C, B -> D, like this: nodes
When A -> D, I'd like to create an intermediate Node, replacing all relationships between those nodes with a single connection from each node to this new one. Like this:
@ <- cluster
A ->@, B -> @, C -> @, D -> @
So whoever is connected to @, has a relationship with all of its members
This to avoid 5x5, 10x10, 25x25 relationships, which is really heavy for memory.
I've achieved this goal in several steps, first I calculate common nodes between the twos who must be connected, if those common nodes are > 5, then I cluster them in a group and delete their connections.
Also when a connection is removed, meaning that under a certain lower bound, the cluster must be dismantled and all single relationships between nodes must be recreated,
If common users are > 100 this approach could be dangerous.
I'd like to know if someone could help me to do such things, as the creation of the cluster or dismantling it in reasonable time, because now my technique is O(n) and is very resource consuming
Thank you in advance
I am thinking that what you call a 'cluster' is what is called in graph theory a 'strongly connected component in a direct graph' https://en.wikipedia.org/wiki/Strongly_connected_component
You ask for a better performance than O(n) but you do not say what n is. Perhaps it is the number of nodes? The best algorithm for finding strongly connected components is O(V+E) where V is the number of nodes and E is the number of connections. So what you ask for is impossible.