java, scala, apache-spark, pyspark, spark-graphx

Partitioning Strategy For Complete Graph In Spark GraphX


I have created a graph using Spark GraphX in which every vertex is directly connected to every other vertex of the graph, i.e. a complete graph. Could anyone suggest a good partitioning strategy for this kind of graph, or ideas for implementing a custom partition strategy?

I have 1 million vertices and 500 million edges.

Any ideas or suggestions related to this will be greatly appreciated. Thanks in advance.


Solution

  • If you have a complete graph, you do not need a sophisticated partitioning algorithm. Just use the random partitioning strategy that GraphX already implements.

    If you have n graph vertices and k partitions, any balanced (edge-cut) partitioning strategy assigns about n/k vertices to each partition. Because every vertex is connected to every other vertex, each vertex then has about n − n/k edges crossing to the other partitions. In other words, every balanced partitioning of a complete graph produces essentially the same (maximal) cut, so a more sophisticated strategy cannot reduce the communication cost.
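    As a minimal sketch of what applying the built-in random strategy looks like: this assumes an existing `SparkContext` named `sc`, and the tiny edge list and the partition count `100` are stand-in values for illustration only. GraphX partitions the edge RDD (a vertex cut), and `PartitionStrategy.RandomVertexCut` is its random strategy.

    ```scala
    import org.apache.spark.graphx.{Edge, Graph, PartitionStrategy}
    import org.apache.spark.rdd.RDD

    // Stand-in edge list; in the question's setting this would be the
    // ~500 million edges of the complete graph.
    val edges: RDD[Edge[Int]] = sc.parallelize(Seq(
      Edge(1L, 2L, 1), Edge(2L, 3L, 1), Edge(1L, 3L, 1)
    ))
    val graph: Graph[Int, Int] = Graph.fromEdges(edges, defaultValue = 0)

    // Repartition the edges with GraphX's built-in random vertex-cut
    // strategy; the partition count (100 here) is an assumed value
    // you would tune to your cluster.
    val partitioned = graph.partitionBy(PartitionStrategy.RandomVertexCut, 100)
    ```

    For a complete graph, the other built-in strategies (e.g. `EdgePartition2D`) will not beat this in cut size, since the cut is maximal for any balanced assignment anyway.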