scala · apache-spark · spark-graphx

How can I load weighted graphs in Scala?


It seems that there is no built-in way in GraphX to load weighted graphs properly. I have a file whose columns represent the edges of a graph:

# source_id target_id weight
0   1   1
1   2   2
2   3   3
3   4   4
4   5   5
5   0   6

How can I load it into a graphx.Graph correctly?


Solution

  • I'm not familiar with GraphX, but here's a manual way to do it. It's a bit ugly, but it gets the job done. I assigned the attribute "name" to each vertex; adjust it as you wish.

    import org.apache.spark.graphx._
    
    val input = sc.textFile("edgefile.txt")
    val header = input.first()
    // Drop the header line and split on any run of whitespace (more robust than a fixed "   ")
    val rdd = input.filter(row => row != header).map(_.split("\\s+").map(_.toLong))
    // The third column becomes the edge attribute (the weight)
    val edges = rdd.map(s => Edge(s(0), s(1), s(2)))
    // Collect every id that appears as a source or target and attach a placeholder vertex attribute
    val vertices = rdd.map(r => r(0)).union(rdd.map(r => r(1))).distinct.map(r => (r, "name"))
    val graph = Graph(vertices, edges)
    
    graph.vertices.foreach(println)
    (3,name)
    (1,name)
    (2,name)
    (0,name)
    (4,name)
    (5,name)
    
    graph.edges.foreach(println)
    Edge(0,1,1)
    Edge(1,2,2)
    Edge(2,3,3)
    Edge(3,4,4)
    Edge(4,5,5)
    Edge(5,0,6)
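    
    Alternatively, if you don't need custom vertex attributes up front, Graph.fromEdges can build the vertex RDD for you from the edge endpoints. The sketch below is only an illustration: it assumes the same SparkContext `sc` and file path "edgefile.txt" as above, and it parses the weight as a Double so it can later be used by weight-based algorithms.
    
    import org.apache.spark.graphx._
    
    val weightedEdges = sc.textFile("edgefile.txt")
      .filter(line => !line.startsWith("#"))   // skip the header comment
      .map { line =>
        val f = line.split("\\s+")
        // source id, target id, weight as the edge attribute
        Edge(f(0).toLong, f(1).toLong, f(2).toDouble)
      }
    
    // Every vertex referenced by an edge gets the default attribute "name"
    val graph: Graph[String, Double] = Graph.fromEdges(weightedEdges, "name")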