It seems that there is no built-in way in GraphX to load weighted graphs properly. I have a file whose columns represent the edges of a graph:
# source_id target_id weight
0 1 1
1 2 2
2 3 3
3 4 4
4 5 5
5 0 6
How can I load it into a graphx.Graph correctly?
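I'm not very familiar with GraphX, but here is a manual approach. It's a bit ugly, but it does the job: parse each line into an Edge that carries the weight as its attribute, then derive the vertex set from the edge endpoints. I gave every vertex the placeholder attribute "name"; adjust that as you wish.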
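import org.apache.spark.graphx._
// sc is the SparkContext that spark-shell provides
val input = sc.textFile("edgefile.txt")
// The first line is the "# source_id target_id weight" header; drop it
val header = input.first()
// Split each remaining line on single spaces and parse all three columns as Long
val rdd = input.filter(row => row != header).map(_.split(" ").map(_.toLong))
// Edge(srcId, dstId, attr): the third column becomes the edge attribute (the weight)
val edges = rdd.map(s => Edge(s(0), s(1), s(2)))
// Collect every distinct endpoint and attach the placeholder attribute "name"
val vertices = rdd.map(r => r(0)).union(rdd.map(r => r(1))).distinct.map(r => (r, "name"))
val graph = Graph(vertices, edges)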
graph.vertices.foreach(println)
(3,name)
(1,name)
(2,name)
(0,name)
(4,name)
(5,name)
graph.edges.foreach(println)
Edge(0,1,1)
Edge(1,2,2)
Edge(2,3,3)
Edge(3,4,4)
Edge(4,5,5)
Edge(5,0,6)
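If the file may contain blank lines, tab-separated columns, or non-integer weights, the parsing step could be made a little more defensive. The following is only a sketch under assumptions: EdgeLineParser and parseEdgeLine are hypothetical names of my own (not part of GraphX), and treating the weight as a Double is my choice, not something the question requires. The parser itself is plain Scala; the Spark lines in the trailing comment are untested here and rely on Graph.fromEdges, the GraphX builder that creates the vertices from the edge endpoints and gives each one a default attribute.

```scala
import scala.util.Try

// Hypothetical helper, not part of GraphX: parses one "source target weight" line.
object EdgeLineParser {
  // Returns None for blank lines, '#' comment/header lines, and malformed rows,
  // so bad input is skipped instead of throwing a NumberFormatException.
  def parseEdgeLine(line: String): Option[(Long, Long, Double)] = {
    val trimmed = line.trim
    if (trimmed.isEmpty || trimmed.startsWith("#")) None
    else trimmed.split("\\s+") match {
      // Exactly three columns: two vertex ids and a weight (assumed Double here)
      case Array(src, dst, weight) =>
        Try((src.toLong, dst.toLong, weight.toDouble)).toOption
      case _ => None
    }
  }
}

// Possible use in spark-shell (requires Spark and GraphX, not run here):
//   import org.apache.spark.graphx._
//   val edges = sc.textFile("edgefile.txt")
//     .flatMap(line => EdgeLineParser.parseEdgeLine(line))
//     .map { case (src, dst, w) => Edge(src, dst, w) }
//   val graph = Graph.fromEdges(edges, "name")
```

With Graph.fromEdges there is no need to build the vertex RDD by hand, as long as one default attribute for all vertices is acceptable.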