apache-spark, pyspark, apache-spark-sql, graphframes

How to make a GraphFrame from an edge DataFrame only


From this, "A GraphFrame can also be constructed from a single DataFrame containing edge information. The vertices will be inferred from the sources and destinations of the edges."

However, when I look at the API docs, there seems to be no way to create one.

Has anyone tried to create a GraphFrame using only an edge DataFrame? How?


Solution

  • The GraphFrames Scala API has a function called fromEdges, which builds a GraphFrame from an edge DataFrame. As far as I can tell, this function isn't available in PySpark, but you can do something like:

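    ##something
    
    # Collect every id that appears as a source or a destination.
    # union() matches columns by position, so the result keeps the name 'src';
    # distinct() drops ids that occur in more than one edge.
    verticesDf = edgesDF.select('src').union(edgesDF.select('dst')).distinct()
    verticesDf = verticesDf.withColumnRenamed('src', 'id')
    
    ##more something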
    

    to achieve the same.
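
For completeness, the workaround above could be wrapped up roughly as follows. This is only a sketch: the helper names `vertices_from_edges` and `graph_from_edges` are made up for illustration and are not part of GraphFrames, and it assumes the edge DataFrame uses the `src` and `dst` column names that GraphFrame expects, with the `graphframes` package installed.

```python
def vertices_from_edges(edges_df, src_col="src", dst_col="dst", id_col="id"):
    """Hypothetical helper: one row per distinct id found in the edge columns."""
    sources = edges_df.select(src_col).withColumnRenamed(src_col, id_col)
    destinations = edges_df.select(dst_col).withColumnRenamed(dst_col, id_col)
    # union() keeps duplicates, so deduplicate before using the rows as vertices.
    return sources.union(destinations).distinct()


def graph_from_edges(edges_df):
    """Hypothetical helper: build a GraphFrame from an edge DataFrame alone."""
    # Imported here so vertices_from_edges() stays usable without graphframes.
    from graphframes import GraphFrame
    return GraphFrame(vertices_from_edges(edges_df), edges_df)
```

With an existing edge DataFrame, `g = graph_from_edges(edgesDF)` should then give a graph whose `g.vertices` holds each id once. Renaming both columns before the union is a stylistic choice; it just avoids relying on the unioned column keeping the name `src`.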