python pandas apache-spark graphframes

Convert GraphFrame output to a pandas DataFrame


I checked multiple sources but couldn't find an answer to this particular problem, although it probably has a very easy fix.

Let's say I have some graph, g.

I am able to print the vertices using g.vertices.show().

But I'm having a lot of trouble figuring out how to load all the vertices into a dataframe of some sort. I want to do a variety of tasks that are well supported in pandas. Does anyone know a way to do this?


Solution

  • Just as .show() displays the results of any query, .toPandas() converts the output to a pandas DataFrame. As far as I can tell, you can call it on anything you can call .show() on. Note that .toPandas() collects every row onto the driver, so the result has to fit in memory there.

    So for my specific question:

    g.vertices.toPandas() solves the problem.
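
A minimal sketch of the same idea, assuming pyspark and the graphframes package are installed. The helper name graphframe_to_pandas, the demo function, and the sample data are hypothetical and only for illustration; g.vertices and g.edges are ordinary Spark DataFrames, so the same .toPandas() call should work on both.

```python
def graphframe_to_pandas(g):
    """Collect a GraphFrame's vertices and edges as pandas DataFrames.

    `g` is expected to be a graphframes.GraphFrame. toPandas() pulls every
    row onto the driver, so this only suits graphs small enough to fit in
    driver memory.
    """
    return g.vertices.toPandas(), g.edges.toPandas()


def demo():
    """Build a tiny example graph (hypothetical data) and convert it."""
    # Imported here so the helper above can be used without Spark loaded.
    from pyspark.sql import SparkSession
    from graphframes import GraphFrame

    spark = SparkSession.builder.appName("graphframe-to-pandas").getOrCreate()
    vertices = spark.createDataFrame(
        [("a", "Alice"), ("b", "Bob")], ["id", "name"]
    )
    edges = spark.createDataFrame(
        [("a", "b", "follows")], ["src", "dst", "relationship"]
    )
    g = GraphFrame(vertices, edges)

    vertices_pdf, edges_pdf = graphframe_to_pandas(g)
    # From here on, any pandas operation is available.
    print(vertices_pdf.sort_values("name"))
    print(edges_pdf)
```

Calling demo() in a Spark session with GraphFrames on the classpath should print the two small tables. For a large graph, it is probably better to filter or aggregate in Spark first and call .toPandas() only on the reduced result.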