I'm trying to use the graphframes library with PySpark v3.0.1. (I'm using VS Code on Debian, but importing the package from the pyspark shell didn't work either.)
According to the documentation, launching with $ pyspark --packages graphframes:graphframes:0.6.0-spark2.3-s_2.11
should let me work with it.
This sample code was taken from another StackOverflow post asking the same question, although its solution didn't do the trick for me.
from graphframes import GraphFrame  # requires the graphframes package passed via --packages
# sqlContext is predefined in the pyspark shell; sc is the shell's SparkContext

localVertices = [(1, "A"), (2, "B"), (3, "C")]
localEdges = [(1, 2, "love"), (2, 1, "hate"), (2, 3, "follow")]
v = sqlContext.createDataFrame(localVertices, ["id", "name"])
e = sqlContext.createDataFrame(localEdges, ["src", "dst", "action"])
g = GraphFrame(v, e)
This throws the error:
py4j.protocol.Py4JJavaError: An error occurred while calling o63.createGraph.
java.lang.NoSuchMethodError: 'scala.collection.mutable.ArrayOps scala.Predef$.refArrayOps(java.lang.Object[])'
You need to use the graphframes version that matches Spark 3.0. You used the build for Spark 2.3 (0.6.0-spark2.3-s_2.11), which causes a Spark/Scala version conflict: that artifact is compiled against Scala 2.11, while Spark 3.0 is built on Scala 2.12, hence the NoSuchMethodError in scala.Predef. Use 0.8.1-spark3.0-s_2.12, which is currently the latest graphframes release for Spark 3.0:
pyspark --packages graphframes:graphframes:0.8.1-spark3.0-s_2.12
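The artifact coordinate itself encodes the compatibility requirements: 0.8.1-spark3.0-s_2.12 means graphframes 0.8.1, built for Spark 3.0 and Scala 2.12. As a quick sanity check when picking a package, you can read those fields off the coordinate. This is just a throwaway helper sketch in plain Python (no Spark needed), not part of any graphframes API:

```python
import re

def parse_graphframes_coordinate(coord):
    """Split a graphframes Maven coordinate into its version fields.

    Example: graphframes:graphframes:0.8.1-spark3.0-s_2.12
    """
    artifact_version = coord.split(":")[-1]
    m = re.fullmatch(
        r"(?P<graphframes>[\d.]+)-spark(?P<spark>[\d.]+)-s_(?P<scala>[\d.]+)",
        artifact_version,
    )
    if m is None:
        raise ValueError(f"unrecognised coordinate: {coord}")
    return m.groupdict()

# The coordinate from the question targets Spark 2.3 / Scala 2.11,
# which is incompatible with Spark 3.0.1 (Scala 2.12):
old = parse_graphframes_coordinate("graphframes:graphframes:0.6.0-spark2.3-s_2.11")
new = parse_graphframes_coordinate("graphframes:graphframes:0.8.1-spark3.0-s_2.12")
print(old)  # {'graphframes': '0.6.0', 'spark': '2.3', 'scala': '2.11'}
print(new)  # {'graphframes': '0.8.1', 'spark': '3.0', 'scala': '2.12'}
```

The general rule: the sparkX.Y and s_A.B suffixes must both match the Spark distribution you are running, or you get exactly this kind of Scala-level NoSuchMethodError at runtime.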