apache-spark pyspark graphframes

Py4JJavaError: An error occurred while calling o65.createGraph


I tried to install GraphFrames for Spark following the instructions on the Spark website, but this command did not work for me:

pyspark --packages graphframes:graphframes:0.8.1-spark3.0-s_2.12

I tried several installation methods and settled on downloading the graphframes .jar, placing it alongside the other Spark .jar files, and adding it manually in code with spark.sparkContext.addPyFile("path to /spark-2.4.7-bin-hadoop2.7/jars/graphframes-0.8.1-spark3.0-s_2.12.jar"). With that, the library imports, but creating a GraphFrame always fails with an error, and I have no idea how to solve it.

My .bashrc variables:
export CLASSPATH="/home/german/spark-2.4.7-bin-hadoop2.7/jars"
export HADOOP_CONF_DIR="/home/german/spark-2.4.7-bin-hadoop2.7/conf"
export HADOOP_HOME="/home/german/spark-2.4.7-bin-hadoop2.7"
export HADOOP_SECURITY_LOGGER=ERROR,console
export JAVA_HOME="/home/german/jdk1.8.0_301"
export SPARK_CLASSPATH="/home/german/spark-2.4.7-bin-hadoop2.7/jars"
export SPARK_DIST_CLASSPATH="/home/german/spark-2.4.7-bin-hadoop2.7/jars"
export SPARK_HOME="/home/german/spark-2.4.7-bin-hadoop2.7"
export PATH="/home/german/spark-2.4.7-bin-hadoop2.7/bin:$PATH"
export PYTHONPATH="/home/german/spark-2.4.7-bin-hadoop2.7/python/lib/pyspark.zip:/home/german/spark-2.4.7-bin-hadoop2.7/python/lib:/home/german/spark-2.4.7-bin-hadoop2.7/python:$PYTHONPATH"

I am using JDK 1.8, Python 3.7.10, and Ubuntu 20.04 LTS.

from pyspark.sql import SparkSession
    
spark = SparkSession.builder\
                    .config("spark.sql.warehouse.dir", "spark_warehouse")\
                    .getOrCreate()
spark.sparkContext.setCheckpointDir("graphframes_checkpoints")
spark.sparkContext.addPyFile("path to /spark-2.4.7-bin-hadoop2.7/jars/graphframes-0.8.1-spark3.0-s_2.12.jar")

vertices = spark.read.parquet("tmp_dfs/parquet/vertices.parquet")

edges = spark.read.parquet("tmp_dfs/parquet/edges.parquet")

from graphframes import *
graph = GraphFrame(vertices, edges)

And I get the error:

Py4JJavaError: An error occurred while calling o65.createGraph.
: java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)[Ljava/lang/Object;
    at org.graphframes.GraphFrame$.apply(GraphFrame.scala:676)
    at org.graphframes.GraphFramePythonAPI.createGraph(GraphFramePythonAPI.scala:10)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<ipython-input-2-ee0c1444db6f> in <module>
     37 
     38 from graphframes import *
---> 39 graph = GraphFrame(vertices, edges)

/tmp/spark-9d209109-e503-4ea1-813c-9ca68e76d72a/userFiles-4417833f-c19c-4e6e-9eea-7a21b6553f5f/graphframes-0.8.1-spark3.0-s_2.12.jar/graphframes/graphframe.py in __init__(self, v, e)
     87                 .format(self.DST, ",".join(e.columns)))
     88 
---> 89         self._jvm_graph = self._jvm_gf_api.createGraph(v._jdf, e._jdf)
     90 
     91     @property

~/spark-2.4.7-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1255         answer = self.gateway_client.send_command(command)
   1256         return_value = get_return_value(
-> 1257             answer, self.gateway_client, self.target_id, self.name)
   1258 
   1259         for temp_arg in temp_args:

~/spark-2.4.7-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/utils.py in deco(*a, **kw)
     61     def deco(*a, **kw):
     62         try:
---> 63             return f(*a, **kw)
     64         except py4j.protocol.Py4JJavaError as e:
     65             s = e.java_exception.toString()

~/spark-2.4.7-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    326                 raise Py4JJavaError(
    327                     "An error occurred while calling {0}{1}{2}.\n".
--> 328                     format(target_id, ".", name), value)
    329             else:
    330                 raise Py4JError(

I may have chosen the wrong installation method, or something else may be wrong. I would be glad to hear any suggestions on how to solve this problem.


Solution

  • Check which Scala version your Spark jars were built for by looking in the $SPARK_HOME/jars folder, for example spark-sql_<scala version>-2.4.7.jar. If the version is 2.11, you need a graphframes jar compiled for Scala 2.11. The jar you added (s_2.12 in its name) is built for Scala 2.12, and a NoSuchMethodError on scala.Predef$.refArrayOps is a typical symptom of that kind of Scala version mismatch.

    One more thing: you are running Spark 2.4.7, but the graphframes jar you added targets Spark 3.0 (spark3.0 in its name). This may also cause issues.
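
    The check above can be sketched in Python. This is only an illustration: the helper names scala_version_from_jars and graphframes_coordinate are made up for this answer, and the coordinate simply follows the naming pattern of the one in the question (<graphframes version>-spark<major.minor>-s_<scala version>). Whether an artifact is actually published for a given combination should be confirmed in the package repository.

```python
import re
from typing import Iterable, Optional


def scala_version_from_jars(jar_names: Iterable[str]) -> Optional[str]:
    """Return the Scala binary version (e.g. '2.11') encoded in the
    spark-sql jar name, or None if no such jar is in the listing."""
    pattern = re.compile(r"^spark-sql_(\d+\.\d+)-.+\.jar$")
    for name in jar_names:
        match = pattern.match(name)
        if match:
            return match.group(1)
    return None


def graphframes_coordinate(spark_version: str, scala_version: str,
                           graphframes_version: str = "0.8.1") -> str:
    """Build a --packages coordinate using the naming pattern from the question.

    Hypothetical helper: it only formats a string and does not check
    that the artifact exists.
    """
    # Keep only major.minor of the Spark version, e.g. '2.4.7' -> '2.4'.
    spark_major_minor = ".".join(spark_version.split(".")[:2])
    return "graphframes:graphframes:{}-spark{}-s_{}".format(
        graphframes_version, spark_major_minor, scala_version
    )


# Example listing; in practice pass os.listdir(os.path.join(os.environ["SPARK_HOME"], "jars")).
example_jars = ["py4j-0.10.7.jar", "spark-sql_2.11-2.4.7.jar"]
scala_version = scala_version_from_jars(example_jars)
coordinate = graphframes_coordinate("2.4.7", scala_version)
print(coordinate)
```

    The resulting string can be passed to pyspark --packages, or set as the spark.jars.packages option on the SparkSession builder before the session is created. Loading the package that way puts the jar on the JVM classpath, so the addPyFile call for the jar should no longer be needed.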