
Spark 2.4.1 cannot read Avro files from HDFS


I have a simple code block that writes a DataFrame in Avro format and then reads it back, using the Avro library that ships with Spark 2.4.x.

Writing the Avro files succeeds and the files are generated in HDFS. However, an AbstractMethodError exception is thrown when I read the files. Can anyone shed some light on this?

I used Spark's own Avro library by adding the package org.apache.spark:spark-avro_2.11:2.4.1 to the Spark interpreter of my Zeppelin notebook.

My simple code block:

%pyspark

from pyspark.sql import Row

test_rows = [
    Row(file_name="test-guangzhou1", topic="camera1", timestamp=1, msg="Test1"),
    Row(file_name="test-guangzhou1", topic="camera1", timestamp=2, msg="Test2"),
    Row(file_name="test-guangzhou3", topic="camera3", timestamp=3, msg="Test3"),
    Row(file_name="test-guangzhou1", topic="camera1", timestamp=4, msg="Test4"),
]

test_df = spark.createDataFrame(test_rows)

# Write the DataFrame as Avro (this step succeeds)
test_df.write.format("avro").mode("overwrite").save("hdfs:///tmp/bag_parser279181359_3")

# Read the files back (show() below raises the AbstractMethodError)
loaded_df = spark.read.format("avro").load("hdfs:///tmp/bag_parser279181359_3")

loaded_df.show()

The error message I saw:

Py4JJavaError: An error occurred while calling o701.collectToPython.
: java.lang.AbstractMethodError
    at org.apache.spark.sql.execution.FileSourceScanExec.inputRDD$lzycompute(DataSourceScanExec.scala:337)
    at org.apache.spark.sql.execution.FileSourceScanExec.inputRDD(DataSourceScanExec.scala:331)
    at org.apache.spark.sql.execution.FileSourceScanExec.inputRDDs(DataSourceScanExec.scala:357)
    at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:627)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:137)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:133)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:161)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:158)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:133)
    at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:289)
    at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:381)
    at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
    at org.apache.spark.sql.Dataset$$anonfun$collectToPython$1.apply(Dataset.scala:3259)
    at org.apache.spark.sql.Dataset$$anonfun$collectToPython$1.apply(Dataset.scala:3256)
    at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3373)
    at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:79)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:144)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3367)
    at org.apache.spark.sql.Dataset.collectToPython(Dataset.scala:3256)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)

(<class 'py4j.protocol.Py4JJavaError'>, Py4JJavaError(u'An error occurred while calling o701.collectToPython.\n', JavaObject id=o702), <traceback object at 0x7fc031b5c878>)

Solution

  • AbstractMethodError:

    Thrown when an application tries to call an abstract method. Normally, this error is caught by the compiler; this error can only occur at run time if the definition of some class has incompatibly changed since the currently executing method was last compiled.

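    AFAIK you need to check which versions you compiled against and which you are actually running. Here, the spark-avro_2.11:2.4.1 package has to match both the Spark version and the Scala version (2.11) that the Zeppelin Spark interpreter really uses.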
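
To make the version check concrete, here is a minimal sketch in plain Python. The helper name check_avro_package is hypothetical (it is not part of Spark or Zeppelin), and it assumes the runtime Spark version is reported in the same plain form as the package version, which may not hold for vendor builds. It only compares the package coordinate with the versions you pass in; it cannot prove that a mismatch is the cause of the error.

```python
def check_avro_package(coordinate, spark_version, scala_version):
    """Return a list of mismatch descriptions (empty if the versions agree).

    coordinate:    Maven coordinate, e.g. "org.apache.spark:spark-avro_2.11:2.4.1"
    spark_version: Spark version of the running cluster, e.g. "2.4.1"
    scala_version: Scala version of the running cluster, e.g. "2.11.12"
    """
    _group, artifact, pkg_version = coordinate.split(":")
    # The artifact name ends with the Scala binary version: spark-avro_2.11
    _name, _, pkg_scala = artifact.rpartition("_")
    # Only major.minor matters for Scala binary compatibility
    runtime_scala = ".".join(scala_version.split(".")[:2])

    problems = []
    if pkg_scala != runtime_scala:
        problems.append(
            "package built for Scala %s, runtime is Scala %s" % (pkg_scala, runtime_scala)
        )
    # Assumption: strict equality; vendor-specific version strings would need extra handling
    if pkg_version != spark_version:
        problems.append(
            "package built for Spark %s, runtime is Spark %s" % (pkg_version, spark_version)
        )
    return problems


# In a %pyspark paragraph the runtime values could be obtained like this
# (note that _jvm is a private PySpark attribute, used here for inspection only):
#   spark_version = spark.version
#   scala_version = spark.sparkContext._jvm.scala.util.Properties.versionNumberString()
print(check_avro_package("org.apache.spark:spark-avro_2.11:2.4.1", "2.3.0", "2.11.8"))
```

If the returned list is not empty, the interpreter is probably loading a spark-avro build that does not match the cluster, which is the kind of compile-time versus run-time mismatch that AbstractMethodError points to.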