scala pyspark google-cloud-dataproc delta-lake

How to resolve: java.lang.NoSuchMethodError: 'scala.collection.Seq org.apache.spark.sql.types.StructType.toAttributes()'


Running a simple ETL PySpark job on Dataproc 2.2 with the job property spark.jars.packages set to io.delta:delta-core_2.12:2.4.0; other settings are left at their defaults. I have the following config:

from pyspark import SparkConf
from pyspark.sql import SparkSession

conf = (
    SparkConf()
    # Enable Delta Lake's SQL extensions
    .set(
        "spark.sql.extensions",
        "io.delta.sql.DeltaSparkSessionExtension",
    )
    # Use the Delta catalog as the session catalog
    .set(
        "spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    )
    # Disable the vectorized Parquet reader
    .set(
        "spark.sql.parquet.enableVectorizedReader",
        "false",
    )
)
spark = SparkSession.builder.config(conf=conf).getOrCreate()

The job fails with the following error:

Traceback (most recent call last):
  File "/tmp/job-b0fc313a/historical.py", line 71, in <module>
    etl(args.source_uri, args.target_uri)
  File "/tmp/job-b0fc313a/historical.py", line 53, in etl
    hist_df.write.format("delta").mode("overwrite").save(target_uri)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 1463, in save
  File "/usr/lib/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py", line 179, in deco
  File "/usr/lib/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o89.save.
: java.lang.NoSuchMethodError: 'scala.collection.Seq org.apache.spark.sql.types.StructType.toAttributes()'

Tried changing the version of io.delta:delta-core_2.x:x.x.0, to no avail. I've read that the problem stems from a Scala version incompatibility, but Dataproc 2.2 runs on Scala 2.12.


Solution

  • Changed the property spark.jars.packages from io.delta:delta-core_2.12:2.4.0 to io.delta:delta-spark_2.12:3.2.0.
  • The root cause is a Spark version mismatch, not a Scala one: Dataproc 2.2 ships Spark 3.5, while delta-core 2.4.0 is built against Spark 3.4. StructType.toAttributes() was removed from StructType in Spark 3.5 (it moved to an internal catalyst utility), hence the NoSuchMethodError. Starting with Delta Lake 3.0 the Maven artifact was also renamed from delta-core to delta-spark, so no delta-core version will work against Spark 3.5.
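For reference, the fix only requires changing the package coordinate at job submission; a minimal sketch (cluster name, region, and script path are placeholders, not from the original post):

```shell
# Submit the PySpark job with the Delta Lake 3.x artifact (delta-spark,
# not delta-core), which matches the Spark 3.5 runtime on Dataproc 2.2.
gcloud dataproc jobs submit pyspark historical.py \
    --cluster=my-cluster \
    --region=us-central1 \
    --properties=spark.jars.packages=io.delta:delta-spark_2.12:3.2.0
```

The spark.sql.extensions and spark.sql.catalog.spark_catalog settings in the application config can stay exactly as they are; only the artifact name and version change.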