apache-spark, pyspark, geospatial, apache-sedona

Apache Sedona Version Issues


So I'm trying to set up Apache Sedona but I'm running into strange issues that suggest the version compatibilities are off. For context, I have Apache Sedona version 1.5.1, PySpark version 3.2.1, and Scala 2.12.18.

I installed the packages below using Maven.

I'm trying to run this code:

from sedona.spark import *

spark = SedonaContext.builder().\
    config('spark.jars.packages',
           'org.apache.sedona:sedona-spark-3.4_2.12:1.5.1,'
           'org.datasyslab:geotools-wrapper:1.5.1-28.2,'
           'uk.co.gresearch.spark:spark-extension_2.12:2.11.0-3.4,'
           'org.apache.sedona:sedona-python-adapter-3.0_2.12:1.4.1'). \
    config('spark.jars.repositories', 'https://artifacts.unidata.ucar.edu/repository/unidata-all'). \
    getOrCreate()

sedona = SedonaContext.create(spark)

following their example notebook (https://github.com/apache/sedona/blob/master/binder/ApacheSedonaSQL.ipynb), but also making sure to add in the Python adapter.

But I get this error:

Py4JJavaError: An error occurred while calling o206.showString.
: java.lang.NoSuchMethodError: 'double org.locationtech.jts.geom.Coordinate.getZ()'
    at org.apache.sedona.common.geometrySerde.GeometrySerializer.getCoordinateType(GeometrySerializer.java:449)
    at org.apache.sedona.common.geometrySerde.GeometrySerializer.serializePoint(GeometrySerializer.java:112)
    at org.apache.sedona.common.geometrySerde.GeometrySerializer.serialize(GeometrySerializer.java:43)
    at org.apache.sedona.sql.utils.GeometrySerializer$.serialize(GeometrySerializer.scala:36)
    at org.apache.spark.sql.sedona_sql.expressions.implicits$GeometryEnhancer.toGenericArrayData(implicits.scala:139)
    at org.apache.spark.sql.sedona_sql.expressions.InferredTypes$.$anonfun$buildSerializer$1(InferredExpression.scala:155)
    at org.apache.spark.sql.sedona_sql.expressions.InferredExpression.eval(InferredExpression.scala:71)
    at org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:477)
    at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:69)

which makes it seem like the GeoTools/JTS dependencies aren't working. I can load a regular DataFrame, though; I just can't run geospatial operations on it. What's the issue here?


Solution

  • Since Apache Sedona 1.5.0, we no longer release the python-adapter jar because its contents were merged into the sedona-spark jar. If you add the old python-adapter jar from an earlier version, it pulls conflicting dependencies onto the classpath, and that is exactly what produces the NoSuchMethodError on Coordinate.getZ().

    The only jars you need are the ones listed in our docs and notebooks:

    from sedona.spark import *
    
    spark = SedonaContext.builder().\
        config('spark.jars.packages',
               'org.apache.sedona:sedona-spark-3.4_2.12:1.5.1,'
               'org.datasyslab:geotools-wrapper:1.5.1-28.2,'
               'uk.co.gresearch.spark:spark-extension_2.12:2.11.0-3.4'). \
        config('spark.jars.repositories', 'https://artifacts.unidata.ucar.edu/repository/unidata-all'). \
        getOrCreate()
    
    sedona = SedonaContext.create(spark)
    

    Please see our documentation: https://sedona.apache.org/1.5.1/setup/install-python/#prepare-sedona-spark-jar
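A quick way to catch the stale coordinate in an existing spark.jars.packages string is to scan it for the deprecated adapter before building the session. A minimal sketch (find_stale_adapters is a hypothetical helper, not part of Sedona; the coordinates match those used above):

```python
# Minimal sketch: flag deprecated sedona-python-adapter coordinates in a
# spark.jars.packages string, since Sedona >= 1.5.0 bundles that code
# inside the sedona-spark jar.
def find_stale_adapters(packages: str) -> list[str]:
    return [p for p in packages.split(",") if "sedona-python-adapter" in p]

# The packages string from the question, including the problematic jar:
pkgs = ("org.apache.sedona:sedona-spark-3.4_2.12:1.5.1,"
        "org.datasyslab:geotools-wrapper:1.5.1-28.2,"
        "org.apache.sedona:sedona-python-adapter-3.0_2.12:1.4.1")
print(find_stale_adapters(pkgs))
# ['org.apache.sedona:sedona-python-adapter-3.0_2.12:1.4.1']
```

Anything this returns should simply be removed from the packages list.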

    Please also make sure your Spark version matches the one in Sedona's Maven coordinate; the one used in the example, sedona-spark-3.4_2.12, is built for Spark 3.4. Since you are running PySpark 3.2.1, pick the sedona-spark artifact that matches your Spark version instead (the documentation linked above lists the coordinates per Spark release).
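Because the coordinate has to track both your Spark and Scala versions, it can help to build the packages string programmatically rather than hand-editing it. A minimal sketch (the helper name and version defaults are illustrative; verify the artifact for your Spark release against the Sedona docs):

```python
# Sketch: assemble the spark.jars.packages string for Sedona.
# The artifact pattern follows the coordinates shown above:
#   org.apache.sedona:sedona-spark-<spark-minor>_<scala>:<sedona-version>
def sedona_packages(spark_minor: str, scala: str = "2.12",
                    sedona: str = "1.5.1", geotools: str = "1.5.1-28.2") -> str:
    return ",".join([
        f"org.apache.sedona:sedona-spark-{spark_minor}_{scala}:{sedona}",
        f"org.datasyslab:geotools-wrapper:{geotools}",
    ])

# For the Spark 3.4 setup used in the answer:
print(sedona_packages("3.4"))
# org.apache.sedona:sedona-spark-3.4_2.12:1.5.1,org.datasyslab:geotools-wrapper:1.5.1-28.2
```

The resulting string can be passed directly to config('spark.jars.packages', ...) in the builder shown above.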