python, pyspark, pytest, apache-sedona

Pytest showing error if path to test is not specified


I have a conftest.py inside my tests/ folder that contains a fixture creating a Spark session, as follows:

import pytest
from pyspark import SparkConf
from pyspark.sql import SparkSession
from sedona.register import SedonaRegistrator
from sedona.utils import SedonaKryoRegistrator, KryoSerializer

@pytest.fixture
def spark_session_sedona():
    # Extra Spark settings applied on top of the builder configuration below.
    parameters = {
        'spark.driver.maxResultSize': '3g',
        'spark.hadoop.fs.s3a.impl': 'org.apache.hadoop.fs.s3a.S3AFileSystem',
        'spark.sql.execution.arrow.pyspark.enabled': True,
        'spark.scheduler.mode': 'FAIR',
    }
    spark_conf = SparkConf().setAll(parameters.items())
    spark_session_conf = (
        SparkSession.builder.appName('appName')
        .enableHiveSupport()
        # Jars for Hadoop/Azure storage, Delta Lake and Apache Sedona.
        .config('spark.jars.packages', 'org.apache.hadoop:hadoop-common:3.3.4,'
                                       'org.apache.hadoop:hadoop-azure:3.3.4,'
                                       'com.microsoft.azure:azure-storage:8.6.6,'
                                       'io.delta:delta-core_2.12:1.0.0,'
                                       'org.apache.sedona:sedona-python-adapter-3.0_2.12:1.2.1-incubating,'
                                       'org.datasyslab:geotools-wrapper:1.3.0-27.2')
        .config(conf=spark_conf)
        # Sedona requires the Kryo serializer and its own Kryo registrator.
        .config("spark.serializer", KryoSerializer.getName)
        .config("spark.kryo.registrator", SedonaKryoRegistrator.getName)
    )
    return spark_session_conf.getOrCreate()

Then, I have a test that takes this fixture as a parameter and essentially executes:

SedonaRegistrator.registerAll(spark)
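
For reference, a minimal version of such a test might look like the sketch below (the test name and the ST_Point smoke check are illustrative, not my actual test):

from sedona.register import SedonaRegistrator

def test_sedona_registration(spark_session_sedona):
    # Install Sedona's SQL functions on the session provided by the fixture.
    SedonaRegistrator.registerAll(spark_session_sedona)
    # Any Sedona SQL function works as a smoke test; ST_Point is the simplest.
    row = spark_session_sedona.sql("SELECT ST_Point(1.0, 2.0) AS geom").first()
    assert row.geom is not None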

When I execute the command

pytest

it returns the error:

TypeError: 'JavaPackage' object is not callable

However, if I execute:

pytest src/tests/test_sedona.py

it passes the test without any issue.

Does anybody know what's going on?

Full error:

src/tests/utils/test_lanes_scale.py:39: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
src/adas_lanes/utils/lanes_scale.py:112: in lanes_sql_line
    SedonaRegistrator.registerAll(spark)
/home/vscode/.local/lib/python3.8/site-packages/sedona/register/geo_registrator.py:43: in registerAll
    cls.register(spark)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

cls = <class 'sedona.register.geo_registrator.SedonaRegistrator'>
spark = <pyspark.sql.session.SparkSession object at 0x7f847a72f340>

    @classmethod
    def register(cls, spark: SparkSession):
>       return spark._jvm.SedonaSQLRegistrator.registerAll(spark._jsparkSession)
E       TypeError: 'JavaPackage' object is not callable

Solution

  • SparkSession follows a singleton model: if somewhere in your code, or in another test, a session has already been initialized with SparkSession.builder.getOrCreate(), then when you try to create a new one with the Sedona configuration, getOrCreate() will return the existing session that was created without Sedona instead of building a new one.

    When you run the isolated test, the SparkSession that is created has the Apache Sedona configuration. When you run the full suite, however, a Spark session is most likely initialized first in another test case or code step (the first sketch below demonstrates this).

    To make the fixture run at the beginning of the test suite, so that every later SparkSession.builder.getOrCreate() call picks up the Sedona configuration, add the following to the fixture decorator (the second sketch below shows the complete fixture):

    @pytest.fixture(scope="session", autouse=True)
    def spark_session_sedona():
        ...  # body unchanged from the conftest.py above
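
    To see the singleton behaviour concretely, here is a minimal sketch (session names are illustrative). Note that static settings such as spark.jars.packages and spark.serializer only take effect when the SparkContext is first created, so they are silently ignored on the second getOrCreate() call:

    from pyspark.sql import SparkSession

    first = SparkSession.builder.appName('first').getOrCreate()
    second = (
        SparkSession.builder.appName('second')
        # Static configs are only honoured when the SparkContext is created,
        # so this has no effect on the already-running session.
        .config('spark.jars.packages',
                'org.apache.sedona:sedona-python-adapter-3.0_2.12:1.2.1-incubating')
        .getOrCreate()
    )
    assert first is second  # the existing session is returned, without the Sedona jars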
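
    Putting it together, a session-scoped, autouse version of the conftest.py fixture might look like the sketch below. It reuses the configuration from the question; calling SedonaRegistrator.registerAll inside the fixture and stopping the session afterwards are my additions, not part of the original setup:

    import pytest
    from pyspark import SparkConf
    from pyspark.sql import SparkSession
    from sedona.register import SedonaRegistrator
    from sedona.utils import SedonaKryoRegistrator, KryoSerializer

    @pytest.fixture(scope="session", autouse=True)
    def spark_session_sedona():
        # Created once per test session; autouse guarantees it runs before
        # any test can call SparkSession.builder.getOrCreate() on its own.
        parameters = {
            'spark.driver.maxResultSize': '3g',
            'spark.hadoop.fs.s3a.impl': 'org.apache.hadoop.fs.s3a.S3AFileSystem',
            'spark.sql.execution.arrow.pyspark.enabled': True,
            'spark.scheduler.mode': 'FAIR',
        }
        spark = (
            SparkSession.builder.appName('appName')
            .enableHiveSupport()
            .config('spark.jars.packages', 'org.apache.hadoop:hadoop-common:3.3.4,'
                                           'org.apache.hadoop:hadoop-azure:3.3.4,'
                                           'com.microsoft.azure:azure-storage:8.6.6,'
                                           'io.delta:delta-core_2.12:1.0.0,'
                                           'org.apache.sedona:sedona-python-adapter-3.0_2.12:1.2.1-incubating,'
                                           'org.datasyslab:geotools-wrapper:1.3.0-27.2')
            .config(conf=SparkConf().setAll(parameters.items()))
            .config("spark.serializer", KryoSerializer.getName)
            .config("spark.kryo.registrator", SedonaKryoRegistrator.getName)
            .getOrCreate()
        )
        # Registering once here means individual tests no longer need to
        # call SedonaRegistrator.registerAll themselves.
        SedonaRegistrator.registerAll(spark)
        yield spark
        spark.stop()

    Because the fixture is autouse and session-scoped, it is instantiated before any test runs, so every later SparkSession.builder.getOrCreate() call anywhere in the suite returns this Sedona-enabled session.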