I have a conftest.py inside my tests/ folder that contains a fixture providing a Spark session, as follows:
import pytest
from pyspark import SparkConf
from pyspark.sql import SparkSession
from sedona.utils import SedonaKryoRegistrator, KryoSerializer
from sedona.register import SedonaRegistrator

@pytest.fixture
def spark_session_sedona():
    # Extra settings applied on top of the builder configuration below.
    parameters = {
        'spark.driver.maxResultSize': '3g',
        'spark.hadoop.fs.s3a.impl': 'org.apache.hadoop.fs.s3a.S3AFileSystem',
        'spark.sql.execution.arrow.pyspark.enabled': 'true',
        'spark.scheduler.mode': 'FAIR',
    }
    spark_conf = SparkConf().setAll(parameters.items())
    spark_session_conf = (
        SparkSession.builder.appName('appName')
        .enableHiveSupport()
        .config('spark.jars.packages',
                'org.apache.hadoop:hadoop-common:3.3.4,'
                'org.apache.hadoop:hadoop-azure:3.3.4,'
                'com.microsoft.azure:azure-storage:8.6.6,'
                'io.delta:delta-core_2.12:1.0.0,'
                'org.apache.sedona:sedona-python-adapter-3.0_2.12:1.2.1-incubating,'
                'org.datasyslab:geotools-wrapper:1.3.0-27.2')
        .config(conf=spark_conf)
        # Kryo serialization is required for Sedona's geometry types.
        .config('spark.serializer', KryoSerializer.getName)
        .config('spark.kryo.registrator', SedonaKryoRegistrator.getName)
    )
    return spark_session_conf.getOrCreate()
Then, I have a test that takes this fixture as a parameter and basically executes:
SedonaRegistrator.registerAll(spark)
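For reference, the failing test looks roughly like this (a minimal sketch; the test name and the ST_Point query are illustrative, not my exact code):

from sedona.register import SedonaRegistrator

def test_sedona(spark_session_sedona):
    SedonaRegistrator.registerAll(spark_session_sedona)
    # Any Sedona SQL function would do here to confirm registration worked.
    row = spark_session_sedona.sql('SELECT ST_Point(1.0, 2.0) AS geom').first()
    assert row.geom is not None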
When I execute the command
pytest
it returns the error:
TypeError: 'JavaPackage' object is not callable
However, if I execute:
pytest src/tests/test_sedona.py
the test passes without any issue.
Does anybody know what's going on?
Full error:
src/tests/utils/test_lanes_scale.py:39:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
src/adas_lanes/utils/lanes_scale.py:112: in lanes_sql_line
SedonaRegistrator.registerAll(spark)
/home/vscode/.local/lib/python3.8/site-packages/sedona/register/geo_registrator.py:43: in registerAll
cls.register(spark)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
cls = <class 'sedona.register.geo_registrator.SedonaRegistrator'>
spark = <pyspark.sql.session.SparkSession object at 0x7f847a72f340>
@classmethod
def register(cls, spark: SparkSession):
> return spark._jvm.SedonaSQLRegistrator.registerAll(spark._jsparkSession)
E TypeError: 'JavaPackage' object is not callable
The error means the Sedona jars were never loaded into the JVM backing your session: spark._jvm.SedonaSQLRegistrator resolves to a bare py4j JavaPackage placeholder instead of the Java class, and calling a package raises exactly that TypeError.
SparkSession follows a singleton model, so if somewhere in your code or in another test a session is first initialized with SparkSession.builder.getOrCreate(), a later attempt to create one with the Sedona config will retrieve that existing session, which was created without Sedona.
That is why the isolated test passes: the SparkSession it creates carries the Apache Sedona configuration, whereas in the full suite the session has most likely already been initialized by an earlier test or code step.
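You can reproduce the singleton behavior on its own (a standalone sketch; the app name and serializer setting are just examples):

from pyspark.sql import SparkSession

# The first call actually creates the session, without any Sedona settings.
plain = SparkSession.builder.appName('first').getOrCreate()

# A later call with different static configs does not create a new session;
# it returns the existing one (Spark logs a warning that some configurations
# may not take effect), so the serializer setting is never applied.
sedona = (
    SparkSession.builder
    .config('spark.serializer', 'org.apache.spark.serializer.KryoSerializer')
    .getOrCreate()
)
assert plain is sedona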
To make the fixture run at the beginning of the test suite, so that every subsequent SparkSession.builder.getOrCreate() call picks up the Sedona configuration, add scope and autouse to the fixture decorator:
@pytest.fixture(scope="session", autouse=True)
def spark_session_sedona():
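With that decorator, pytest builds the Sedona session once before any test runs, and every later getOrCreate() call, in tests and library code alike, returns that same session. Sketched effect (hypothetical test module):

from pyspark.sql import SparkSession

def test_something_else():
    # No fixture argument needed: this returns the session already created
    # by the autouse fixture, Sedona jars and Kryo serializer included.
    spark = SparkSession.builder.getOrCreate()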