I'm trying to read some data from an Apache Ignite table with PySpark.
spark.read.format("jdbc").option("driver", "org.apache.ignite.IgniteJdbcThinDriver")\
.option("url", "jdbc:ignite:thin://172.19.0.1:10800;schema=fs_dev").option("dbtable", "country").load().show()
But it gives me an error:
java.sql.SQLException: Fetch size must be greater than zero.
at org.apache.ignite.internal.jdbc.thin.JdbcThinStatement.setFetchSize(JdbcThinStatement.java:620)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:302)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:411)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:417)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
How can I fix this?
P.S. I'm using Python 3.7 and PySpark 2.4.8.
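I fixed it by adding .option("fetchSize", "100000") to the spark.read.format("jdbc") chain. Spark's JDBC source defaults fetchsize to 0, and the stack trace shows the Ignite thin driver's setFetchSize rejecting that value, so passing any positive value avoids the exception: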
spark.read.format("jdbc").option("driver", "org.apache.ignite.IgniteJdbcThinDriver")\
.option("url", f"jdbc:ignite:thin://{host}:{port};schema={schema}")\
.option("dbtable", "country").option("fetchSize", "100000").load()
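
If you read from Ignite in more than one place, it may help to build the options once so the fetch size can't be forgotten. The following is only a sketch: ignite_jdbc_options and read_ignite_table are hypothetical helper names, and the default of 1000 is an arbitrary positive value, not a recommendation. Spark treats option names case-insensitively, so "fetchsize" here is the same option as "fetchSize" above.

```python
def ignite_jdbc_options(host, port, schema, table, fetch_size=1000):
    """Build the option dict for Spark's JDBC source (hypothetical helper).

    The fetch size is validated up front so the mistake surfaces on the
    driver side instead of as a SQLException inside a Spark task.
    """
    if int(fetch_size) <= 0:
        raise ValueError("fetch_size must be greater than zero")
    return {
        "driver": "org.apache.ignite.IgniteJdbcThinDriver",
        "url": f"jdbc:ignite:thin://{host}:{port};schema={schema}",
        "dbtable": table,
        # Always set explicitly: Spark's default of 0 is what Ignite rejects.
        "fetchsize": str(fetch_size),
    }


def read_ignite_table(spark, host, port, schema, table, fetch_size=1000):
    """Load an Ignite table as a DataFrame using an existing SparkSession."""
    opts = ignite_jdbc_options(host, port, schema, table, fetch_size)
    return spark.read.format("jdbc").options(**opts).load()
```

With the values from the question, this would be called as read_ignite_table(spark, "172.19.0.1", 10800, "fs_dev", "country").show().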