Getting the following error while connecting to TIDB
through Spark
using mysql-connector-java 5.1.6 connector
.
Please note, I have created jdbc connection using the parallel connections option, where we specify the column name, lower bound, upper bound and the number of partitions.
Spark then breaks it into (number of partitions) queries, by dividing the lower bound and upper bound of the column name into equal sizes.
java.sql.SQLException: other error: request outdated.
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1055)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:956)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3536)
at com.mysql.jdbc.MysqlIO.nextRowFast(MysqlIO.java:1551)
at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1407)
at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:2861)
at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:474)
at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:2554)
at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:1755)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2165)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2648)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2086)
at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:2237)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:301)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337)
at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1092)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1083)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1018)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1083)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:809)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
other error: request outdated.
is an error thrown by TiKV
, which indicates the query exceeds the timeout limit end-point-request-max-handle-duration
before executing, and canceled by coprocessor to avoid staled query. You may configure it in TiKV's config, its default value is 60 seconds.
Since Spark retrieves this error from JDBC, it means that the requests are too heavy for TiDB
to handle so that some requests waited too long. That is mainly because Spark splits requests for each partition, resulting in heavy workload for TiDB. It makes matter worse while you are using parallel connection.
As a matter of fact, TiSpark is developed to be a solution for querying with Spark along with TiDB. It now supports Spark 2.1 and will support Spark 2.3 in a few days. Try it!