Tags: pyspark, tidb

PySpark JDBC Write to MySQL (TiDB)


I am trying to write a PySpark dataframe (millions of rows) to TiDB (Spark 2.3):

df.write.format('jdbc').options(
  url='jdbc:mysql://<host>:<port>/<database>',
  driver='com.mysql.jdbc.Driver',
  dbtable='<tablename>',
  user='<username>',
  password='<password>',
  batchsize=30000,
  truncate=True
).mode('overwrite').save()

However, I keep getting this error:

Caused by: java.sql.BatchUpdateException: statement count 5001 exceeds the transaction limitation, autocommit = false
....
....
....
Caused by: java.sql.SQLException: statement count 5001 exceeds the transaction limitation, autocommit = false

Any idea how I can fix this?


Solution

  • You should add ?rewriteBatchedStatements=true to your JDBC URL so that the DML statements are batched, as shown in the sketch below. Not only will the write be faster, but you also won't hit the database's transaction limit as easily.
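
As a minimal sketch, the write from the question could look like this with the parameter appended to the URL (the host, port, database, table, and credential placeholders mirror the ones above):

df.write.format('jdbc').options(
  # rewriteBatchedStatements=true tells the MySQL JDBC driver to combine
  # batched inserts into multi-row statements, so each Spark batch counts
  # as far fewer statements against TiDB's per-transaction limit
  url='jdbc:mysql://<host>:<port>/<database>?rewriteBatchedStatements=true',
  driver='com.mysql.jdbc.Driver',
  dbtable='<tablename>',
  user='<username>',
  password='<password>',
  batchsize=30000,
  truncate=True
).mode('overwrite').save()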