[SOLVED] Spark Thriftserver stops or freezes due to Tableau queries

The Spark cluster (Spark 2.2) is used by around 30 people via spark-shell and Tableau (10.4). Once a day the Thriftserver gets killed or freezes because the JVM has too much garbage to collect. These are the error messages I can find in the Thriftserver log file:

ERROR SparkExecuteStatementOperation: Error executing query, currentState RUNNING, java.lang.OutOfMemoryError: GC overhead limit exceeded

ERROR TaskSchedulerImpl: Lost executor 2 on XXX.XXX.XXX.XXX: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.

Exception in thread "HiveServer2-Handler-Pool: Thread-152" java.lang.OutOfMemoryError: Java heap space

General information:

The Thriftserver is started with the following options (copied from the master's web UI -> sun.java.command):

org.apache.spark.deploy.SparkSubmit --master spark://bd-master:7077 --conf spark.driver.memory=6G --conf spark.driver.extraClassPath=--hiveconf --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 --executor-memory 12G --total-executor-cores 12 --supervise --driver-cores 2 spark-internal hive.server2.thrift.bind.host bd-master --hiveconf hive.server2.thrift.port 10001

The Spark standalone cluster has 48 cores and 240 GB of memory across 6 machines. Every machine has 8 cores and 64 GB of memory; two of them are virtual machines.

The users are querying a Hive table backed by a 1.6 GB CSV file that is replicated on all machines.

Is there something I have done wrong that allows Tableau to kill the Thriftserver? Is there any other information I could provide that would help you help me?

Solution

We were able to work around this issue by setting:

spark.sql.thriftServer.incrementalCollect=true

With this parameter set to true, the Thriftserver sends the result to the client one partition at a time instead of first collecting the whole result on the driver. This reduces the peak memory the Thriftserver needs while it returns the result.
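To make the memory argument concrete, here is a toy model in plain Python; it is not Spark code. Partitions are lists of rows, a hypothetical `Driver` class counts how many rows are held at once, and the two hypothetical functions `fetch_collect` and `fetch_incremental` mimic the default collect-everything path and the partition-by-partition path that spark.sql.thriftServer.incrementalCollect=true enables. Real memory use also depends on row size, serialization and Spark internals, so treat this only as a sketch of why the peak drops.

```python
from typing import Callable, Iterator, List, Tuple

Partition = List[str]


class Driver:
    """Toy stand-in for the Thriftserver driver: tracks rows held in memory."""

    def __init__(self) -> None:
        self.held = 0
        self.peak = 0

    def hold(self, n: int) -> None:
        self.held += n
        self.peak = max(self.peak, self.held)

    def release(self, n: int) -> None:
        self.held -= n


def fetch_collect(partitions: List[Partition], driver: Driver) -> Iterator[str]:
    # Modelled default path: every partition is pulled to the driver
    # before the first row is sent to the client.
    result: List[str] = []
    for part in partitions:
        driver.hold(len(part))
        result.extend(part)
    yield from result
    driver.release(len(result))


def fetch_incremental(partitions: List[Partition], driver: Driver) -> Iterator[str]:
    # Modelled incrementalCollect path: fetch one partition, send it,
    # drop it, then move on to the next one.
    for part in partitions:
        driver.hold(len(part))
        yield from part
        driver.release(len(part))


def serve(
    fetch: Callable[[List[Partition], Driver], Iterator[str]],
    partitions: List[Partition],
) -> Tuple[int, int]:
    """Stream all rows to a fake client; return (rows_sent, peak_rows_held)."""
    driver = Driver()
    sent = sum(1 for _ in fetch(partitions, driver))
    return sent, driver.peak


if __name__ == "__main__":
    # 12 partitions of 1000 rows each (arbitrary example sizes).
    parts = [[f"row{p}-{i}" for i in range(1000)] for p in range(12)]
    print("collect:    ", serve(fetch_collect, parts))
    print("incremental:", serve(fetch_incremental, parts))
```

In this model both paths send all 12000 rows, but the peak falls from the size of the whole result (12000 rows) to the size of the largest partition (1000 rows). The trade-off in Spark itself is that incremental collection fetches partitions one after another (in Spark 2.x roughly one job per partition), so large results can take longer to return, and a single very large partition can still exhaust driver memory.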