The Spark cluster (Spark 2.2) is used by around 30 people via spark-shell and Tableau (10.4). About once a day the Thrift server gets killed or freezes because the JVM has too much garbage to collect. These are the error messages I can find in the Thrift server log file:
ERROR SparkExecuteStatementOperation: Error executing query, currentState RUNNING, java.lang.OutOfMemoryError: GC overhead limit exceeded
ERROR SparkExecuteStatementOperation: Error executing query, currentState RUNNING, java.lang.OutOfMemoryError: GC overhead limit exceeded
ERROR TaskSchedulerImpl: Lost executor 2 on XXX.XXX.XXX.XXX: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
Exception in thread "HiveServer2-Handler-Pool: Thread-152" java.lang.OutOfMemoryError: Java heap space
General information:
The Thrift server is started with the following options (copied from the master's web UI -> sun.java.command):
org.apache.spark.deploy.SparkSubmit --master spark://bd-master:7077 --conf spark.driver.memory=6G --conf spark.driver.extraClassPath=--hiveconf --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 --executor-memory 12G --total-executor-cores 12 --supervise --driver-cores 2 spark-internal hive.server2.thrift.bind.host bd-master --hiveconf hive.server2.thrift.port 10001
The Spark standalone cluster has 48 cores and 240 GB of memory across 6 machines. Every machine has 8 cores and 64 GB of memory. Two of the machines are virtual machines.
The users are querying a Hive table backed by a 1.6 GB CSV file that is replicated on all machines.
Have I done something wrong that allows Tableau to kill the Thrift server? Is there any other information I could provide that would help you help me?
We were able to work around this issue by setting:
spark.sql.thriftServer.incrementalCollect=true
With this parameter set to true, the Thrift server sends the result to the client one partition at a time instead of first collecting the entire result on the driver. This reduces the peak memory the Thrift server needs when it sends a result.
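The effect on peak memory can be illustrated with a toy model. This is a sketch under stated assumptions, not Spark's actual implementation: a query result is assumed to be a list of partitions (each a list of rows), and driver memory is approximated by the number of rows held at once. The function name `serve` is hypothetical.

```python
from typing import List, Tuple


def serve(partitions: List[List[str]], incremental: bool) -> Tuple[List[str], int]:
    """Toy model of returning a query result to a client (hypothetical, not Spark code).

    Returns the rows sent to the client and the peak number of rows
    held on the "driver" at any one time.
    """
    sent: List[str] = []
    held: List[str] = []
    peak = 0
    for part in partitions:
        held.extend(part)              # fetch one partition to the driver
        peak = max(peak, len(held))
        if incremental:
            sent.extend(held)          # forward this partition to the client...
            held.clear()               # ...and release it before fetching the next
    sent.extend(held)                  # non-incremental: send everything at the end
    return sent, peak


parts = [["a", "b"], ["c", "d", "e"], ["f"]]
print(serve(parts, incremental=False))
print(serve(parts, incremental=True))
```

Both modes deliver the same rows, but in this model the full collection peaks at 6 held rows while the incremental mode never holds more than the largest partition (3 rows).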