apache-spark, google-cloud-dataproc

spark.sql.shuffle.partitions - default value


As per the Dataproc Spark job tuning doc https://cloud.google.com/dataproc/docs/support/spark-job-tuning#:~:text=spark.-,sql.,less%20than%20100%20vCPUs%20total., spark.sql.shuffle.partitions has a default value of 200.

Is the default value of 200 applied at each worker node level or at the overall cluster level? For example, if we have 1 driver and 5 workers, what would the default value be: 200, or 200 × 5 workers = 1000?


Solution

  • spark.sql.shuffle.partitions (default 200) applies to the whole Spark application, regardless of the number of workers. With 1 driver and 5 workers it is still 200: every shuffle produced by a wide transformation (join, groupBy, etc.) creates 200 partitions in total, which are then distributed across the workers, not 200 per worker.
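
You can verify this yourself on a running cluster. Below is a minimal PySpark sketch; the app name, the demo DataFrame, and the use of a SparkSession named spark are illustrative assumptions, not part of the original question:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("shuffle-partitions-demo").getOrCreate()

# One value per Spark application -- it is not multiplied by the worker count.
print(spark.conf.get("spark.sql.shuffle.partitions"))   # "200" unless overridden

# Disable AQE for this demo so it does not coalesce shuffle partitions.
spark.conf.set("spark.sql.adaptive.enabled", "false")

# A wide transformation (groupBy, join, ...) shuffles into exactly that many
# partitions, whether the cluster has 1 worker or 5.
df = spark.range(1_000_000).withColumn("key", F.col("id") % 100)
counts = df.groupBy("key").count()
print(counts.rdd.getNumPartitions())                    # 200

# Override for the whole application if 200 is a poor fit for your data size.
spark.conf.set("spark.sql.shuffle.partitions", "400")
```

The 200 partitions are simply scheduled as tasks across whatever executors the 5 workers provide; adding workers changes how many of those tasks run in parallel, not how many partitions the shuffle produces.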