apache-spark pyspark jvm

Why not set spark.memory.fraction to 1.0?


I am confused as to why Spark only uses a fraction of the Java heap. Why not use 100% of it, i.e. set spark.memory.fraction to 1.0?

What is the point of keeping 0.4 (by default) reserved? Why leave this memory unutilized?

Is this used by Spark or by the JVM?


Solution

  • From the docs:

    spark.memory.fraction expresses the size of M as a fraction of the (JVM heap space - 300MiB) (default 0.6). The rest of the space (40%) is reserved for user data structures, internal metadata in Spark, and safeguarding against OOM errors in the case of sparse and unusually large records.
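    To make the split concrete, here is a minimal sketch in Python of the arithmetic described above, plus one way the fraction can be set when building a PySpark session. The 4 GiB heap size and the 0.7 fraction in the config snippet are illustrative assumptions, not recommended values.

    from pyspark.sql import SparkSession

    # Back-of-the-envelope calculation of the unified memory pool (M)
    # using the formula from the docs: (heap - 300 MiB) * spark.memory.fraction
    heap_bytes = 4 * 1024**3        # assumed 4 GiB executor heap (example value)
    reserved = 300 * 1024**2        # fixed 300 MiB reserved by Spark
    fraction = 0.6                  # spark.memory.fraction default

    unified_pool = (heap_bytes - reserved) * fraction        # execution + storage memory
    user_space = (heap_bytes - reserved) * (1 - fraction)    # user data structures, internal metadata

    print(f"Unified memory (M): {unified_pool / 1024**2:.0f} MiB")
    print(f"User/internal space: {user_space / 1024**2:.0f} MiB")

    # Illustrative only: raising the fraction somewhat instead of to 1.0,
    # so that headroom remains for user objects and Spark's internal metadata.
    spark = (
        SparkSession.builder
        .appName("memory-fraction-demo")            # hypothetical app name
        .config("spark.memory.fraction", "0.7")     # must be set before the application starts
        .getOrCreate()
    )

    With the default 0.6, the example heap above leaves roughly 40% of (heap - 300 MiB) outside the unified pool; that space is what the docs describe as protection for user data structures and against OOM on unusually large records, which is why pushing the fraction to 1.0 is not advised.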