I'm using Spark in a YARN cluster (HDP 2.4) with the following settings:
When I run my Spark application with the command spark-submit --num-executors 10 --executor-cores 1 --executor-memory 5g ..., Spark should give each executor 5 GB of RAM, right? (I requested only 5g per executor to leave room for the ~10% memory overhead.)
But when I looked at the Spark UI, I saw that each executor only has 3.4 GB of memory (see screenshot):
Can someone explain why so little memory is allocated?
The Storage Memory column in the UI displays the amount of memory used for execution and RDD storage. By default, this equals (HEAP_SPACE - 300 MB) * 75%. The rest of the memory is used for internal metadata, user data structures, and other overhead.
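To see roughly where the 3.4 GB comes from, here is a back-of-the-envelope calculation in Scala mirroring that formula. The 4.8 GiB heap figure is an assumption, not a measurement from your cluster (Runtime.getRuntime.maxMemory typically reports somewhat less than the requested -Xmx5g), and the object name is made up for illustration:

```scala
object StorageMemoryEstimate {
  def main(args: Array[String]): Unit = {
    // Assumed usable JVM heap for an executor launched with --executor-memory 5g;
    // the JVM reports a bit less than 5 GiB, roughly 4.8 GiB is assumed here.
    val jvmMaxHeapBytes     = 4.8 * 1024 * 1024 * 1024
    val reservedMemoryBytes = 300L * 1024 * 1024   // fixed 300 MB reserved by Spark
    val memoryFraction      = 0.75                 // default spark.memory.fraction in Spark 1.6

    // (HEAP_SPACE - 300 MB) * 75%, as described above
    val unifiedMemoryBytes = (jvmMaxHeapBytes - reservedMemoryBytes) * memoryFraction
    println(f"Execution + storage memory: ${unifiedMemoryBytes / 1024 / 1024 / 1024}%.1f GiB")
    // Prints roughly 3.4 GiB, which matches the value shown in the Spark UI.
  }
}
```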
You can control this amount by setting spark.memory.fraction (not recommended). See Spark's documentation for more details.
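If you do decide to change it, here is a minimal sketch of how the property could be set programmatically; the application and object names are hypothetical, and the same property can also be passed on the command line via spark-submit's --conf flag:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object MemoryFractionExample {                       // hypothetical application
  def main(args: Array[String]): Unit = {
    // Equivalent on the command line: spark-submit --conf spark.memory.fraction=0.6 ...
    val conf = new SparkConf()
      .setAppName("memory-fraction-example")
      .set("spark.memory.fraction", "0.6")           // example value; the Spark 1.6 default is 0.75
    val sc = new SparkContext(conf)
    // ... your job ...
    sc.stop()
  }
}
```

Keep in mind that raising this fraction shrinks the space left for user data structures and internal metadata, which is why the default is usually left alone.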