apache-spark, hadoop-yarn, resourcemanager

Yarn UI "Application Memory MB" almost doubled from spark-submit configuration memory settings


We are running a Cloudera cluster (CDP) with Spark on YARN. The Hadoop version is 3.1.1, Spark version is 2.4.8, and CDP version is 7.2.16.

In the YARN UI, when we list all running applications, we notice that our jobs are allocated approximately (2 GB × num_containers) more memory than we configured. Here's an example:

[Screenshot: YARN UI applications list showing the job's "Allocated Memory MB"]

The Spark streaming application in the picture is configured as follows:

spark.driver.memory                 3g
spark.executor.cores                2
spark.executor.instances            4
spark.executor.memory               4g
spark.deployMode                    cluster
spark.dynamicAllocation.enabled     false

Some YARN configurations (though not all may be relevant in this case) include:

yarn.scheduler.minimum-allocation-mb   2 GiB
yarn.app.mapreduce.am.resource.mb      3 GiB
mapreduce.map.memory.mb                3 GiB
mapreduce.reduce.memory.mb             3 GiB
node_manager_java_heapsize             2 GiB
resource_manager_java_heapsize         1 GiB

According to our calculations, the application should request a total of (4 × (4 GB + 300 MB)) + (3 GB + 300 MB) = 17.2 GB + 3.3 GB = 20.5 GB. However, the YARN UI shows 28672 MB of "Allocated Memory MB".

*300 MB is the reserved (overhead) memory per container; it might be 350 MB, but either way it is far too small to explain the surplus.
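
For reference, here is the back-of-the-envelope arithmetic behind the 20.5 GB figure (the flat ~300 MB overhead per container is our assumption at this point):

    GB = 1024                      # MB
    overhead_mb = 300              # assumed per-container overhead

    # 4 executors at 4g each plus 1 driver at 3g
    expected_mb = 4 * (4 * GB + overhead_mb) + (3 * GB + overhead_mb)
    print(expected_mb)             # 20956 MB (~20.5 GB)
    print(28672 - expected_mb)     # 7716 MB more than we expected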

Here are some examples of other jobs and their surplus memory allocations (the short script below these examples shows how we compute the surplus):


Job | Allocated memory MB | Driver memory | Executor memory | Executor instances | Num containers | Surplus
1   | 8192                | 2 GB          | 2 GB            | 1                  | 2              | 4 GB (2 GB × containers)
2   | 4096                | 1 GB          | 1 GB            | 1                  | 2              | 2 GB (1 GB × containers)
3   | 6144                | 2 GB          | 1 GB            | 1                  | 2              | 3 GB (1.5 GB × containers?)
4   | 12288               | 2 GB          | 2 GB            | 2                  | 3              | 6 GB (2 GB × containers)
5   | 12288               | 3 GB          | 3 GB            | 3                  | 3              | 3 GB (1 GB × containers)
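
By "surplus" we mean the YARN-reported allocation minus the memory we configured for the containers actually running (the driver plus num_containers - 1 executors). A quick check over the jobs above:

    # (allocated_mb, driver_gb, executor_gb, num_containers) for jobs 1-5 above
    jobs = [
        (8192, 2, 2, 2),
        (4096, 1, 1, 2),
        (6144, 2, 1, 2),
        (12288, 2, 2, 3),
        (12288, 3, 3, 3),
    ]

    for i, (allocated_mb, driver_gb, executor_gb, containers) in enumerate(jobs, 1):
        configured_mb = (driver_gb + (containers - 1) * executor_gb) * 1024
        surplus_mb = allocated_mb - configured_mb
        print(f"Job {i}: surplus {surplus_mb} MB "
              f"({surplus_mb / containers / 1024:.1f} GB per container)")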

What could cause this increase in the jobs' memory allocation? We've searched the web, the documentation, and GPT/Copilot, but none of these helped us explain the surplus. Most suggestions point to small increases, not a near doubling.


Solution

  • Since you have yarn.scheduler.minimum-allocation-mb=2GB, each YARN container that runs a Spark executor is allocated 6 GB of memory (4.4 GB rounded up to the next N × 2 GB boundary), and the driver's container gets 4 GB. So, 6 GB × 4 + 4 GB = 28 GB total.

    BTW, the memory overhead requested by Spark is not ~300-350 MB; by default it is an extra 10% of the executor (or driver) memory or 384 MB, whichever is greater.

    Yes, this also means that you're wasting (6 GB - 4 GB - 0.10 × 4 GB) = 1.6 GB of memory per executor container. The sketch below reproduces these numbers.
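
    A minimal Python sketch of the arithmetic above, assuming the rounding model described here (request rounded up to a multiple of yarn.scheduler.minimum-allocation-mb) and Spark's default overhead of max(10%, 384 MB); the exact rounding rule can differ with the scheduler configuration:

        import math

        MIN_ALLOCATION_MB = 2048     # yarn.scheduler.minimum-allocation-mb
        OVERHEAD_FACTOR = 0.10       # Spark default memoryOverhead factor
        OVERHEAD_MIN_MB = 384        # Spark default memoryOverhead floor

        def spark_request_mb(heap_mb):
            """Heap plus the default memory overhead that Spark asks YARN for."""
            return heap_mb + max(int(heap_mb * OVERHEAD_FACTOR), OVERHEAD_MIN_MB)

        def container_mb(heap_mb):
            """Round the request up to the next multiple of the YARN minimum allocation."""
            return math.ceil(spark_request_mb(heap_mb) / MIN_ALLOCATION_MB) * MIN_ALLOCATION_MB

        executor = container_mb(4 * 1024)   # 4096 + 409 = 4505 -> 6144 MB
        driver = container_mb(3 * 1024)     # 3072 + 384 = 3456 -> 4096 MB
        print(4 * executor + driver)        # 28672 MB, matching the YARN UI

        # Largest executor heap that still fits a 6144 MB container under this model:
        print(max(m for m in range(1, 6145) if spark_request_mb(m) <= 6144))   # 5586 MB

    Under this model, raising spark.executor.memory to roughly 5.5 GB (so heap plus overhead just fills the 6144 MB container), setting spark.executor.memoryOverhead explicitly, or lowering yarn.scheduler.minimum-allocation-mb would reclaim most of that 1.6 GB per executor.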