
How to calculate Spark driver and executor memory on a local machine?

I am a beginner with Spark. Some of my executions fail with java.lang.OutOfMemoryError: Java heap space:

java.lang.OutOfMemoryError: Java heap space at java.base/java.nio.HeapByteBuffer.<init>(

From what I have read, this could be caused by not setting the --driver-memory and --executor-memory arguments. Spark is hosted in a Docker container, and the PySpark script is launched with spark-submit:

docker exec -it pyspark_container \
    /usr/local/lib/python3.10/dist-packages/pyspark/bin/spark-submit \

My computer specifications:

8 cores, 30 GB RAM, 1.2 TB SSD

I would like to know whether it is possible, and whether it makes sense, to increase these arguments given that I am not running on a cluster, and how to calculate the allocation.

Really appreciate your help


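  • Since you're not passing a --master argument to spark-submit, you're running Spark in local mode (unless spark.master is set elsewhere, e.g. in spark-defaults.conf, the default master is local[*]). That means the driver and the executor run together inside a single JVM on the same machine.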

    In that case the --executor-memory argument has no effect. It is the --driver-memory argument that gives your local Spark instance more memory, because the tasks run inside the driver's JVM.

    Without knowing what your data looks like, it is hard to say what the right size is. The default value of --driver-memory is 1g. Since your machine has 30 GB of RAM, you can raise it considerably.

    Try something like:

    docker exec -it pyspark_container \
        /usr/local/lib/python3.10/dist-packages/pyspark/bin/spark-submit \
        --driver-memory Xg \

    where X is a number that makes sense for your data. If you have the whole machine to yourself, you can give Spark a large share of the available memory, for example --driver-memory 25g, while leaving a few GB for the operating system and other processes.
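To make the allocation calculation a bit more concrete, here is a minimal Python sketch. It assumes the unified memory model described in Spark's tuning guide: about 300 MB of the heap is reserved, spark.memory.fraction (default 0.6) of the rest is shared by execution and storage, and spark.memory.storageFraction (default 0.5) of that shared region is storage that execution cannot evict. With the default 1g, that leaves only about 434 MB for execution and storage. The figures are approximate, since the JVM reports a slightly smaller maximum heap than the requested size. The 5 GB headroom, the helper names and my_job.py are hypothetical placeholders, not Spark settings; --master local[*] is spelled out only to make local mode explicit.

```python
import shlex

# Spark reserves a fixed 300 MB of the heap before splitting the rest.
RESERVED_MB = 300


def spark_memory_breakdown(heap_mb: int,
                           memory_fraction: float = 0.6,
                           storage_fraction: float = 0.5) -> dict:
    """Approximate split of a driver heap of heap_mb MB under Spark's
    unified memory model (defaults match spark.memory.fraction and
    spark.memory.storageFraction)."""
    usable = heap_mb - RESERVED_MB
    if usable <= 0:
        raise ValueError("heap must be larger than the 300 MB reserved memory")
    unified = usable * memory_fraction    # shared by execution and storage
    storage = unified * storage_fraction  # storage that execution cannot evict
    execution = unified - storage         # can also borrow unused storage
    user = usable - unified               # user data structures, Spark metadata
    return {"reserved": RESERVED_MB, "unified": unified,
            "storage": storage, "execution": execution, "user": user}


def suggest_driver_memory_gb(total_ram_gb: int, headroom_gb: int = 5) -> int:
    """Hypothetical rule of thumb: give the driver JVM all RAM except a fixed
    headroom for the OS, Docker and PySpark's Python worker processes,
    which run outside the JVM heap."""
    heap_gb = total_ram_gb - headroom_gb
    if heap_gb < 1:
        raise ValueError("not enough RAM left after the headroom")
    return heap_gb


def spark_submit_cmd(script: str, driver_memory_gb: int) -> list:
    """Build the docker exec + spark-submit command from the question."""
    return [
        "docker", "exec", "-it", "pyspark_container",
        "/usr/local/lib/python3.10/dist-packages/pyspark/bin/spark-submit",
        "--master", "local[*]",  # the default, spelled out for clarity
        "--driver-memory", f"{driver_memory_gb}g",
        script,                  # hypothetical script name
    ]


if __name__ == "__main__":
    for heap_mb in (1024, 25 * 1024):  # the 1g default vs. 25g
        parts = spark_memory_breakdown(heap_mb)
        print(heap_mb, {name: round(mb) for name, mb in parts.items()})

    x = suggest_driver_memory_gb(30)
    print(shlex.join(spark_submit_cmd("my_job.py", x)))
```

For the 30 GB machine this rule of thumb yields 25g. If the container runs with a Docker memory limit, the JVM heap plus the Python workers must fit inside that limit, so size X against the limit rather than the host RAM.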