Tags: hadoop, mapreduce, hadoop-yarn

MapReduce Job Failing with OOM [org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster]


I'm passing the input files to FileInputFormat in my MapReduce job as a comma-separated list of filenames. The total input is 30 GB of Snappy-compressed ORC files.

About 30 seconds after the MapReduce job starts, it fails with the following OutOfMemoryError:

    2024-07-31 00:59:02,572 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
    java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:3332)
        at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
        at java.lang.StringBuffer.append(StringBuffer.java:270)
        at org.apache.xerces.dom.DeferredDocumentImpl.getNodeValueString(Unknown Source)
        at org.apache.xerces.dom.DeferredDocumentImpl.getNodeValueString(Unknown Source)
        at org.apache.xerces.dom.DeferredTextImpl.synchronizeData(Unknown Source)
        at org.apache.xerces.dom.CharacterDataImpl.getData(Unknown Source)
        at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2775)
        at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2663)
        at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2559)
        at org.apache.hadoop.conf.Configuration.get(Configuration.java:1340)
        at org.apache.hadoop.mapreduce.v2.util.MRWebAppUtil.initialize(MRWebAppUtil.java:51)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1498)
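The trace shows the heap running out inside Configuration.loadResource, while the AM is still parsing its configuration XML (a StringBuffer growing through Arrays.copyOf), not while reading any ORC data. FileInputFormat stores the input paths as one comma-separated configuration value (mapreduce.input.fileinputformat.inputdir), so that single value grows with the number and length of the file paths. A rough way to size it is sketched below; the path pattern and file count are hypothetical placeholders, and the byte estimate assumes 2 bytes per char (UTF-16, as on Java 8) while ignoring object overhead and the temporary copies a growing StringBuffer makes.

```java
import java.util.ArrayList;
import java.util.List;

public class InputPathValueSize {
    // Joins paths into one comma-separated string, the shape FileInputFormat
    // keeps in the job configuration.
    static String joinPaths(List<String> paths) {
        return String.join(",", paths);
    }

    // Approximate heap bytes for the characters of one String copy:
    // 2 bytes per char (UTF-16), ignoring object headers and compact strings.
    static long approxUtf16Bytes(String s) {
        return 2L * s.length();
    }

    public static void main(String[] args) {
        // Hypothetical file layout and count; replace with the real list.
        int fileCount = 3;
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < fileCount; i++) {
            paths.add(String.format("/data/orc/part-%05d.snappy.orc", i));
        }
        String value = joinPaths(paths);
        System.out.println(value.length() + " chars, ~" + approxUtf16Bytes(value) + " bytes");
    }
}
```

Raising fileCount to the real number of input files gives a ballpark for how large that one configuration value is before any parsing overhead.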

Does a MapReduce job load the complete input data into memory before running, or does it process the input file by file?

I've tried the parameters below, but they didn't help:

    mapreduce.reduce.memory.mb=15360
    mapreduce.map.memory.mb=10240
    mapreduce.reduce.java.opts=-Xms14g -Xmx14g -Djava.net.preferIPv4Stack=true -XX:+PrintGCDetails -XX:+PrintGCDateStamps -verbose:gc
    mapreduce.map.java.opts=-Xms9g -Xmx9g -Djava.net.preferIPv4Stack=true -XX:+PrintGCDetails -XX:+PrintGCDateStamps -verbose:gc
    yarn.app.mapreduce.am.resource.mb=81920
    yarn.app.mapreduce.am.command-opts=-Xms77g -Xmx77g -Djava.net.preferIPv4Stack=true -XX:+PrintGCDetails -XX:+PrintGCDateStamps -verbose:gc
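Each -Xmx in these settings has to fit inside the matching container size (the *.memory.mb or am.resource.mb value), with some room left for non-heap memory such as metaspace, thread stacks and native buffers; a common rule of thumb is a heap of roughly 80% of the container. The sketch below parses -Xmx out of an options string and reports the headroom for the map, reduce and AM values listed above. The class and method names are hypothetical, and the parser only understands the k/m/g suffixes and bare byte counts.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HeapFitCheck {
    // Matches -Xmx options such as "-Xmx14g", "-Xmx5120m" or "-Xmx1048576".
    private static final Pattern XMX = Pattern.compile("-Xmx(\\d+)([kKmMgG]?)");

    // Returns the effective -Xmx in MB, or -1 if the options contain none.
    static long xmxMb(String javaOpts) {
        Matcher m = XMX.matcher(javaOpts);
        long mb = -1;
        while (m.find()) { // keep the last match: HotSpot uses the last -Xmx given
            long n = Long.parseLong(m.group(1));
            switch (m.group(2).toLowerCase(Locale.ROOT)) {
                case "g": mb = n * 1024; break;
                case "m": mb = n; break;
                case "k": mb = n / 1024; break;
                default:  mb = n / (1024 * 1024); break; // no suffix means bytes
            }
        }
        return mb;
    }

    // Container memory left over for non-heap usage.
    static long headroomMb(long containerMb, String javaOpts) {
        return containerMb - xmxMb(javaOpts);
    }

    public static void main(String[] args) {
        // {role, container MB, JVM options} taken from the settings above.
        String[][] settings = {
            {"map", "10240", "-Xms9g -Xmx9g -Djava.net.preferIPv4Stack=true"},
            {"reduce", "15360", "-Xms14g -Xmx14g -Djava.net.preferIPv4Stack=true"},
            {"am", "81920", "-Xms77g -Xmx77g -Djava.net.preferIPv4Stack=true"},
        };
        for (String[] s : settings) {
            long container = Long.parseLong(s[1]);
            long heap = xmxMb(s[2]);
            System.out.printf("%-6s container=%dMB heap=%dMB headroom=%dMB ratio=%.2f%n",
                    s[0], container, heap, headroomMb(container, s[2]), (double) heap / container);
        }
    }
}
```

For these values each heap is 90% or more of its container, above that rule of thumb. The check only confirms that the numbers are consistent with each other; it cannot show whether the options actually reached the AM's JVM.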

I'm using Hadoop 2.6.0-cdh5.16.1.


Solution

  • Resolved after adding the command-opts parameter, together with a smaller AM container size:

    yarn.app.mapreduce.am.resource.mb=6144

    yarn.app.mapreduce.am.command-opts=-Xms3g -Xmx5g
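If these two AM settings should apply to every job rather than a single submission, they can also live in mapred-site.xml, which uses Hadoop's configuration layout of configuration/property/name/value elements. The sketch below renders the solution's values in that layout using only the standard library; the class name is hypothetical, and whether per-job or cluster-wide settings are appropriate depends on how the cluster is managed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AmSiteXml {
    // Minimal escaping for XML element text.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Renders key/value pairs as Hadoop-style <property> entries.
    static String toSiteXml(Map<String, String> props) {
        StringBuilder sb = new StringBuilder("<configuration>\n");
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append("  <property>\n")
              .append("    <name>").append(escape(e.getKey())).append("</name>\n")
              .append("    <value>").append(escape(e.getValue())).append("</value>\n")
              .append("  </property>\n");
        }
        return sb.append("</configuration>\n").toString();
    }

    public static void main(String[] args) {
        // Values from the solution; insertion order is preserved.
        Map<String, String> am = new LinkedHashMap<>();
        am.put("yarn.app.mapreduce.am.resource.mb", "6144");
        am.put("yarn.app.mapreduce.am.command-opts", "-Xms3g -Xmx5g");
        System.out.print(toSiteXml(am));
    }
}
```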