I'm passing a comma-separated list of filenames to FileInputFormat in a MapReduce job. The total input is 30 GB of Snappy-compressed ORC files.
When the job starts, it fails with an OOM error about 30 seconds later:
2024-07-31 00:59:02,572 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:3332)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
    at java.lang.StringBuffer.append(StringBuffer.java:270)
    at org.apache.xerces.dom.DeferredDocumentImpl.getNodeValueString(Unknown Source)
    at org.apache.xerces.dom.DeferredDocumentImpl.getNodeValueString(Unknown Source)
    at org.apache.xerces.dom.DeferredTextImpl.synchronizeData(Unknown Source)
    at org.apache.xerces.dom.CharacterDataImpl.getData(Unknown Source)
    at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2775)
    at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2663)
    at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2559)
    at org.apache.hadoop.conf.Configuration.get(Configuration.java:1340)
    at org.apache.hadoop.mapreduce.v2.util.MRWebAppUtil.initialize(MRWebAppUtil.java:51)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1498)
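Because the input files are passed as one comma-separated list, FileInputFormat stores them in a single job-configuration property, `mapreduce.input.fileinputformat.inputdir`, which is written into the job.xml that the MRAppMaster loads at startup. The trace above fails while `Configuration.loadResource` is reading an XML text value, so one thing worth checking is how long that single value gets. The Java sketch below is a rough way to estimate it for a given file count. The path layout, the `nameservice1` name and the default of 50,000 files are hypothetical placeholders (the post does not give the real paths or file count), and it assumes no path contains a comma, since FileInputFormat escapes those.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class InputDirSizeEstimate {
    // Builds hypothetical fully qualified ORC file paths; the layout is made up for illustration.
    static List<String> hypotheticalPaths(int fileCount) {
        List<String> paths = new ArrayList<>(fileCount);
        for (int i = 0; i < fileCount; i++) {
            paths.add(String.format(Locale.ROOT,
                    "hdfs://nameservice1/warehouse/events/dt=2024-07-30/part-%05d.snappy.orc", i));
        }
        return paths;
    }

    // Joins paths with commas, the form the input list takes in the job configuration
    // (assumes no path contains a comma, so no escaping is needed).
    static String joinPaths(List<String> paths) {
        return String.join(",", paths);
    }

    public static void main(String[] args) {
        // Hypothetical file count; pass the real number of input files as the first argument.
        int fileCount = args.length > 0 ? Integer.parseInt(args[0]) : 50_000;
        String inputDir = joinPaths(hypotheticalPaths(fileCount));
        // Java 8 Strings store UTF-16 chars at 2 bytes each.
        long approxBytes = 2L * inputDir.length();
        System.out.printf("files=%d, chars in the joined value=%d, approx. bytes as a Java 8 String=%d%n",
                fileCount, inputDir.length(), approxBytes);
    }
}
```

The estimate covers only the final string; the trace also shows the XML parser growing a `StringBuffer` (`Arrays.copyOf` under `ensureCapacityInternal`), so transient copies come on top of it.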
Does a MapReduce job try to load the complete input data into memory before running, or does it process the input file by file?
I tried the following parameters, but they didn't help:
mapreduce.reduce.memory.mb=15360
mapreduce.map.memory.mb=10240
mapreduce.reduce.java.opts=-Xms14g -Xmx14g -Djava.net.preferIPv4Stack=true -XX:+PrintGCDetails -XX:+PrintGCDateStamps -verbose:gc
mapreduce.map.java.opts=-Xms9g -Xmx9g -Djava.net.preferIPv4Stack=true -XX:+PrintGCDetails -XX:+PrintGCDateStamps -verbose:gc
yarn.app.mapreduce.am.resource.mb=81920
yarn.app.mapreduce.am.command-opts=-Xms77g -Xmx77g -Djava.net.preferIPv4Stack=true -XX:+PrintGCDetails -XX:+PrintGCDateStamps -verbose:gc
I'm using Hadoop 2.6.0-cdh5.16.1.
Update: resolved after adding the command-opts parameter, with these AM settings:
yarn.app.mapreduce.am.resource.mb=6144
yarn.app.mapreduce.am.command-opts=-Xms3g -Xmx5g
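To compare the settings tried above with the ones that worked, the Java sketch below lays out, for each container, how much of the YARN allocation the `-Xmx` heap takes and how much is left for non-heap JVM memory. The 80% figure is an assumption: it is a common rule of thumb and the default of `mapreduce.job.heap.memory-mb.ratio` in later Hadoop releases, not something configured in this post. `g` in `-Xmx` is converted as 1 g = 1024 MB.

```java
import java.util.Arrays;
import java.util.List;

public class HeapHeadroomCheck {
    // One YARN container size paired with the -Xmx given to the JVM running in it.
    static final class Setting {
        final String name;
        final int containerMb;
        final int xmxMb;

        Setting(String name, int containerMb, int xmxMb) {
            this.name = name;
            this.containerMb = containerMb;
            this.xmxMb = xmxMb;
        }

        // Container memory not covered by -Xmx, left for non-heap JVM memory.
        int headroomMb() {
            return containerMb - xmxMb;
        }

        // Heap as a percentage of the container, rounded down.
        int heapPercent() {
            return xmxMb * 100 / containerMb;
        }
    }

    // Assumption: 80% heap-to-container rule of thumb (the default of
    // mapreduce.job.heap.memory-mb.ratio in later Hadoop releases).
    static final int HEAP_PERCENT = 80;

    // -Xmx in MB that the 80% rule would give for a container size (integer math, rounded down).
    static int suggestedXmxMb(int containerMb) {
        return containerMb * HEAP_PERCENT / 100;
    }

    public static void main(String[] args) {
        // Values from this post; -Xmx "g" converted as 1 g = 1024 MB.
        List<Setting> settings = Arrays.asList(
                new Setting("map (tried)", 10240, 9 * 1024),
                new Setting("reduce (tried)", 15360, 14 * 1024),
                new Setting("AM (tried)", 81920, 77 * 1024),
                new Setting("AM (resolved)", 6144, 5 * 1024));
        for (Setting s : settings) {
            System.out.printf("%-15s container=%6d MB -Xmx=%6d MB heap=%3d%% headroom=%5d MB 80%%-rule -Xmx=%6d MB%n",
                    s.name, s.containerMb, s.xmxMb, s.heapPercent(), s.headroomMb(),
                    suggestedXmxMb(s.containerMb));
        }
    }
}
```

Integer arithmetic keeps the numbers exact; for example, the resolved AM setting gives a heap of 83% of the container with 1024 MB of headroom.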