I have a question regarding Elastic MapReduce on Amazon Web Services. Has anyone been able to set the following configuration properties:
mapreduce.{map|reduce}.java.opts
The problem is that when I check the heap size inside the JVM of both the mappers and the reducers, the maximum heap size is not affected by these settings. I check the heap size by adding the following lines to my map/reduce code:
Runtime runtime = Runtime.getRuntime();
System.out.println(runtime.maxMemory());
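For context, here is roughly where that check sits in my job; a minimal sketch using the new org.apache.hadoop.mapreduce API (the class name MyMapper and the key/value types are just placeholders for illustration):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void setup(Context context) {
        // Prints the maximum heap (in bytes) of the child JVM running this task;
        // with -Xmx1000m in effect this should be roughly 1000 MB,
        // but I see the default heap size instead.
        Runtime runtime = Runtime.getRuntime();
        System.out.println("max heap = " + runtime.maxMemory());
    }
}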
I am setting them using the command line interface with the following parameters:
--bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop --args "-m,mapreduce.map.java.opts=-Xmx1000m,-m,mapreduce.reduce.java.opts=-Xmx3000m"
I checked the Hadoop version on Amazon EMR: it is 1.0.3. (The reference book by Tom White says these properties should be supported starting with Hadoop 0.21.)
It is possible, though, to set the JVM options of the child processes (the same value for both mapper and reducer, via mapred.child.java.opts), but this is very inconvenient for my algorithm, in which the reducer has to store a large hashmap while the mapper stores almost nothing.
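For reference, this is the workaround I mean: a single property applied to every child JVM (a sketch mirroring the command above; with this, the mappers would needlessly reserve the 3000m that only the reducers need):

--bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop --args "-m,mapred.child.java.opts=-Xmx3000m"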
Maybe related to this question: is it possible to get a warning when you set unsupported configuration properties? When I set the properties above, they can be read back, but they are apparently not used/supported (configuration.get(...) returns the values I set).
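To illustrate that last point, this is roughly how I read the value back inside a task (a sketch; context here is the Context object that Hadoop passes to setup/map):

String opts = context.getConfiguration().get("mapreduce.map.java.opts");
System.out.println(opts); // prints "-Xmx1000m", yet runtime.maxMemory() still reports the default heap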
If you look in the hadoop-1.0.3/docs folder, you will find a file named mapred_tutorial.html.
In the "Task Execution & Environment" section, the document tells you to use the following:
mapred.{map|reduce}.child.java.opts
They have changed the configuration name, so the mapreduce.{map|reduce}.java.opts properties from hadoop-0.21.0 no longer work on the newer hadoop-1.0.3.
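So, using the same configure-hadoop bootstrap action as in the question, the call should become something like this (an untested sketch, keeping the -Xmx values from the question):

--bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop --args "-m,mapred.map.child.java.opts=-Xmx1000m,-m,mapred.reduce.child.java.opts=-Xmx3000m"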