I'm trying to profile the memory usage of my Hadoop job.
Could someone provide a step-by-step how-to for monitoring Hadoop tasks with YourKit, including setup?
All you have to do is add the following entry to your mapred-site.xml file (which is found in $HADOOP_HOME/conf/, where $HADOOP_HOME is your Hadoop installation directory):
<property>
<name>mapred.child.java.opts</name>
<value>-agentpath:{yourkit installation directory}/bin/linux-x86-64/libyjpagent.so=tracing,dir={output directory}</value>
</property>
If you are running on a platform other than linux-x86-64, you may need to change the value above to match your platform (see this for details).
You can pass any of the options listed here to the profiler agent.
This will create a number of snapshots in the specified output directory, one for each Child process.
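If you need this entry for more than one platform or output directory, it can help to generate it rather than edit it by hand. Below is a minimal sketch, not part of Hadoop or YourKit: the helper names `agent_option` and `property_xml` are made up for illustration, and the paths `/opt/yourkit` and `/tmp/yourkit-snapshots` are placeholders to replace with your own. It assembles the -agentpath string in the same shape as the entry above and prints the property with no stray whitespace inside the value.

```python
import xml.etree.ElementTree as ET


def agent_option(yourkit_home, output_dir, platform_dir="linux-x86-64",
                 agent_lib="libyjpagent.so", mode="tracing"):
    """Build the -agentpath JVM option in the same shape as the entry above.

    platform_dir and agent_lib default to the linux-x86-64 values from the
    answer; for other platforms, pass whatever your YourKit install contains.
    """
    agent = f"{yourkit_home.rstrip('/')}/bin/{platform_dir}/{agent_lib}"
    return f"-agentpath:{agent}={mode},dir={output_dir}"


def property_xml(name, value):
    """Render a single <property> element for mapred-site.xml as a string."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    opt = agent_option("/opt/yourkit", "/tmp/yourkit-snapshots")
    print(property_xml("mapred.child.java.opts", opt))
```

Paste the printed element inside the configuration element of mapred-site.xml, and make sure the output directory exists and is writable on every node that runs tasks.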