We have the following Hadoop cluster versions (the DataNode machines run Linux OS version 7.2):
Ambari - 2.6.1, HDP - 2.6.4
We have seen a few scenarios in which the disks on a DataNode machine became 100% full, because stdout files grew to a huge size.
For example:
/grid/sdb/hadoop/yarn/log/application_151746342014_5807/container_e37_151003535122014_5807_03_000001/stdout
From df -h we can see:
df -h /grid/sdb
Filesystem Size Used Avail Use% Mounted on
/dev/sdb 1.8T 1.8T 0T 100% /grid/sdb
Any suggestion on how to avoid this situation where stdout files become huge? This issue actually causes the HDFS component on the DataNode to stop.
Second: since the path of stdout is:
/var/log/hadoop-yarn/containers/[application id]/[container id]/stdout
is it possible to limit the file size, or to purge stdout once the file reaches a threshold?
Looking at the path above, it looks like your application (Hadoop job) is writing a large amount of data to the stdout file. This generally happens when the job writes data to stdout using System.out.println or a similar call, which is not required but is sometimes used to debug code.
Please check your application code and make sure that it does not write to stdout.
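As an illustration only (the class and method names below are hypothetical, not taken from your job), here is a minimal sketch of what that change can look like, assuming the job is written in Java with Log4j 1.x on the classpath (it is bundled with Hadoop 2.x / HDP 2.6). Routing diagnostics through a logger keeps them out of the unbounded stdout file, and the log level can simply be raised for production runs.

import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public class RecordProcessor {

    // Log4j output goes to the container's syslog file rather than to
    // .../container_.../stdout, which is never rotated or truncated.
    private static final Logger LOG = Logger.getLogger(RecordProcessor.class);

    public void process(String record) {
        // Avoid: System.out.println("processing " + record);
        // Guarded debug logging instead; it can be silenced via the log level.
        if (LOG.isDebugEnabled()) {
            LOG.debug("processing " + record);
        }
    }

    public static void main(String[] args) {
        // For production runs, raise the level so the per-record logging is dropped.
        Logger.getLogger(RecordProcessor.class).setLevel(Level.WARN);
        new RecordProcessor().process("sample-record");
    }
}

As far as I know, for MapReduce jobs the size of the log4j-managed syslog file can also be capped with mapreduce.task.userlog.limit.kb, but that limit does not apply to data written directly to stdout, so removing the System.out.println calls (or replacing them with a logger) is still the main fix.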
Hope this helps.