hadoop, hadoop-yarn, datanode

Datanode disks fill up to 100% because of huge container stdout files


We have the following Hadoop cluster versions (the datanode machines run Linux OS version 7.2):

Ambari 2.6.1, HDP 2.6.4

In a few cases the disks on a datanode machine became 100% full, and the cause was container stdout files that had grown to a huge size.

For example:

/grid/sdb/hadoop/yarn/log/application_151746342014_5807/container_e37_151003535122014_5807_03_000001/stdout

From df -h we can see:

df -h /grid/sdb
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb        1.8T  1.8T  0T   100% /grid/sdb

Any suggestion how to avoid this situation where stdout grows so large? In practice this issue ends up stopping the HDFS component on the datanode.

Second, since the path of stdout is:

/var/log/hadoop-yarn/containers/[application id]/[container id]/stdout

is it possible to limit the file size, or to purge stdout when the file reaches a threshold?
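
As an illustration, a command along these lines can locate the offending files (the /grid/*/hadoop/yarn/log pattern and the 5G threshold are only examples for our layout), but we would prefer a built-in limit or automatic purge:

# list container stdout/stderr files larger than 5 GB under the YARN log dirs
find /grid/*/hadoop/yarn/log -type f \( -name stdout -o -name stderr \) -size +5G -exec ls -lh {} \;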


Solution

  • Looking at the path above, it appears that your application (Hadoop job) is writing a lot of data to the stdout file. This generally happens when the job writes to stdout with System.out.println or a similar call, which is not required but is sometimes used to debug code.

    Please check your application code and make sure that it does not write to stdout; a quick way to spot such calls is shown in the sketch after this answer.

    Hope this helps.
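
    A minimal sketch of such a check, assuming the job's source tree sits under src/ in a standard Java project layout:

    # list every place in the job's Java sources that writes to stdout or stderr
    grep -rn --include='*.java' -E 'System\.(out|err)\.print' src/

    Any hits should be removed or replaced with a proper logger (for example SLF4J/Log4j), whose verbosity can be controlled through the log4j configuration instead of growing the container's stdout without bound.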