I am using Cloudera Hadoop (CDH 5.16.2) for testing purposes. I ran the following MapReduce application two days ago:
yarn jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
    wordcount \
    -Dmapreduce.job.reduces=8 \
    /user/bigdata/randomtext \
    /user/bigdata/wordcount
Whenever I start the cluster and check the scheduler, it shows that there are submitted applications. I already tried the following command to kill them; its output shows that all applications were killed, but later all of them start showing up again.
for x in $(yarn application -list -appStates ACCEPTED | awk 'NR > 2 { print $1 }'); do yarn application -kill $x; done
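Note that this loop only touches applications in the ACCEPTED state, so anything already RUNNING (or still NEW/SUBMITTED) survives it. Here is a variant covering all non-final states; a minimal sketch, assuming the stock yarn CLI, where -appStates accepts a comma-separated list of states:

# Kill every application that has not yet reached a final state.
for x in $(yarn application -list -appStates NEW,SUBMITTED,ACCEPTED,RUNNING \
             | awk 'NR > 2 { print $1 }'); do
  yarn application -kill "$x"
done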
Here's the content of fair-scheduler.xml:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<allocations>
  <queue name="root">
    <schedulingPolicy>drf</schedulingPolicy>
    <queue name="default">
      <schedulingPolicy>drf</schedulingPolicy>
    </queue>
  </queue>
  <queuePlacementPolicy>
    <rule name="specified" create="false"/>
    <rule name="default" create="true"/>
  </queuePlacementPolicy>
</allocations>
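For context: the "specified" rule places an application in whatever queue it names (without creating new queues, since create="false"), and the "default" rule routes everything else to root.default, so the recurring applications should all land there. To see which user is actually submitting them, you can inspect one of the application IDs from the scheduler; the ID below is a placeholder:

yarn application -status application_1234567890123_0001

Among other fields, the status report includes the submitting user, the queue, and the start time, which helps tell your own test jobs apart from anything unexpected.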
I just want to understand what's going on and how I can kill these applications for good, since it's only a test cluster.
In my case, I finally figured out that my cluster had actually been attacked. It happened because the Azure Network Security Group (NSG) was not configured properly. This also resulted in high bandwidth charges (data transfer out), though I got those waived after raising a request with the Azure team. After I restricted both inbound and outbound traffic, everything got sorted out: I killed the applications that were in the queue and they never appeared again.
From what I found online, remote code execution (RCE) attacks against exposed Hadoop YARN clusters are actually quite common, so please make sure your NSG is configured properly.
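If your cluster runs on Azure VMs, an NSG rule along these lines is one way to lock the ResourceManager down. This is a minimal sketch using the Azure CLI; the resource group, NSG name, and admin IP are placeholders. It allows port 8088 (the YARN web UI / REST API) only from a trusted address and relies on the NSG's default DenyAllInBound rule to block everything else:

# Allow the YARN ResourceManager port only from a single trusted admin IP.
az network nsg rule create \
  --resource-group my-rg \
  --nsg-name my-cluster-nsg \
  --name allow-yarn-rm-from-admin \
  --priority 100 \
  --direction Inbound \
  --access Allow \
  --protocol Tcp \
  --source-address-prefixes 203.0.113.10/32 \
  --destination-port-ranges 8088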