We have a Hadoop cluster (HDP 2.6.5 with Ambari, 25 datanode machines), and we run a Spark Streaming application (Spark 2.1) on Hortonworks 2.6.x.

Currently the Spark Streaming application runs on all datanode machines. We now want it to run only on the first 10 datanodes, so that the remaining 15 datanodes are restricted and the Spark application runs only on the first 10.
Can this scenario be achieved with Ambari features, or some other approach?
For example, we found https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.3.2/bk_yarn_resource_mgt/content/configuring_node_labels.html and http://crazyadmins.com/configure-node-labels-on-yarn/, but we are not sure whether YARN Node Labels can help us.
@Jessica Yes, you are absolutely on the right path. YARN Node Labels and YARN Queues are how Ambari administrators control team-level access to portions of the YARN cluster. You can start very basic with just a non-default queue, or go very in-depth with many queues for many different teams. Node labels take it to another level, allowing you to map queues and teams to specific nodes.
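As a rough sketch of what the node-label setup looks like, the commands below create a label and attach it to the nodes Spark should use. The label name `spark` and the hostnames are placeholders; your actual hostnames, label name, and queue layout will differ, and in HDP you would normally apply the yarn-site and capacity-scheduler properties through Ambari rather than editing files by hand.

```shell
# Prerequisite (set via Ambari in yarn-site.xml):
#   yarn.node-labels.enabled=true
#   yarn.node-labels.fs-store.root-dir=hdfs:///yarn/node-labels

# Create a cluster node label (name "spark" is a placeholder)
yarn rmadmin -addToClusterNodeLabels "spark(exclusive=true)"

# Attach the label to each of the first 10 datanodes
# (hostnames are placeholders; repeat/extend for all 10 nodes)
yarn rmadmin -replaceLabelsOnNode "datanode01=spark datanode02=spark"

# Verify the labels are registered
yarn cluster --list-node-labels

# Then, in capacity-scheduler.xml (via Ambari), map a queue to the label,
# e.g. for a queue named "spark":
#   yarn.scheduler.capacity.root.spark.accessible-node-labels=spark
#   yarn.scheduler.capacity.root.spark.accessible-node-labels.spark.capacity=100
#   yarn.scheduler.capacity.root.spark.default-node-label-expression=spark
```

With `exclusive=true`, only queues that can access the label get containers on those nodes, which is what restricts the application to the 10 labeled machines.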
Here is a post with the syntax for spark to use the yarn queue:
How to choose the queue for Spark job using spark-submit?
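To tie it together, here is a hedged sketch of submitting the streaming job against such a queue. The queue name `spark`, label name `spark`, class name, and jar path are all placeholders; the `spark.yarn.am.nodeLabelExpression` and `spark.yarn.executor.nodeLabelExpression` settings are the Spark-on-YARN properties for pinning the AM and executors to labeled nodes.

```shell
# Submit to the labeled queue; names below are placeholders for your setup
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --queue spark \
  --conf spark.yarn.am.nodeLabelExpression=spark \
  --conf spark.yarn.executor.nodeLabelExpression=spark \
  --class com.example.StreamingApp \
  /path/to/streaming-app.jar
```

If the queue already has `default-node-label-expression` set, the two `--conf` lines are redundant, but being explicit makes the intent visible in the submit command.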
I tried to find the 2.6 version of these docs but was not able to; they have really mixed up the docs since the merger...
The actual steps you take may end up being a combination of items from both links. That has been my typical experience when working with Ambari on HDP/HDF.