There is a Java and CLI-interface to query Yarn RM for node-to-nodelabel (and inverse) mappings. Is there a way to do this via the REST-API as well? An initial RM-API search revealed only node-label based job submissions as an option.
Sadly that is actually broken in MapR-Hadoop (6.1 as of 6/6/19), so my code has to work around that, by implementing the correct scheduling itself. This works (barely - more broken APIs here as well) using the YarnClient Java API.
But as I want to schedule jobs against different resource managers at the same time, behind firewalls, the REST-API is the most compelling option to achieve this, and the YarnClient API's RPC backend can't be easily transported.
My current worst-case solution would be to parse the YARN-WebUI in some way.
The only solution I found so far:
Request /ws/v1/cluster/nodes
- this gets you all nodes.
FlatMap/Distinct on each node
's nodeLabels
, if you need just the list of node labels. Filter by nodeLabel, if you need all nodes for a specified label.
This does mean, that you always have to query all nodes, then sort/filter/arrange by NodeLabels
, which is a lot of client-side magic. But apparently there's no GetNodesToLabel
or even GetClusterNodeLabels
to help us out.
I assume getLabelsToNodes
is just a client-side implementation, as the protocol doesn't define the API, so that's right out the window for REST, unless implemented in the WebService.