resthadoophadoop-yarnmapr

Query node-label topology from Yarn via REST API [MapR 6.1/Hadoop-2.7]


There is a Java and CLI-interface to query Yarn RM for node-to-nodelabel (and inverse) mappings. Is there a way to do this via the REST-API as well? An initial RM-API search revealed only node-label based job submissions as an option.

Sadly that is actually broken in MapR-Hadoop (6.1 as of 6/6/19), so my code has to work around that, by implementing the correct scheduling itself. This works (barely - more broken APIs here as well) using the YarnClient Java API.

But as I want to schedule jobs against different resource managers at the same time, behind firewalls, the REST-API is the most compelling option to achieve this, and the YarnClient API's RPC backend can't be easily transported.

My current worst-case solution would be to parse the YARN-WebUI in some way.


Solution

  • The only solution I found so far: Request /ws/v1/cluster/nodes - this gets you all nodes.

    FlatMap/Distinct on each node's nodeLabels, if you need just the list of node labels. Filter by nodeLabel, if you need all nodes for a specified label.

    This does mean, that you always have to query all nodes, then sort/filter/arrange by NodeLabels, which is a lot of client-side magic. But apparently there's no GetNodesToLabel or even GetClusterNodeLabels to help us out.

    I assume getLabelsToNodes is just a client-side implementation, as the protocol doesn't define the API, so that's right out the window for REST, unless implemented in the WebService.