
Is there a command to get the active NameNode for a nameservice in Hadoop?


The command:

hdfs haadmin -getServiceState machine-98

works only if you know the machine name. Is there a command like:

hdfs haadmin -getServiceState <nameservice>

which can tell you the IP/hostname of the active namenode?


Solution

  • To print out the NameNodes, use this command:

    hdfs getconf -namenodes
    

    To print out the secondary namenodes:

    hdfs getconf -secondaryNameNodes
    

    To print out the backup namenodes:

    hdfs getconf -backupNodes
    

    Note: These commands were tested using Hadoop 2.4.0.

    Update 10-31-2014:

    Here is a Python script that reads the NameNodes involved in Hadoop HA from the config file and determines which of them is active by using the hdfs haadmin command. The script is not fully tested, as I do not have HA configured; I only tested the parsing against a sample file based on the Hadoop HA documentation. Feel free to use and modify it as needed.

    #!/usr/bin/env python3
    # coding: UTF-8
    import subprocess as SP
    import xml.etree.ElementTree as ET
    
    if __name__ == "__main__":
        hdfsSiteConfigFile = "/etc/hadoop/conf/hdfs-site.xml"
    
        # Load every <property> into a name -> value dictionary.
        props = {}
        for prop in ET.parse(hdfsSiteConfigFile).getroot().iter("property"):
            name = (prop.findtext("name") or "").strip()
            if name:
                props[name] = (prop.findtext("value") or "").strip()
    
        haKeys = [k for k in props if k.startswith("dfs.ha.namenodes.")]
        if not haKeys:
            print("Hadoop High-Availability configuration not found!")
    
        for key in haKeys:
            nameserviceId = key[len("dfs.ha.namenodes."):]
            for node in [n.strip() for n in props[key].split(",") if n.strip()]:
                # Look up the host configured for this NameNode ID.
                rpcKey = "dfs.namenode.rpc-address.%s.%s" % (nameserviceId, node)
                nodeAddress = props.get(rpcKey, "unknown").split(":")[0]
    
                # Ask haadmin for this NameNode's state; no shell is involved.
                args = ["hdfs", "haadmin", "-ns", nameserviceId, "-getServiceState", node]
                p = SP.Popen(args, stdout=SP.PIPE, stderr=SP.PIPE, universal_newlines=True)
                out, err = p.communicate()
    
                if p.returncode != 0:
                    print("Error executing Hadoop HA command: " + err.strip())
                elif "active" in out.lower():
                    print("Active NameNode: %s (%s)" % (node, nodeAddress))
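
    If you would rather not parse hdfs-site.xml yourself, the same lookup can be sketched by asking hdfs getconf -confKey for the relevant keys. This is a minimal, untested-on-a-real-cluster sketch: the function names run_command and active_namenodes and the runner parameter are made up for illustration, and it assumes your Hadoop version accepts the -ns option of hdfs haadmin and prints the bare state ("active" or "standby") on stdout.

```python
import subprocess


def run_command(cmd):
    """Run a command without a shell; return (exit code, stripped stdout)."""
    p = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                       universal_newlines=True)
    return p.returncode, p.stdout.strip()


def active_namenodes(nameservice, runner=run_command):
    """Return [(namenode_id, rpc_address)] for NameNodes reporting 'active'.

    `runner` is a hypothetical hook so the lookup can be exercised
    without a cluster; by default it shells out to the real `hdfs` CLI.
    """
    # NameNode IDs of the nameservice, e.g. "nn1,nn2".
    code, ids = runner(["hdfs", "getconf", "-confKey",
                        "dfs.ha.namenodes." + nameservice])
    if code != 0:
        raise RuntimeError("no HA NameNodes configured for " + nameservice)

    active = []
    for nn in [i.strip() for i in ids.split(",") if i.strip()]:
        # Assumption: `-ns` is supported by this version of `hdfs haadmin`.
        code, state = runner(["hdfs", "haadmin", "-ns", nameservice,
                              "-getServiceState", nn])
        if code == 0 and "active" in state.lower().split():
            # Resolve the host:port configured for the active NameNode ID.
            _, addr = runner(["hdfs", "getconf", "-confKey",
                              "dfs.namenode.rpc-address.%s.%s" % (nameservice, nn)])
            active.append((nn, addr))
    return active
```

    Calling active_namenodes("mycluster"), where mycluster is a placeholder nameservice ID, should return a list such as one (NameNode ID, host:port) pair for the active node, or an empty list if none reports active. Passing the commands as argument lists avoids shell quoting issues with unusual NameNode IDs.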