Can anyone tell me: if I use a Java application to upload files to or download files from HDFS with a NameNode HA setup, where does the request go first? In other words, how does the client know which NameNode is active?
It would be great if you could provide a workflow-style diagram, or something similar, that explains the request steps in detail from start to end.
If the Hadoop cluster is configured with HA, hdfs-site.xml will list the NameNode IDs like this:
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>namenode1,namenode2</value>
</property>
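The NameNode IDs are only one of several related keys a client needs. As a rough sketch, the helper below shows how those property names are composed from the nameservice name and the NameNode IDs. The class HaClientKeys and the example.com hosts are made up for illustration and are not part of Hadoop; the property names themselves are the standard HA keys.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative helper (not part of Hadoop): builds the client-side HA
// property names for one nameservice from its logical NameNode IDs.
class HaClientKeys {

    static String rpcAddressKey(String nameservice, String namenodeId) {
        return "dfs.namenode.rpc-address." + nameservice + "." + namenodeId;
    }

    // rpcAddresses maps a NameNode ID to its host:port (hypothetical values).
    static Map<String, String> clientProperties(String nameservice,
                                                Map<String, String> rpcAddresses) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("fs.defaultFS", "hdfs://" + nameservice);
        props.put("dfs.nameservices", nameservice);
        props.put("dfs.ha.namenodes." + nameservice,
                  String.join(",", rpcAddresses.keySet()));
        for (Map.Entry<String, String> e : rpcAddresses.entrySet()) {
            props.put(rpcAddressKey(nameservice, e.getKey()), e.getValue());
        }
        props.put("dfs.client.failover.proxy.provider." + nameservice,
                  "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> addresses = new LinkedHashMap<>();
        addresses.put("namenode1", "nn1.example.com:8020"); // hypothetical host
        addresses.put("namenode2", "nn2.example.com:8020"); // hypothetical host
        clientProperties("mycluster", addresses)
                .forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Running it prints one "key = value" line per property, which can be compared against the entries in hdfs-site.xml.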
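With automatic failover enabled, whichever NameNode is started first will normally become active, so you can start the cluster in a specific order to make your preferred node active. Without automatic failover, both NameNodes come up in standby and one of them has to be made active manually with hdfs haadmin -transitionToActive.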
To check the current state of a NameNode, use the -getServiceState option of hdfs haadmin. The argument is the NameNode ID from dfs.ha.namenodes.* (for example namenode1), not the machine name:
hdfs haadmin -getServiceState <serviceId>
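If there are several NameNodes to check, the command has to be run once per NameNode ID. A minimal sketch, assuming the IDs come from the dfs.ha.namenodes.* value shown earlier: the helper below (HaStateCommands, a made-up name, not a Hadoop class) only builds and prints the command lines and does not execute them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative helper (not part of Hadoop): turns the value of
// dfs.ha.namenodes.<nameservice> into one haadmin command per NameNode ID.
class HaStateCommands {

    static List<List<String>> commandsFor(String namenodeIdsValue) {
        List<List<String>> commands = new ArrayList<>();
        for (String id : namenodeIdsValue.split(",")) {
            String trimmed = id.trim();
            if (!trimmed.isEmpty()) {
                commands.add(Arrays.asList("hdfs", "haadmin", "-getServiceState", trimmed));
            }
        }
        return commands;
    }

    public static void main(String[] args) {
        // Prints the commands only; run them on a host that has the hdfs CLI
        // and the cluster configuration available.
        for (List<String> command : commandsFor("namenode1,namenode2")) {
            System.out.println(String.join(" ", command));
        }
    }
}
```

Each printed line can be pasted into a shell on a cluster node; the command reports the state of that NameNode as active or standby.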
When writing the driver class, set the following properties on the Configuration object:
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsHaCopy {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: pgm <hdfs:///path/to/copy> </local/path/to/copy/from>");
            System.exit(1);
        }
        // Start from an empty configuration (no *-site.xml files are loaded).
        Configuration conf = new Configuration(false);
        // Logical nameservice URI; nameservice1 is not a real host name.
        conf.set("fs.defaultFS", "hdfs://nameservice1");
        conf.set("dfs.nameservices", "nameservice1");
        // Logical NameNode IDs and the RPC address of each one.
        conf.set("dfs.ha.namenodes.nameservice1", "namenode1,namenode2");
        conf.set("dfs.namenode.rpc-address.nameservice1.namenode1", "hadoopnamenode01:8020");
        conf.set("dfs.namenode.rpc-address.nameservice1.namenode2", "hadoopnamenode02:8020");
        // Client-side class that locates the active NameNode and fails over.
        conf.set("dfs.client.failover.proxy.provider.nameservice1",
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
        try (FileSystem fs = FileSystem.get(URI.create(args[0]), conf)) {
            Path srcPath = new Path(args[1]);
            Path dstPath = new Path(args[0]);
            // delSrc=false keeps the local file; overwrite=true replaces an existing remote file.
            fs.copyFromLocalFile(false, true, srcPath, dstPath);
        }
    }
}
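The client addresses the request to the logical nameservice nameservice1, which is not a real host. The failover proxy provider configured on the client maps it to the NameNode RPC addresses listed in the configuration and tries them. A standby NameNode rejects the call, and the client then retries against the other NameNode, so the request ends up at whichever NameNode is currently active.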
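To make the request flow concrete, here is a deliberately simplified model of that client-side behaviour. It is not Hadoop's implementation: the class FailoverSketch and its NameNode stand-in are made up, and a plain IllegalStateException plays the role of the rejection a standby sends back. The real client also remembers which NameNode worked and retries with delays, which this sketch leaves out.

```java
import java.util.List;

// Simplified model of client-side failover; NOT Hadoop's actual implementation.
class FailoverSketch {

    // Stand-in for a NameNode endpoint; the client cannot see the "active" flag.
    static final class NameNode {
        final String address;
        final boolean active;

        NameNode(String address, boolean active) {
            this.address = address;
            this.active = active;
        }

        String handle(String request) {
            if (!active) {
                // Models the rejection a standby NameNode returns to the client.
                throw new IllegalStateException("standby: " + address);
            }
            return address + " served " + request;
        }
    }

    // Tries each configured NameNode in order until one accepts the request.
    static String send(List<NameNode> configured, String request) {
        for (NameNode nn : configured) {
            try {
                return nn.handle(request);
            } catch (IllegalStateException rejected) {
                System.out.println("Failing over after: " + rejected.getMessage());
            }
        }
        throw new IllegalStateException("no active NameNode among " + configured.size());
    }

    public static void main(String[] args) {
        List<NameNode> nameservice1 = List.of(
                new NameNode("hadoopnamenode01:8020", false),
                new NameNode("hadoopnamenode02:8020", true));
        System.out.println(send(nameservice1, "create /tmp/demo.txt"));
    }
}
```

In this run the first address is in standby, so the sketch logs one failover and the second address serves the request. The application code never names a specific NameNode; it only knows the nameservice.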
For more details, please refer to the HDFS High Availability documentation.