I want to know the communication protocol, specifically the port numbers, used by the NameNode and the DataNodes in Hadoop.
Say I run the following command on the NameNode:
hdfs dfsadmin -report
It shows the details of the live DataNodes, including how many there are. My question is: how do the NameNode and the DataNodes communicate, and via which port? The command reports only 1 DataNode, whereas my cluster has 8 DataNodes, so I am not sure whether some port blocking in the network is causing this. The firewall is disabled on the NameNode and on all the DataNodes; I checked this with the sudo ufw status command, which returned inactive.
In the official Hadoop documentation, I found this:
The Communication Protocols
All HDFS communication protocols are layered on top of the TCP/IP protocol. A client establishes a connection to a configurable TCP port on the NameNode machine. It talks the ClientProtocol with the NameNode. The DataNodes talk to the NameNode using the DataNode Protocol. A Remote Procedure Call (RPC) abstraction wraps both the Client Protocol and the DataNode Protocol. By design, the NameNode never initiates any RPCs. Instead, it only responds to RPC requests issued by DataNodes or clients.
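Since every HDFS protocol is layered on plain TCP, one quick way to rule out network blocking is to try opening a TCP connection to the NameNode's RPC port. Because the NameNode never initiates RPCs, the direction that matters for DataNode registration is DataNode to NameNode, so the check would be run on each DataNode. This is a minimal sketch using Python's standard socket module; the host name namenode-host and port 8020 are assumptions (8020 is the usual NameNode RPC port when fs.defaultFS in core-site.xml does not name one, but your cluster may use a different port).

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For example, calling tcp_port_open("namenode-host", 8020) on each DataNode (both values hypothetical, substitute your own) and getting False on some of them would point at a network or configuration problem rather than at the DataNode process itself.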
I am using Hadoop 3.1.1 on Ubuntu 16.04.
Any help is highly appreciated. Thanks.
These are all configured in hdfs-site.xml.
For example, by default, dfs.datanode.address=0.0.0.0:9866.
If you search for "port" or "address" in the list of defaults at https://hadoop.apache.org/docs/r3.1.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml, you can generally find what you are looking for.
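The same "port or address" search can be applied to your own configuration file. This is a minimal sketch using Python's xml.etree.ElementTree; it assumes the standard <configuration>/<property>/<name>/<value> layout of hdfs-site.xml. The sample content is illustrative only: dfs.datanode.address carries the default quoted above, and dfs.replication is included just to show a property being filtered out.

```python
import xml.etree.ElementTree as ET

# Illustrative hdfs-site.xml content (not taken from a real cluster).
SAMPLE_HDFS_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.datanode.address</name>
    <value>0.0.0.0:9866</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
"""


def network_properties(xml_text: str) -> dict:
    """Return {name: value} for properties whose name mentions a port or address."""
    root = ET.fromstring(xml_text)
    found = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        # Keep only network-related settings, mirroring the manual search.
        if "port" in name or "address" in name:
            found[name] = value
    return found


if __name__ == "__main__":
    for name, value in network_properties(SAMPLE_HDFS_SITE).items():
        print(f"{name} = {value}")
```

To inspect a real node, pass the contents of the hdfs-site.xml in that node's HADOOP_CONF_DIR instead of the sample. Properties left at their defaults do not appear in hdfs-site.xml at all, which is why the hdfs-default.xml listing is still the place to look them up.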
If that command or the NameNode UI doesn't show a DataNode, SSH to that node, run jps to see whether the DataNode process is running, and if it is not, check its log files to find out why.
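With 8 nodes to go through, the jps check can be scripted. This is a minimal sketch, assuming jps prints one "<pid> <MainClassName>" pair per line (for example "2817 DataNode"); the sample output in the usage note is hypothetical, and how you collect the output from each node (for instance over SSH) is left open.

```python
import subprocess


def running_java_processes(jps_output: str) -> dict:
    """Map main-class name -> pid from jps lines such as '2817 DataNode'."""
    procs = {}
    for line in jps_output.splitlines():
        parts = line.split()
        # Skip blank or malformed lines; a valid line starts with a numeric pid.
        if len(parts) >= 2 and parts[0].isdigit():
            procs[parts[1]] = int(parts[0])
    return procs


def datanode_running(jps_output: str) -> bool:
    """True if a DataNode JVM appears in the jps listing."""
    return "DataNode" in running_java_processes(jps_output)


def local_jps_output() -> str:
    """Run jps on this machine; requires a JDK's jps on PATH."""
    return subprocess.run(["jps"], capture_output=True, text=True, check=True).stdout
```

For example, datanode_running("2817 DataNode\n3021 Jps\n") is True, while a listing containing only Jps is False. When the DataNode is missing, its log (typically under $HADOOP_HOME/logs unless HADOOP_LOG_DIR is set) usually states why it failed to start or to register with the NameNode.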