
New user SSH hadoop


I am installing Hadoop on a single-node cluster, following the tutorial linked below. Any idea why we need to set up the following?

  1. Why do we need SSH access for a new user?

  2. Why should it be able to connect to its own user account?

  3. Why should I set up passwordless access for the new user?

  4. When all the nodes are on the same machine, why are they communicating explicitly?

http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/


Solution

  • Why do we need SSH access for a new user?

    Because you want to communicate with the user who is running the Hadoop daemons. Note that ssh connects a user on one machine to a user on another machine, not just machine to machine.

    Why should it be able to connect to its own user account?

    Because you want to start all the daemons with a single command. Otherwise you would have to start each daemon individually, issuing a separate command for each one. The start scripts use ssh for this, even if you are on a single machine.
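To make that concrete, here is a rough sketch of the pattern the start scripts follow: loop over a list of hosts and run the same start command on each one through ssh. The real scripts are shell, not Python, and the function name `build_start_commands` and the host list are made up for illustration. It assumes a Hadoop 1.x style `hadoop-daemon.sh` that is reachable on the target host. The sketch only builds the commands and does not run them.

```python
from typing import List


def build_start_commands(hosts: List[str], daemon: str) -> List[List[str]]:
    """Return one ssh command (as an argv list) per host.

    Hypothetical helper: it mirrors the idea of the start scripts,
    not their actual implementation.
    """
    commands = []
    for host in hosts:
        # The same pattern is used for every host, so on a single-node
        # setup the script simply ends up ssh-ing into localhost.
        commands.append(["ssh", host, "hadoop-daemon.sh", "start", daemon])
    return commands


if __name__ == "__main__":
    # Dry run: print what would be executed for a single-node cluster.
    for argv in build_start_commands(["localhost"], "datanode"):
        print(" ".join(argv))
```

Because the host list on a single-node setup contains only `localhost`, the user running the script has to be able to ssh into its own account.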

    Why should I set up passwordless access for the new user?

    Because you don't want to enter the password every time you start your Hadoop daemons. That would be irritating, right?
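If you want to check whether passwordless login is already working, something like the following might help. It is only a sketch: the helper names are hypothetical, and it relies on the OpenSSH client options `BatchMode=yes` (fail instead of prompting) and `ConnectTimeout`.

```python
import subprocess
from typing import List


def passwordless_check_command(host: str = "localhost") -> List[str]:
    """Build an ssh command that never prompts for input (hypothetical helper)."""
    # BatchMode=yes makes ssh fail instead of asking for a password or passphrase.
    # ConnectTimeout=5 keeps the check from hanging on an unreachable host.
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, "true"]


def can_login_without_password(host: str = "localhost") -> bool:
    """Return True only if ssh logs in and runs `true` with no interaction."""
    try:
        result = subprocess.run(
            passwordless_check_command(host),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        # No ssh client installed on this machine.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print(can_login_without_password("localhost"))
```

A `False` result means ssh needed a password or some other interaction (or could not connect at all), which is exactly the situation in which the start scripts would stop and prompt you.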

    When all the nodes are on the same machine, why are they communicating explicitly?

    What do you mean by explicitly? Remember, ssh is not used for communication between the processes. All of that communication happens over TCP/IP. ssh is required by the Hadoop scripts so that you can start all the daemons from one machine, without having to go to each machine and start each process separately there.
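You can see the TCP/IP part for yourself by checking whether a daemon is listening on its port once it has been started. A minimal sketch, assuming the NameNode port is 54310; that number is an assumption, so take the real one from your own `fs.default.name` setting. The function name `daemon_port_open` is made up for this example.

```python
import socket


def daemon_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # A plain TCP connection, no ssh involved.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not reachable.
        return False


if __name__ == "__main__":
    # 54310 is an assumed port, replace it with the one from your configuration.
    print(daemon_port_open("localhost", 54310))
```

This works whether or not ssh is set up, which shows that ssh is only needed for starting the daemons, not for the daemons talking to each other.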

    HTH