hadoop hadoop2 hadoop-streaming hadoop-plugins

What is a stateless node? How are Hadoop nodes stateless?


Does "stateless node" just mean that the nodes are independent of each other? Can you explain this concept with respect to Hadoop?


Solution

  • One way to explain it: each mapper or reducer knows nothing about the other mappers or reducers (their current state, their outputs if any, and so on). This statelessness is a poor fit for certain data processing workloads (e.g. graph data), but it makes parallelization easy: a given map or reduce task can run on any node, so a failed mapper or reducer is not a problem. The framework just starts a new one on the same input split (for a mapper) or on the same mappers' outputs (for a reducer).
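
To make the retry argument concrete, here is a minimal sketch in plain Python. It is not Hadoop code and does not use any Hadoop API: `map_task`, `reduce_task`, `run_job` and the `fail_once` parameter are hypothetical names invented for this illustration. It assumes a word-count job in which each map task is a pure function of its input split, so a task that fails can simply be re-run on the same split.

```python
from collections import Counter


def map_task(split):
    """Hypothetical mapper: depends only on its own input split."""
    return Counter(split.split())


def reduce_task(partials):
    """Hypothetical reducer: depends only on the mappers' outputs."""
    total = Counter()
    for partial in partials:
        total += partial
    return total


def run_job(splits, fail_once=frozenset()):
    """Run one map task per split; tasks listed in fail_once fail on their first attempt."""
    pending_failures = set(fail_once)
    outputs = []
    for task_id, split in enumerate(splits):
        while True:
            try:
                if task_id in pending_failures:
                    pending_failures.discard(task_id)
                    raise RuntimeError(f"simulated failure of map task {task_id}")
                outputs.append(map_task(split))
                break
            except RuntimeError:
                # Nothing to restore: the task kept no state, so re-run it on the same split.
                continue
    return reduce_task(outputs)


splits = ["a b a", "b c", "a c c"]
clean = run_job(splits)
recovered = run_job(splits, fail_once={1})
print(clean == recovered)  # the retried run gives the same counts
```

Because a task keeps no memory of earlier attempts or of its siblings, the retry needs no coordination with the other tasks. That is the property the answer above refers to.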