We have a Hadoop cluster with only 2 DataNode machines. In the HDFS configuration we set block replication to 3 (Block replication=3).

Is it OK to set Block replication=3 when there are only two DataNodes in the cluster?

From my understanding, with Block replication=3 and only 2 DataNodes in the HDFS cluster, one machine should hold 2 replicas and the other machine 1 replica. Am I correct here?
The whole purpose of the replication factor is fault tolerance. For example, with a replication factor of 3, if we lose one DataNode from the cluster, two more copies of the data are still available on other nodes. In your case, with 2 DataNodes and a replication factor of 3, node-a will not hold 2 copies while node-b holds 1. The NameNode does not allow a DataNode to hold more than one replica of the same block, so the maximum number of replicas is the number of DataNodes. Each node therefore holds 1 copy of every block, and HDFS reports the blocks as under-replicated because the third replica has nowhere to go. If we lose node-a or node-b, the data is still available on the other node. Setting the factor to 3 gains nothing here: a replication factor of 2 already gives the same fault tolerance and avoids the under-replication reports.
Again, this whole explanation is specific to your case. The concept makes better sense when it is visualized in a cluster with more than 2 nodes.
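To make the counting concrete, here is a minimal sketch of the arithmetic in Python. It is not HDFS code: it only assumes that a DataNode holds at most one replica of a given block, and it ignores rack awareness and disk capacity. The names `ReplicaStatus` and `replica_status` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReplicaStatus:
    stored: int              # replicas that can actually be placed
    missing: int             # replicas that would count as under-replicated
    tolerated_failures: int  # DataNodes that can fail without losing the block


def replica_status(replication_factor: int, live_datanodes: int) -> ReplicaStatus:
    """Estimate replica counts, assuming one replica per DataNode per block."""
    if replication_factor < 1 or live_datanodes < 0:
        raise ValueError("replication_factor must be >= 1 and live_datanodes >= 0")
    # A block cannot have more replicas than there are DataNodes to hold them.
    stored = min(replication_factor, live_datanodes)
    missing = replication_factor - stored
    # With n copies on n distinct nodes, n - 1 nodes can be lost.
    tolerated = max(stored - 1, 0)
    return ReplicaStatus(stored, missing, tolerated)


# Compare the 2-node case from the question with a larger cluster.
for factor, nodes in [(3, 2), (2, 2), (3, 5)]:
    print(f"factor={factor} datanodes={nodes} -> {replica_status(factor, nodes)}")
```

Under these assumptions, factor 3 and factor 2 tolerate the same single node failure on a 2-node cluster; the only difference is that factor 3 leaves one replica per block permanently missing. The third replica only becomes useful once the cluster has at least 3 DataNodes.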
Below is a detailed explanation from the Hadoop docs: https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html#Data+Replication