hadoop, federation, multi-master-replication

Synchronization issues in Hadoop federation


I have some questions about Hadoop federation. As far as I know, it has multiple masters (namenodes) running at the same time.

My first question: when a client sends a request, how is it decided which master serves it?

My second question: is the metadata stored on each master kept in sync with the others or not?

If the metadata is kept in sync, and two clients send requests to two different masters at the same time, how are the synchronization issues handled?

I hope my question is clear. So far I have only read the Apache Hadoop website. Any reading material or tutorials would be much appreciated, as would comments and corrections.


Solution

  • With client-side mount tables (ViewFs), file paths are mapped to namenodes on the client. For example, in core-site.xml:

        <property>
            <name>fs.viewfs.mounttable.default.link./namenode1</name>
            <value>hdfs://namenode1:9001/home</value>
        </property>
        <property>
            <name>fs.viewfs.mounttable.default.link./namenode2</name>
            <value>hdfs://namenode2:9001/home</value>
        </property>
    

    For example, in a put operation the destination path determines the namenode. The command below goes to namenode1 because its path starts with /namenode1:

    bin/hadoop fs -put file.txt /namenode1/input
    

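    In HDFS federation, each namenode manages its own namespace and metadata independently. The namenodes do not coordinate with each other, so there is no shared metadata to keep in sync between them.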
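
    To make the mount-table idea concrete, here is a minimal Python sketch of how a client could turn a path into a namenode URI. It is only an illustration of the lookup, not how ViewFs is actually implemented. The `MOUNT_TABLE` dictionary and the `resolve` function are hypothetical names; the entries simply mirror the two links from the core-site.xml above.

```python
from typing import Dict

# Hypothetical in-memory copy of the two mount-table links configured above.
MOUNT_TABLE: Dict[str, str] = {
    "/namenode1": "hdfs://namenode1:9001/home",
    "/namenode2": "hdfs://namenode2:9001/home",
}


def resolve(path: str, mount_table: Dict[str, str] = MOUNT_TABLE) -> str:
    """Map a client-side path to the namenode URI it is mounted on."""
    # Check longer mount points first (a choice made for this sketch).
    for mount_point in sorted(mount_table, key=len, reverse=True):
        # Match the mount point itself or anything below it.
        if path == mount_point or path.startswith(mount_point + "/"):
            remainder = path[len(mount_point):]
            return mount_table[mount_point] + remainder
    raise FileNotFoundError(f"no mount point covers {path}")


if __name__ == "__main__":
    # The put example above targets /namenode1/input.
    print(resolve("/namenode1/input"))
```

    The point of the sketch is that the routing decision is made on the client from the path alone, so a request for a path under /namenode1 never has to touch namenode2.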