distributed-computing · nfs · network-storage

How does NFS process requests for data?


When I used someone else's framework, I noticed that it used NFS to share a specified folder before performing distributed computing.

For example, suppose the folder contains two files, 'part1' and 'part2'. If machine 1 reads 'part1' and machine 2 reads 'part2', and machine 1 later wants the content of 'part2', does it make a request directly to machine 2, or does it read a local copy of 'part2'?

My understanding is that NFS synchronizes the folder across the machines, so that the files are stored locally on each machine rather than being links to a location on one particular machine. I'm not sure whether this understanding is correct.


Solution

  • NFS makes files available over a network. Using your example, if machine 1 and machine 2 are clients of the NFS server, they won't refer to each other when attempting to retrieve data. As such, when machine 1 wants 'part2', it makes the request to the NFS server rather than to machine 2 (even though machine 2 has already read 'part2').
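    As a concrete sketch (the hostname `nfs-server` and the paths here are hypothetical, and the export options are just one reasonable choice), both clients mount the same export from the server and read 'part2' through that mount; neither client ever contacts the other:

    ```shell
    # On the NFS server (hypothetical host 'nfs-server'), export the shared folder
    # by listing it in /etc/exports, e.g.:
    #   /srv/shared  192.168.1.0/24(ro,sync)

    # On machine 1 AND machine 2: mount the export from the server.
    sudo mount -t nfs nfs-server:/srv/shared /mnt/shared

    # Reading 'part2' on machine 1 sends NFS READ requests to nfs-server,
    # not to machine 2, even if machine 2 has already read the same file.
    cat /mnt/shared/part2
    ```

    From each client's point of view the mount looks like a local directory, but every read is served (possibly via the client's cache) by the server.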

    The reasoning for this is that the version of 'part2' on the NFS server may have changed since machine 2 read it, making machine 2's copy out of date. By making all requests to the NFS server, clients ensure that they get the most recent version of a file at any given time.

    The behaviour you're describing is more akin to that of BitTorrent (https://en.wikipedia.org/wiki/BitTorrent). BitTorrent avoids the out-of-date file problem by making files immutable and distributing hashes of their pieces. Knowing this, your torrent client can request parts of a folder or file from anyone in a 'swarm' and independently verify that the parts it received are correct.
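    To illustrate that verification step (a minimal sketch only; a real client reads SHA-1 piece hashes from a .torrent metainfo file, which we simulate here by computing them up front), a downloader can check each piece it receives against the trusted hash before accepting it, no matter which peer sent it:

    ```python
    import hashlib

    # Hypothetical pieces; hashing them up front stands in for the
    # trusted hash list a .torrent metainfo file would provide.
    pieces = [b"contents of part1", b"contents of part2"]
    expected_hashes = [hashlib.sha1(p).hexdigest() for p in pieces]

    def verify_piece(index: int, data: bytes) -> bool:
        """Accept a piece from any peer only if its hash matches the metainfo."""
        return hashlib.sha1(data).hexdigest() == expected_hashes[index]

    # A correct piece from any peer in the swarm verifies...
    print(verify_piece(1, b"contents of part2"))  # True
    # ...while a corrupted or tampered piece is rejected.
    print(verify_piece(1, b"tampered part2"))     # False
    ```

    Because pieces never change, a matching hash is proof the data is current and intact, so it doesn't matter who served it; NFS, by contrast, allows files to change and therefore routes reads through the single authoritative server.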