I am trying to widen my knowledge of distributed systems and systems design. I came across terms such as Content Delivery Network (CDN) and Distributed File System (DFS) for storing and handling media data such as music, videos, pictures, GIFs, etc.
My understanding is that a DFS is just a file system like the one on a laptop, replicated to other servers (distributed) to make it 99.9% available. A CDN is a network that can store assets including JavaScript, images, videos, etc.
Now I am quite confused about the difference between the two, and which one works better for which use cases. Please correct me if my basic understanding of CDN and DFS is itself wrong. Thanks in advance!
They each serve their own purpose.
DFS:
A distributed file system (DFS), as the name suggests, distributes the file system (and parts of individual files) across multiple servers while presenting it to clients as if it were a local file system. That is, clients can access and process data stored on the servers as if it were on their local machine. A DFS relies on metadata to locate data, and provides transparency, replication of files and directories, mechanisms for fault tolerance, improved performance through caching of recently accessed disk blocks, and scalability.
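To make the "looks local, lives on many servers" idea concrete, here is a minimal in-memory sketch in Python. It is not how any particular DFS is implemented: `BlockServer`, `DFSClient`, the metadata layout (path mapped to a list of block IDs and replica server names) and the `alive` flag are all hypothetical. It shows three of the properties listed above: the client reads by path (transparency), falls back to another replica when a server is down (fault tolerance), and keeps recently read blocks in an LRU cache (performance).

```python
from collections import OrderedDict


class BlockServer:
    """Hypothetical storage server holding raw blocks: block_id -> bytes."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}
        self.alive = True
        self.reads = 0

    def read_block(self, block_id):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.reads += 1
        return self.blocks[block_id]


class DFSClient:
    """Presents a path-based read() while the data lives on several servers."""

    def __init__(self, metadata, servers, cache_size=4):
        # metadata: path -> list of (block_id, [names of servers holding a replica])
        self.metadata = metadata
        self.servers = servers
        self.cache = OrderedDict()  # block_id -> bytes, least recently used first
        self.cache_size = cache_size

    def _fetch(self, block_id, replicas):
        if block_id in self.cache:
            self.cache.move_to_end(block_id)  # mark as most recently used
            return self.cache[block_id]
        for name in replicas:  # try replicas in order until one answers
            try:
                data = self.servers[name].read_block(block_id)
                break
            except ConnectionError:
                continue
        else:
            raise OSError(f"no live replica for {block_id}")
        self.cache[block_id] = data
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict the least recently used block
        return data

    def read(self, path):
        # The caller only knows the path; block locations come from the metadata.
        return b"".join(self._fetch(b, reps) for b, reps in self.metadata[path])
```

For example, with blocks `"f1-0"` and `"f1-1"` stored on two servers `a` and `b`, `client.read("/docs/greeting.txt")` keeps returning the same bytes after `a.alive = False`, because the second replica on `b` is used instead.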
The key DFS architectures are:
• Client-Server Architecture - One or more servers manage both the metadata and the data and share them among multiple clients, presenting a global namespace for the whole system.
• Cluster-Based Architecture - Metadata and data are decoupled: some servers store data, while others are dedicated to managing metadata. A cluster-based system is called centralized if it has only one metadata server, and distributed if its metadata is spread across several metadata servers.
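The centralized/distributed distinction in the cluster-based architecture comes down to who owns a given path's metadata. The sketch below assumes one simple scheme, hashing the path to pick a metadata server; `ClusterFS` and the server names are hypothetical, and real systems use other schemes as well (for example partitioning by directory subtree). Plain modulo hashing also remaps most paths when a metadata server is added, which is why consistent hashing is often preferred.

```python
import hashlib


class ClusterFS:
    """Hypothetical cluster: metadata servers and data servers are separate sets."""

    def __init__(self, metadata_servers, data_servers):
        self.metadata_servers = list(metadata_servers)
        self.data_servers = list(data_servers)

    @property
    def is_centralized(self):
        # One metadata server -> centralized; several -> distributed metadata.
        return len(self.metadata_servers) == 1

    def metadata_server_for(self, path):
        # A stable hash (unlike Python's randomized str hash) so every client
        # agrees on which metadata server owns a path.
        digest = hashlib.sha256(path.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.metadata_servers)
        return self.metadata_servers[index]
```

With a single metadata server every lookup goes to it (simple, but a potential bottleneck and single point of failure); with several, lookups are spread across them.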
For example, HDFS (Hadoop Distributed File System) is a DFS in the centralized category: a single server, the NameNode, manages the metadata, while the DataNodes hold the file data, split into blocks that are distributed and replicated across them.
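The NameNode/DataNode split can be sketched as follows. This is a toy model, not the HDFS API: the NameNode here only records which blocks make up a file and which DataNodes hold each replica, while the bytes go only to the DataNodes. HDFS's defaults are a 128 MB block size and a replication factor of 3, and its real placement policy is rack-aware; this sketch uses a tiny block size and plain round-robin placement instead, and `NameNode`, `write_file` and `read_file` are hypothetical names.

```python
def split_into_blocks(data, block_size):
    """Cut a file into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class NameNode:
    """Holds only metadata: which blocks form a file and where replicas live."""

    def __init__(self, datanode_names, replication=3):
        if replication > len(datanode_names):
            raise ValueError("replication factor exceeds number of datanodes")
        self.datanode_names = list(datanode_names)
        self.replication = replication
        self.namespace = {}  # path -> list of (block_id, [datanode names])
        self._cursor = 0

    def allocate(self, path, num_blocks):
        n = len(self.datanode_names)
        placements = []
        for i in range(num_blocks):
            # Round-robin: `replication` consecutive, hence distinct, datanodes.
            targets = [self.datanode_names[(self._cursor + k) % n]
                       for k in range(self.replication)]
            self._cursor = (self._cursor + 1) % n
            placements.append((f"{path}#blk{i}", targets))
        self.namespace[path] = placements
        return placements


def write_file(namenode, datanodes, path, data, block_size):
    blocks = split_into_blocks(data, block_size)
    placements = namenode.allocate(path, len(blocks))
    for block, (block_id, targets) in zip(blocks, placements):
        for name in targets:
            datanodes[name][block_id] = block  # data is stored only on datanodes
    return placements


def read_file(namenode, datanodes, path):
    # Ask the namenode for block locations, then read each block from a replica.
    return b"".join(datanodes[targets[0]][block_id]
                    for block_id, targets in namenode.namespace[path])
```

Writing 10 bytes with `block_size=4` to four DataNodes yields three blocks (4 + 4 + 2 bytes), each stored on three different DataNodes, for nine stored block copies in total.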
A DFS is useful when many members of a team, or many applications and processes, work on a huge file at the same time. It also simplifies the day-to-day work of users and higher-level applications, because they do not need to know where a file physically lives or where its backups are. It lets multiple users work across multiple machines connected by a network through a single file system that is efficient, secure, and robust.
CDN:
A CDN uses service nodes deployed at many points on the Internet backbone, distributing the service geographically so that it sits close to end users; this improves performance and ensures high availability. A request-routing algorithm chooses the CDN node (edge server) best suited to serve the content for a particular request, based on criteria such as hop distance or load. Several techniques, such as reactive probing, proactive probing, and connection monitoring, are used to determine how close a CDN node is to the user.
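As an illustration of request routing based on hop distance and load, here is a minimal sketch. The policy is hypothetical (real CDNs combine many more signals): among healthy edge nodes below a load threshold, pick the one with the fewest hops; if every node is busy, pick the least-loaded one. `EdgeNode`, `choose_edge` and the `max_load=0.8` threshold are assumptions, and the `hops`/`load` values stand in for whatever the probing techniques above actually measure.

```python
from dataclasses import dataclass


@dataclass
class EdgeNode:
    name: str
    hops: int           # network distance to the client (assumed already probed)
    load: float         # fraction of capacity in use, 0.0 to 1.0
    healthy: bool = True


def choose_edge(nodes, max_load=0.8):
    candidates = [n for n in nodes if n.healthy]
    if not candidates:
        raise RuntimeError("no healthy edge node; fall back to the origin server")
    not_overloaded = [n for n in candidates if n.load < max_load]
    if not_overloaded:
        # Prefer the closest node; break ties on lower load.
        return min(not_overloaded, key=lambda n: (n.hops, n.load))
    # Every healthy node is busy: send the request to the least-loaded one.
    return min(candidates, key=lambda n: n.load)
```

For example, with a nearby but overloaded node (3 hops, 90% load), a slightly farther one (4 hops, 50% load) and a distant idle one (6 hops, 10% load), this policy picks the 4-hop node, trading a little distance for spare capacity.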
In the case of a CDN, many content providers pay the CDN operator to deliver their content to end users through service nodes placed accordingly. In turn, the CDN operator pays ISPs to host its servers in their data centers at the locations of interest.