Tags: database, file-upload, filesystems, scalability, weed-fs

How can I store 1 billion images on servers uploaded from a web application?


What is the best way to store 1 billion images uploaded by the users of a website (via a PHP or JavaScript upload)?

Everyone knows that storing tons of images (user uploads, in this case) in a single directory, on NFS, etc. is a bad idea. So what is the best architecture and configuration for a storage solution that has to hold 1 billion images?

How should we organize the users' images, assuming a single user will not have more than 20 images? Please consider that the layout has to be structured so that we can fetch a single user's images programmatically, via PHP/JavaScript or an API, using the user's unique identifier(s) or a hash.

An open source solution is preferred. Possible candidates are GlusterFS, MongoDB, WeedFS, etc.

Assume the following:

During my research I also came across the following two great articles, in case they help clarify my question further.

http://highscalability.com/flickr-architecture

http://perspectives.mvdirona.com/2008/06/30/FacebookNeedleInAHaystackEfficientStorageOfBillionsOfPhotos.aspx


Solution

  • For the storage part of the project, I would say you need something other than a conventional file system mounted on dedicated or external disks (SATA, SAS or fiber/SSD).

    GlusterFS, a distributed file system, would be ideal as a storage engine, because it supports replicated configurations (for HA) as well as distributed (and mixed) configurations to gain I/O speed.

    For the organization part of the project, I think you should have one main file system (mounted on all clients/web servers), and in this file system a separate directory for every user, each with two subdirectories (one for the high-resolution and one for the low-resolution pictures).

    Finally, the storage servers can double as web servers, or we can use separate servers (possibly virtual machines on Xen, KVM or VMware). The Gluster volume should be mounted on the web servers through FUSE with the GlusterFS client module (from /etc/fstab). This is a must for the GlusterFS features to work.
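To make the per-user directory layout concrete, here is a minimal sketch in Python of how a web server could map a user ID to a path on the mounted volume. Several things in it are assumptions and not part of the answer above: the mount point `/mnt/images`, the function names `user_dir` and `image_path`, the subdirectory names `high` and `low`, and above all the two hashed bucket levels. The hashed levels are my own addition: with at most 20 images per user, 1 billion images implies at least 50 million user directories, and putting them all under one parent would recreate the single-directory problem the question describes.

```python
import hashlib
from pathlib import PurePosixPath

# Hypothetical mount point of the shared volume (assumption, not from the answer).
MOUNT_ROOT = PurePosixPath("/mnt/images")

# Hypothetical names for the two per-user subdirectories.
SIZES = ("high", "low")


def user_dir(user_id: int, root: PurePosixPath = MOUNT_ROOT) -> PurePosixPath:
    """Return the directory for one user, placed under two hashed bucket levels.

    The first four hex characters of a SHA-256 digest of the user ID pick
    one of 256 * 256 buckets, so no single directory has to list every user.
    """
    uid = str(int(user_id))  # int() rejects IDs that are not plain integers
    digest = hashlib.sha256(uid.encode("ascii")).hexdigest()
    return root / digest[:2] / digest[2:4] / uid


def image_path(user_id: int, filename: str, size: str = "high") -> PurePosixPath:
    """Return the full path of one image in the user's 'high' or 'low' subdirectory."""
    if size not in SIZES:
        raise ValueError(f"size must be one of {SIZES}")
    if "/" in filename or filename in ("", ".", ".."):
        raise ValueError("filename must be a plain file name")
    return user_dir(user_id) / size / filename


if __name__ == "__main__":
    # The mapping is deterministic, so any web server computes the same path.
    print(image_path(42, "avatar.jpg", "low"))
```

Because the path is derived only from the user ID, fetching all of a user's images is a directory listing of `user_dir(user_id) / "high"` (or `"low"`), with no database lookup needed to locate the files.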