
Storing Link to File in a Database


I am creating a database application that (among many other things) allows users to upload and download files. The files are stored on a file server and I have set up an Apache HTTP server with PHP scripts to process (i.e. upload and download) the files. The database only stores a link to the file and not the file itself. My question is this: How should I organize the files on my file server?

Currently, I am creating a directory structure based on the current date and I rename the files with an MD5 hash of the current date/time (including milliseconds) plus some random characters (i.e. I'm adding "salt"):

\\yyyy\mm\dd\debb40da158040e4f3b93f9576840c07

This (above) is the link that is stored in the database. (Of course, I also store the real file name in the database so that I can rename the file when the user downloads it; the user never sees the actual link.)
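Below is a minimal sketch of that bookkeeping. It assumes a hypothetical `file_info` table (SQLite here only to keep it self-contained) with one column for the hidden link and one for the user's original file name. On download, the original name goes into a `Content-Disposition` header so the browser saves the file under that name. The asker's scripts are PHP, so this Python version only illustrates the idea. All table, column and function names are illustrative.

```python
import sqlite3
from typing import Tuple


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table: stored_path is the hidden link on the file
    # server, original_name is the name the user uploaded.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file_info ("
        " id INTEGER PRIMARY KEY,"
        " stored_path TEXT NOT NULL UNIQUE,"
        " original_name TEXT NOT NULL)"
    )


def record_upload(conn: sqlite3.Connection, stored_path: str,
                  original_name: str) -> int:
    """Insert one row and return its id."""
    cur = conn.execute(
        "INSERT INTO file_info (stored_path, original_name) VALUES (?, ?)",
        (stored_path, original_name),
    )
    conn.commit()
    return cur.lastrowid


def download_info(conn: sqlite3.Connection, file_id: int) -> Tuple[str, str]:
    """Return (stored_path, Content-Disposition value) for one file."""
    row = conn.execute(
        "SELECT stored_path, original_name FROM file_info WHERE id = ?",
        (file_id,),
    ).fetchone()
    if row is None:
        raise KeyError(file_id)
    stored_path, original_name = row
    # Escape backslashes and double quotes for the quoted filename. This
    # simple form does not handle non-ASCII names (RFC 6266 filename*).
    safe = original_name.replace("\\", "\\\\").replace('"', '\\"')
    return stored_path, f'attachment; filename="{safe}"'
```

The download script would open `stored_path`, send the header value, and stream the bytes. The user only ever sees the row id and the original name.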

I use yyyy\mm\dd for the directory structure to avoid performance issues (I'm told that a lot of files in the same directory can slow things down) and I rename the files with a unique string to avoid clashes when users upload files with the same name.
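The naming scheme the question describes (yyyy/mm/dd directories plus an MD5 of the timestamp and some random salt) might look roughly like this. This is a sketch in Python rather than the asker's PHP. The function names are hypothetical, and `os.path.join` uses the platform's separator rather than hard-coded backslashes.

```python
import hashlib
import os
import secrets
from datetime import datetime
from typing import Optional


def make_stored_path(root: str, now: Optional[datetime] = None) -> str:
    """Return root/yyyy/mm/dd/<32 hex chars> for a new upload."""
    if now is None:
        now = datetime.now()
    # MD5 of the timestamp (Python's datetime goes down to microseconds)
    # plus 16 random "salt" bytes. The salt keeps two uploads made in the
    # same instant from getting the same name. MD5 is used only for naming.
    salt = secrets.token_bytes(16)
    name = hashlib.md5(now.isoformat().encode("utf-8") + salt).hexdigest()
    # One directory per day keeps any single directory from growing too big.
    return os.path.join(root, now.strftime("%Y"), now.strftime("%m"),
                        now.strftime("%d"), name)


def save_upload(root: str, data: bytes, now: Optional[datetime] = None) -> str:
    """Write data under a fresh name and return the path to store in the DB."""
    path = make_stored_path(root, now)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    # "xb" raises FileExistsError instead of silently overwriting a file.
    with open(path, "xb") as f:
        f.write(data)
    return path
```

Opening the file in exclusive mode is an extra safeguard. With 16 random bytes, a name clash is already very unlikely, but if one happens it fails loudly instead of overwriting another user's file.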

I'd like to get other opinions on the best way to store files in this kind of situation. I've seen some developers keep the original file name but prefix it with the database ID of the corresponding row in the file information table. I see some advantages to this approach: the file names are "human readable", and you could still figure out what the files are if the file information table ever got corrupted or deleted.
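Here is a rough sketch of the ID-prefix convention, assuming a hypothetical `<row id>_<original name>` format. The sanitizing step is an addition of this sketch, not part of the approach as described: it strips directory parts and odd characters so a user-supplied name cannot escape the upload directory. Because the ID comes from the database, the row has to be inserted before the file is written under its final name.

```python
import os
import re
from typing import Optional, Tuple

# Characters outside this set are replaced so the name stays filesystem-safe.
_UNSAFE = re.compile(r"[^A-Za-z0-9._-]+")


def id_prefixed_name(row_id: int, original_name: str) -> str:
    """Build "<row id>_<sanitized original name>"."""
    # Drop any directory part (either slash style), then clean the rest.
    base = os.path.basename(original_name.replace("\\", "/"))
    base = _UNSAFE.sub("_", base).strip("._") or "file"
    return f"{row_id}_{base}"


def parse_id_prefixed_name(stored_name: str) -> Optional[Tuple[int, str]]:
    """Recover (row id, name) from a stored name, e.g. after losing the table."""
    m = re.fullmatch(r"(\d+)_(.+)", stored_name)
    if m is None:
        return None
    return int(m.group(1)), m.group(2)
```

The parse function shows the recovery benefit: even without the file information table, each file's ID and a readable name can be read back from the directory listing.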


Solution

  • How about a structure that uses the upload date as the 1st-level directory, an MD5 hash of the file contents as the 2nd level (hashing the contents identifies the file by what it contains, independent of its name), the upload timestamp as the 3rd level (so different versions of the same file uploaded at different times can coexist), and the file under its actual filename at the 4th level? e.g. <date timestamp>/<md5 of file contents>/<timestamp>/<filename>

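    This way the directory structure itself records the upload date, the file contents (via the hash), the version (via the upload timestamp), and the original filename.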

    The drawback of hashing the file contents with MD5 is that, for very large files, generating the hash adds some overhead.
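A possible sketch of the four-level layout is below. It assumes the date level is formatted as `YYYYMMDD` (UTC) and the version level as the upload time in milliseconds. Both formats and the function names are illustrative choices, not part of the answer. Reading the file in fixed-size chunks keeps memory use flat for large files, but hashing time still grows with file size, so this reduces the overhead mentioned above rather than removing it.

```python
import hashlib
import os
import time
from datetime import datetime, timezone
from typing import BinaryIO, Optional


def md5_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """MD5 of a binary stream, read in 1 MiB chunks by default."""
    h = hashlib.md5()
    # iter() with a sentinel keeps calling read() until it returns b"".
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


def versioned_path(root: str, stream: BinaryIO, filename: str,
                   uploaded_at: Optional[float] = None) -> str:
    """Build root/<date>/<md5 of contents>/<upload ms>/<filename>."""
    ts = time.time() if uploaded_at is None else uploaded_at
    day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y%m%d")
    digest = md5_of_stream(stream)
    # basename() keeps a user-supplied name from adding extra directories.
    return os.path.join(root, day, digest, str(int(ts * 1000)),
                        os.path.basename(filename))
```

Hashing consumes the stream, so a caller that then saves the upload would need to `seek(0)` on it (or hash while writing) before copying the bytes to the returned path.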

    FURTHER IDEAS