
Why randomize your file names for cloud storage/CDN?


When you look at a profile picture on a social networking site like Twitter, the image file is stored at a URL like:

http://a1.twimg.com/profile_images/1082228637/a-smile_twitter_100.jpg

or even with a date such as 20110912 somewhere in the path. The only immediate benefit I can think of is preventing a bot from going through and downloading all the files in your storage in a linear fashion. Am I missing any other benefits? What is the best way to go about randomizing the file names?

I am using Amazon S3, so I will have one subdomain serving all my static content. My plan was to store an integer ID in my database and then just concatenate the base URL with the ID to form the location.


Solution

  • One reason I cryptographically scramble identifiers in public URLs is so that the business' rate of growth is not always public.

    If the current ids can be deduced simply by creating a new user account or uploading an image, then an outside person can calculate the growth rate (or an upper limit) by doing this on a regular basis and seeing how many ids were used during the elapsed time.

    Whether it's stagnating or whether it's exploding exponentially, I want to be able to control the release of this information instead of letting competitors or business analysts be able to deduce it for themselves.

    Offline examples of this are invoice and check numbers. If you get billed by or paid by a company on a regular basis, then you can see how many invoices or checks they write in that time period.

    Here's a CPAN (Perl) module I maintain that scrambles 32-bit ids using two-way encryption based on Skipjack:

    http://metacpan.org/pod/Crypt::Skip32

    It's a direct translation of the Skip32 algorithm written in C by Greg Rose:

    http://www.qualcomm.com.au/PublicationsDocs/skip32.c

    This approach maps each 32-bit id to an (effectively random) corresponding 32-bit number, which can be reversed back into the original id. You don't have to save anything extra in your database.

    I convert the scrambled id into 8 hex digits for displaying in URLs.

    Once your ids approach 4.29 billion (2^32, the limit of a 32-bit id), you'll need to plan for extending the URL structure to support more, but I like having shorter URLs for as long as possible.
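
To make the idea of a reversible 32-bit scramble concrete, here is a minimal Python sketch. It is not Skip32 and not a port of the Perl module above: as a stand-in it uses a small Feistel network with HMAC-SHA256 as the round function, which has the same shape (a keyed, reversible mapping from 32-bit ids to 32-bit values) but has not been vetted as a cipher. The round count, the key, the base URL `https://static.example.com`, and the function names are all assumptions made up for illustration.

```python
import hashlib
import hmac

ROUNDS = 4  # assumption: chosen for illustration, not a vetted security parameter
BASE_URL = "https://static.example.com"  # hypothetical static-content host


def _round(key: bytes, round_no: int, half: int) -> int:
    """Keyed round function: map a 16-bit half to a 16-bit value."""
    msg = bytes([round_no]) + half.to_bytes(2, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:2], "big")


def scramble(id_: int, key: bytes) -> int:
    """Map a 32-bit id to another 32-bit value (one-to-one for a fixed key)."""
    if not 0 <= id_ < 2**32:
        raise ValueError("id must fit in 32 bits")
    left, right = id_ >> 16, id_ & 0xFFFF
    for r in range(ROUNDS):
        left, right = right, left ^ _round(key, r, right)
    return (left << 16) | right


def unscramble(value: int, key: bytes) -> int:
    """Invert scramble() by running the rounds in reverse order."""
    if not 0 <= value < 2**32:
        raise ValueError("value must fit in 32 bits")
    left, right = value >> 16, value & 0xFFFF
    for r in reversed(range(ROUNDS)):
        left, right = right ^ _round(key, r, left), left
    return (left << 16) | right


def to_hex(value: int) -> str:
    """Render a 32-bit value as exactly 8 lowercase hex digits."""
    return format(value, "08x")


def from_hex(text: str) -> int:
    """Parse the 8 hex digits taken from a URL back into an integer."""
    return int(text, 16)


def image_url(id_: int, key: bytes) -> str:
    """Build a public URL that hides the sequential database id."""
    return f"{BASE_URL}/{to_hex(scramble(id_, key))}.jpg"


if __name__ == "__main__":
    secret = b"replace-with-a-real-secret"  # hypothetical key
    for db_id in (1, 2, 3):
        token = to_hex(scramble(db_id, secret))
        # Each token decodes back to the id it came from.
        assert unscramble(from_hex(token), secret) == db_id
        print(db_id, image_url(db_id, secret))
```

Because the mapping is a permutation of the 32-bit range, two different ids never collide, and the database only needs to store the plain integer id. Anyone holding the key can turn a URL back into its id with `unscramble(from_hex(...))`, so the key has to stay private for the growth rate to stay private.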