hashurl-shortenergoo.gl

How Does The Google URL Shortener Generate A 5 Digit Hash Without Collisions


How can the Google URL shortener generate a unique hash with five characters without collisions. Seems like there are bound to be collisions, where different urls generate the same hash.

stackoverflow.com => http://goo.gl/LQysz

What's also interesting, is the same URL, generates a completely different hash each time:

stackoverflow.com => http://goo.gl/Dl7sz

So, doing some math, using lower-case characters, upper-case characters, and digits, the total number of combinations are 62^5 = 916,132,832 clearly collisions bound to happen.

How does Google do this?


Solution

  • They have a database which tracks all previously generated URLs and the longer URL that each of those maps to. Easy to make sure that newly generated URLs don't already exist in that table. A little tricky to scale out (they surely have multiple servers so each one needs to be assigned a bucket of values from which it can give out to users). If they ever reach the point of having generated 916,132,832 URLs, they'll just add another character.