sqlsql-serversql-server-2005t-sqlbase62

Multi-base conversion - using all combinations for URL shortener


I am making an URL shortener, and I am struggling with the optimal way of encoding a number (id) into a character string.

I am using the characters 0-9,A-Z,a-z so it will basically be a base-62 encoding. That is pretty basic, but it doesn't make use of all possible codes. The codes that it would produce would be:

0, 1, ... y, z, 10, 11, ... zy, zz, 100, 101, ...

Notice that the codes 00 to 0z is not used, the same for 000 to 0zz, and so on. I would like to use all the codes, like this:

0, 1, ... y, z, 00, 01, ... zy, zz, 000, 001, ...

It would be some combination of base-62 and base-63, with different bases depending on the position... Using base-62 is easy, for example:

create procedure tiny_GetCode
    @UrlId int
as
set nocount on

declare @Code varchar(10)
set @Code = ''

while (@UrlId > 0 or len(@Code) = 0) begin
    set @Code = substring('0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz', @UrlId % 62 + 1, 1) + @Code
    set @UrlId = @UrlId / 62
end

select @Code

But I haven't yet managed to make a multi-base conversion out of it, to make use of all the codes.


Solution

  • I managed to make the conversion. The tricky thing is that it's not just a mixed base conversion, the higher base of the first character also affects the values of longer codes.

    I started with an easier case; base-10 codes. I saw that the two digit range has 10 extra codes, the three digit range has 100 extra codes, and so on:

    0 - 9        : '0' - '9'
    10 - 109     : '00' - '99'
    110 - 1109   : '000' - '999'
    1110 - 11109 : '0000' - '9999'
    

    So, the value of the first character in the code is not just the base raised to the position, but it also has an offset.

    After applying this to the base-62 encoding, this is what I ended up with:

    create function tiny_Encode(@UrlId int) returns varchar(10)
    as
    begin
    
      declare
        @Chars varchar(62),
        @Code varchar(10),
        @Value int,
        @Adder int
    
      set @Chars = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
      if (@UrlId < 63) begin
        set @Code = substring(@Chars, @UrlId, 1)
      end else begin
        set @UrlId = @UrlId - 1
        set @Value = 62
        set @Adder = 0
        while (@UrlId >= @Value * 63 + @Adder) begin
          set @Adder = @Adder + @Value
          set @Value = @Value * 62
        end
        set @Code = substring(@Chars, (@UrlId - @Adder) / @Value, 1)
        set @UrlId = ((@UrlId - @Adder) % @Value)
        while (@Value > 1) begin
          set @Value = @Value / 62
          set @Code = @Code + substring(@Chars, @UrlId / @Value + 1, 1)
          set @UrlId = @UrlId % @Value
        end
      end
      return @Code
    
    end