algorithmmathhilbert-curve

Implementing a Hilbert map of the Internet


In the XKCD comic 195 a design for a map of the Internet address space is suggested using a Hilbert curve so that items from a similar IP adresses will be clustered together.

Given an IP address, how would I calculate its 2D coordinates (in the range zero to one) on such a map?


Solution

  • This is pretty easy, since the Hilbert curve is a fractal, that is, it is recursive. It works by bisecting each square horizontally and vertically, dividing it into four pieces. So you take two bits of the IP address at a time, starting from the left, and use those to determine the quadrant, then continue, using the next two bits, with that quadrant instead of the whole square, and so on until you have exhausted all the bits in the address.

    The basic shape of the curve in each square is horseshoe-like:

     0 3
     1 2
    

    where the numbers correspond to the top two bits and therefore determine the traversal order. In the xkcd map, this square is the traversal order at the highest level. Possibly rotated and/or reflected, this shape is present at each 2x2 square.

    Determination of how the "horseshoe" is oriented in each of the subsquares is determined by one rule: the 0 corner of the 0 square is in the corner of the larger square. Thus, the subsquare corresponding to 0 above must be traversed in the order

     0 1
     3 2
    

    and, looking at the whole previous square and showing four bits, we get the following shape for the next division of the square:

     00 01 32 33
     03 02 31 30
     10 13 20 23
     11 12 21 22
    

    This is how the square always gets divided at the next level. Now, to continue, just focus on the latter two bits, orient this more detailed shape according to how the horseshoe shape of those bits is oriented, and continue with a similar division.

    To determine the actual coordinates, each two bits determine one bit of binary precision in the real number coordinates. So, on the first level, the first bit after the binary point (assuming coordinates in the [0,1] range) in the x coordinate is 0 if the first two bits of the address have the value 0 or 1, and 1 otherwise. Similarly, the first bit in the y coordinate is 0 if the first two bits have the value 1 or 2. To determine whether to add a 0 or 1 bit to the coordinates, you need to check the orientation of the horseshoe at that level.

    EDIT: I started working out the algorithm and it turns out that it's not that hard after all, so here's some pseudo-C. It's pseudo because I use a b suffix for binary constants and treat integers as arrays of bits, but changing it to proper C shouldn't be too hard.

    In the code, pos is a 3-bit integer for the orientation. The first two bits are the x and y coordinates of 0 in the square and the third bit indicates whether 1 has the same x coordinate as 0. The initial value of pos is 011b, meaning that the coordinates of 0 are (0, 1) and 1 has the same x coordinate as 0. ad is the address, treated as an n-element array of 2-bit integers, and starting from the most significant bits.

    double x = 0.0, y = 0.0;
    double xinc, yinc;
    pos = 011b;
    for (int i = 0; i < n; i++) {
        switch (ad[i]) {
            case 0: xinc = pos[0]; yinc = pos[1]; pos[2] = ~pos[2]; break;
            case 1: xinc = pos[0] ^ ~pos[2]; yinc = pos[1] ^ pos[2]; break;
            case 2: xinc = ~pos[0]; yinc = ~pos[1]; break;
            case 3: xinc = pos[0] ^ pos[2]; yinc = pos[1] ^ ~pos[2];
                pos = ~pos; break;
        }
        x += xinc / (1 << (i+1)); y += yinc / (1 << (i+1));
    }
    

    I tested it with a couple of 8-bit prefixes and it placed them correctly according to the xkcd map, so I'm somewhat confident the code is correct.