javabit-fields

Bit masking unsigned and signed values


In the example provided here,

We created a 64 bit ID that contains the shard ID = 16 bit, the type of the containing data = 10 bit, and where this data is in the table (local ID) = 36.

The savvy additionology experts out there will notice that only adds to 62 bits. My past in compiler and chip design has taught me that reserve bits are worth their weight in gold. So we have two (set to zero).

So does this mean:

Question One:

They can have 2^16, range of 0-65536 shards ?

Type ID of 2^10, range of 0-1024 types ?

Local ID of 2^36, range of 0-68719476736 local id's ?

Also I am trying to replicate their hashing function in Java

| represents separation of two sets of 32 bits to make it easier to visualize.

# represents a bit shift of 46, enclosing 18 bit of which 2 are reserved - ShardId

~ represents a bit shift of 36, enclosing 10 bit - Type Id The remaining 36 bits - local ID:

#0000 0000 0000 0000 00#~00 0000 0000~ 0000 | 0000 0000 0000 0000 0000 0000 0000 0000 |


  1. Binary of ShardID 3429 = 1101 0110 0101
  2. Therefore (hashedValue >> 46) = 00 0000 1101 0110 0101 &
  3. 0xFFFF = 1111 1111 1111 1111
  4. ShardId = 00 0000 1101 0110 0101

Question Two:

I understand need for 1. and 2. however I dont see why we need a bitwise operator & 0xFFFF since 4 & 2 are effectively the same.

Question Three:

I getting the following complier error: The literal 0xFFFFFFFFF of type int is out of range

public class BitExampleTest {
    public static void main(String[] args) {
      long pinId = 241294492511762325L;
      unHash(pinId);
    }

    private static long hash(int shardId, int typeId, int localId){
        return (shardId << 46) | (typeId << 36) | (localId << 0);
    }

    private static void unHash(long hashedValue){
               long shardID = (hashedValue >> 46) & 0xFFFF; 
               long typeID  = (hashedValue >> 36) & 0x3FF;
               long localID = (hashedValue >>  0) & 0xFFFFFFFFF;

               System.out.printf("shardID %s \n",shardID);
               System.out.printf("typeID %s \n",typeID);
               System.out.printf("localID %s \n",localID);
    }
}

Solution

  • They can have 2^16, range of 0-65536 shards?

    Shard ID is 16 bits. So, 216 different Shard ID's are possible.

    Type ID of 2^10, range of 0-1024 types?

    Type ID is 10 bits. So, 210 different Type ID's are possible.

    Local ID of 2^36, range of 0-68719476736 local id's?

    Local ID is 36 bits. So, 236 different Local ID's are possible, i.e., pointers to 236 locations in the table are possible.

    Now, referring to the Pinterest post, the Pin ID used to demonstrate is 241294492511762325.

    At first glance, it is clear that this value won't fit in a Java int datatype. So, we switch over to long.

    // 'L' added to tell the compiler it is a long, not an int
    System.out.println(Long.toBinaryString(241294492511762325L)); 
    
    // output (padded with 0's on the left)
    0000 0011 0101 1001 0100 0000 0001 0000 0000 0000 0110 1011 1111 0111 1001 0101
    

    Initially, ID of 241294492511762325 looks like this...

    xx 00 0011 0101 1001 01 00 0000 0001 0000 0000 0000 0110 1011 1111 0111 1001 0101
    XX [____SHARD(16)_____] [_TYPE(10)_] [________________LOCAL(36)_________________]
    

    To get the Shard bits, right shifting ID by (10 + 36) 46 would work. This would get us. Please note that the bits to the left of XX can be either 0's or 1's, depending on sign extension of the 'final two bits', etc.

    xx 00 0011 0101 1001 01
    XX [____SHARD(16)_____]
    

    Taking the bit-wise AND of this with 0xffff

               'our 2 golden bits'
                       ▼▼
      xxxx xxxx xxxx xxxx 0000 1101 0110 0101
    & 0000 0000 0000 0000 1111 1111 1111 1111
    = 0000 0000 0000 0000 0000 1101 0110 0101
    

    Regardless of what the leading bits were set to, now they are all 0's. I think that should make it clear to you, the reason behind the bitwise AND with 0xffff. If they are left-padded with 0's, great. If they aren't the AND takes care of it. :)

    When you initialize a literal like 0xFFFFFFFFF, if there is no suffix and the variable is an integral type(int, long, etc), the value is assumed to be an int. And, an int can hold 32 bits, not 36 bits (9 x 0xF = 9 x '1111') like you are trying. So, you have to use a long that has a capacity of 64 bits. Appending 'L' or 'l' to the end of the value, like 0xFFFFFFFFFL, should take care of the compiler error. [Reference]