c logical-operators primality-test relational-operators

Primality test using logical or relational operators on a 3-bit number

I am looking forward to check if a 3 bits number is prime using both logical and relational operators.The number is represented using 3 variables with bits 7-1 set to 0 and only the bit on position 0 being the actual data. Suppose we have:

unsigned char x3, x2, x1;

One can assume that a prime number is function f which outputs 1 if the number is prime, 0 otherwise.

How would one solve this using bit-wise operations (logical operators) as optimal as possible? One can assume that a minimum conjunctive/disjunctive form can be extracted from a K.V. diagram of the truth table.

How would one solve this using relational operators?

Which one would be faster?

Some useful data:

CDF: (~x2 & X1) | (X0 & X2)
CCF: (X1 | X2) & (X0 | ~X2)

Solution

Bitwise

I think the best you can do here is (x3 & x1) | (~x3 & x2). In boolean algebra, this would be expressed as AC + (!A)B.* None of the usual rules for simplifying boolean algebra expressions would seem to apply here, and several online boolean algebra expression simplifiers seem to agree.

* (the second A would normally be written with a bar over it, but I don't know how to do that in markdown).

So you'd get something like this (using uchar as shorthand for unsigned char):

uchar f_bitwise(uchar x3, uchar x2, uchar x1) 
{
   return (x3 & x1) | (~x3 & x2);
}

The assembly produced by this (with -O0 and discarding the function call overhead), looks like this:

movzx   eax, BYTE PTR [rbp-4]  # move x3 into register eax
and     al, BYTE PTR [rbp-12]  # bitwise AND the lower half of eax with x1
mov     ecx, eax               # store the result in ecx
cmp     BYTE PTR [rbp-4], 0    # compare x3 with 0
sete    al                     # set lower half of eax to 1 if x3 was equal to 0
mov     edx, eax               # store the result in edx (this now equals ~x3)
movzx   eax, BYTE PTR [rbp-8]  # move x2 into eax
and     eax, edx               # bitwise AND ~x3 (in edx) with x2 (in eax)
or      eax, ecx               # finally, bitwise OR eax and ecx

The result is stored in eax.

Logical

Looking at the bits of values 0-7, and trying to discern an easy pattern to key off of, you notice that for values 0-3, the number is prime if and only if x2 is 1. Likewise, for values 4-7, the number is prime if and only if x1 is 1. This observation yields a simple expression: x3 ? x1 : x2.

I have no proof that this is the shortest possible expression using logical operators, so if someone has a shorter version, by all means post it in a comment. However, it does seem unlikely that there's a shorter version, given that this is essentially a single logical operator, as you can see if you expand the ternary operator into a proper if/else:

uchar f_logical(uchar x3, uchar x2, uchar x1) 
{
   if (x3 != 0) 
      return x1;
   else
      return x2;
}

The assembly produced by this is as follows (again with -O0 and not counting the function call overhead):

cmp     BYTE PTR [rbp-4], 0      # compare x3 with 0
je      .L2                      # if equal, jump to label L2
movzx   eax, BYTE PTR [rbp-12]   # move x1 into register eax
jmp     .L4                      # jump to label L4 (i.e., return from function)
.L2: 
movzx   eax, BYTE PTR [rbp-8]    # move x2 into register eax
.L4:
# Function return. Result is once again stored in eax.

I haven't tested the performance of either of these functions, but just from looking at the assembly, it seems almost certain that f_logical would run faster than f_bitwise. It uses significantly fewer instructions, and although fewer instructions doesn't always equate to faster, none of these instructions seem like they would be particularly expensive in terms of CPU cycles.

If you cancel out the instructions that both functions have in common and compare what's left, you get:

f_logical: je, jmp

f_bitwise: and (2), mov (2), sete, or

As for why the logical version is shorter, I think the answer is branching. With only bitwise operations and no branching, you have to account for all possibilities in a single expression.

For instance, in (x3 & x1) | (~x3 & x2), it would be nice to get rid of the ~x3 on the right hand side, given that you already know x3 is zero there, given that the right hand side represents the test for values 0-3. But the computer has no way of knowing this, and you can't factor it out into a simpler expression.

With the ability to branch, you can split the problem into two sub-problems using a single comparison operator. Again, this works because for values 0-3, the x2 bit is essentially an "is prime" bit, and for values 4-7, the x1 bit is an "is prime" bit.

Also, alinsoar is correct that a lookup table would be faster, but only if the value isn't split into individual bits. With the bit values in separate variables, you either have to reconstruct the number using something like x3<<2 | x2<<1 | x1, or you have to define your lookup table as a 3D array, in which case the compiler generates a bunch of extra instructions to do the address arithmetic necessary to index a 3D array.