This is a question, regarding my homework, specifically on NASM.
I am writing an algorithm to find the least whole factor of a number. (Greater than 1)
In pseudo-code it can be summed up as:
if(n%2==0)
return 2;
for(i=3; i <= n/2; i+=2)
if(n%i==0)
return i;
return n;
The program is just slightly slower than the requirement for large numbers. (n
> 1 000 000 000)
The most obvious (to me) improvement would be replacing n/2
with sqrt(n)
. However I am not supposed to know how to use floating-point numbers and finding the integer sqrt by Newton's method seems overkill. (Because I don't actually need to know the exact value and although I haven't tested it, I would imagine finding the isqrt recursively/iteratively would be quite slow)
So I was wondering, whether there is a fast algorithm for some function, such that sqrt(n) < f(n) < n/2
. By "fast" I mean preferably constant time and by f(n) < n/2
I mean significantly less for big n
.
Some of the options I am considering are:
Check, for i <= min(sqrt(2^32), n/2)
, where sqrt(2^32) = 2^16
is a constant.
Replace i <= n/2
with i <= (2^p)
, where p = ⌈log_2(n)/2⌉
or something. (p
is half the position of most significant bit of n
)
In the end I settled on the ⌈log_2(n)/2⌉
version. Since sqrt(n) = 2^(log_2(n)/2)
. So for anyone in need this is my solution. The sqrt(n)
upper bound approximation is O(1)
. Whole algorithm is O(sqrt(n))
(I think).
mov esi,ebx ;ebx = N
shr esi,1 ;esi = N/2
cmovnc esi,2 ;if no remainder RETURN 2
jnc RETURN
mov edi,2 ;edi is max counter
bsr eax,ebx ;eax is most significant set bit of ebx (log_2(eax))
shr eax,1 ;eax/=2
mov cl,al
shl edi,cl ;max counter is 2^(log_2(N)/2 + 1) >= sqrt(N)
mov esi,1 ;default answer is 1
mov ecx,3 ;start counter is 3
START:
mov edx,0
mov eax,ebx
div ecx ;divide N by counter
test edx,edx ;if no remainder RETURN counter
cmovz esi,ecx
jz RETURN
add ecx,2 ;else counter += 2
cmp ecx,edi
jb START ;if counter <= max counter try agian with new divisor
RETURN:
;the answer is in (esi)
P.S. If you are wondering why I don't just check i*i <= N
. It is actually significantly slower than this version. Just adding a mul
inside the loop shouldn't slow it down that much, so I suspect it breaks the branch prediction algorithm each iteration. The cmp ecx,edi
is comparing a counter against a constant, so it might be predicted right most of the times, where the cmp 'ecx*ecx',ebx
would be comparing a square of the counter, which may not be so obvious to the processor.