nasmapproximationsqrtsquare-rootupperbound

Fast integer sqrt upper bound approximation


This is a question, regarding my homework, specifically on NASM.

I am writing an algorithm to find the least whole factor of a number. (Greater than 1)

In pseudo-code it can be summed up as:

if(n%2==0)
    return 2;
for(i=3; i <= n/2; i+=2)
    if(n%i==0)
        return i;
return n;

The program is just slightly slower than the requirement for large numbers. (n > 1 000 000 000)

The most obvious (to me) improvement would be replacing n/2 with sqrt(n). However I am not supposed to know how to use floating-point numbers and finding the integer sqrt by Newton's method seems overkill. (Because I don't actually need to know the exact value and although I haven't tested it, I would imagine finding the isqrt recursively/iteratively would be quite slow)

So I was wondering, whether there is a fast algorithm for some function, such that sqrt(n) < f(n) < n/2. By "fast" I mean preferably constant time and by f(n) < n/2 I mean significantly less for big n.

Some of the options I am considering are:


Solution

  • In the end I settled on the ⌈log_2(n)/2⌉ version. Since sqrt(n) = 2^(log_2(n)/2). So for anyone in need this is my solution. The sqrt(n) upper bound approximation is O(1). Whole algorithm is O(sqrt(n)) (I think).

    mov esi,ebx     ;ebx = N
    shr esi,1       ;esi = N/2
    cmovnc esi,2    ;if no remainder RETURN 2
    jnc RETURN
    mov edi,2       ;edi is max counter
    bsr eax,ebx     ;eax is most significant set bit of ebx (log_2(eax))
    shr eax,1       ;eax/=2
    mov cl,al
    shl edi,cl      ;max counter is 2^(log_2(N)/2 + 1) >= sqrt(N)
    mov esi,1       ;default answer is 1
    mov ecx,3       ;start counter is 3
    START:
    mov edx,0
    mov eax,ebx
    div ecx         ;divide N by counter
    test edx,edx    ;if no remainder RETURN counter
    cmovz esi,ecx
    jz RETURN
    add ecx,2       ;else counter += 2
    cmp ecx,edi
    jb START        ;if counter <= max counter try agian with new divisor
    RETURN:
                    ;the answer is in (esi)
    

    P.S. If you are wondering why I don't just check i*i <= N. It is actually significantly slower than this version. Just adding a mul inside the loop shouldn't slow it down that much, so I suspect it breaks the branch prediction algorithm each iteration. The cmp ecx,edi is comparing a counter against a constant, so it might be predicted right most of the times, where the cmp 'ecx*ecx',ebx would be comparing a square of the counter, which may not be so obvious to the processor.