pythonfactorizationgmpy

Most efficient way to find all factors with GMPY2 (or GMP)?


I know there's already a question similar to this, but I want to speed it up using GMPY2 (or something similar with GMP). Here is my current code, it's decent but can it be better?

Edit: new code, checks divisors 2 and 3

def factors(n):
    result = set()
    result |= {mpz(1), mpz(n)}


    def all_multiples(result, n, factor):
        z = mpz(n)
        while gmpy2.f_mod(mpz(z), factor) == 0:
            z = gmpy2.divexact(z, factor)
            result |= {mpz(factor), z}
        return result

    result = all_multiples(result, n, 2)
    result = all_multiples(result, n, 3)

    for i in range(1, gmpy2.isqrt(n) + 1, 6):
        i1 = mpz(i) + 1
        i2 = mpz(i) + 5
        div1, mod1 = gmpy2.f_divmod(n, i1)
        div2, mod2 = gmpy2.f_divmod(n, i2)
        if mod1 == 0:
            result |= {i1, div1}
        if mod2 == 0:
            result |= {i2, div2}
    return result

If it's possible, I'm also interested in an implementation with divisors only within n^(1/3) and 2^(2/3)*n(1/3)

As an example, mathematica's factor() is much faster than the python code. I want to factor numbers between 20 and 50 decimal digits. I know ggnfs can factor these in less than 5 seconds.

I am interested if any module implementing fast factorization exists in python too.


Solution

  • I just made some quick changes to your code to eliminate redundant name lookups. The algorithm is still the same but it is about twice as fast on my computer.

    import gmpy2
    from gmpy2 import mpz
    
    def factors(n):
        result = set()
        n = mpz(n)
        for i in range(1, gmpy2.isqrt(n) + 1):
            div, mod = divmod(n, i)
            if not mod:
                result |= {mpz(i), div}
        return result
    
    print(factors(12345678901234567))
    

    Other suggestions will need more information about the size of the numbers, etc. For example, if you need all the possible factors, it may be faster to construct those from all the prime factors. That approach will let you decrease the limit of the range statement as you proceed and also will let you increment by 2 (after removing all the factors of 2).

    Update 1

    I've made some additional changes to your code. I don't think your all_multiplies() function is correct. Your range() statement isn't optimal since 2 is check again but my first fix made it worse.

    The new code delays computing the co-factor until it knows the remainder is 0. I also tried to use the built-in functions as much as possible. For example, mpz % integer is faster than gmpy2.f_mod(mpz, integer) or gmpy2.f_mod(integer, mpz) where integer is a normal Python integer.

    import gmpy2
    from gmpy2 import mpz, isqrt
    
    def factors(n):
        n = mpz(n)
    
        result = set()
        result |= {mpz(1), n}
    
        def all_multiples(result, n, factor):
            z = n
            f = mpz(factor)
            while z % f == 0:
                result |= {f, z // f}
                f += factor
            return result
    
        result = all_multiples(result, n, 2)
        result = all_multiples(result, n, 3)
    
        for i in range(1, isqrt(n) + 1, 6):
            i1 = i + 1
            i2 = i + 5
            if not n % i1:
                result |= {mpz(i1), n // i1}
            if not n % i2:
                result |= {mpz(i2), n // i2}
        return result
    
    print(factors(12345678901234567))
    

    I would change your program to just find all the prime factors less than the square root of n and then construct all the co-factors later. Then you decrease n each time you find a factor, check if n is prime, and only look for more factors if n isn't prime.

    Update 2

    The pyecm module should be able to factor the size numbers you are trying to factor. The following example completes in about a second.

    >>> import pyecm
    >>> list(pyecm.factors(12345678901234567890123456789012345678901, False, True, 10, 1))
    [mpz(29), mpz(43), mpz(43), mpz(55202177), mpz(2928109491677), mpz(1424415039563189)]