This is a purely academic question without any practical consideration. This is not homework, I dropped out of high school long ago. I am just curious, and I can't sleep well without knowing why.
I was messing around with Python. I decided to factorize big integers and measure the runtime of calls for each input.
I used a bunch of numbers and found that some numbers take much longer to factorize than others.
I then decided to investigate further, I quickly wrote a prime sieve function to generate primes for testing. I found out that a product of a pair of moderately large primes (two four-digit primes) take much longer to be factorized than a product of one very large prime (six-digits+) and a small prime (<=three-digits).
At first I thought my first simple function for testing is inefficient, that is indeed the case, so I wrote a second function that pulled primes directly from pre-generated list of primes, the second function was indeed more efficient, but strangely it exhibits the same pattern.
Here are some numbers that I used:
13717421 == 3607 * 3803
13189903 == 3593 * 3671
56267023 == 7187 * 7829
65415743 == 8087 * 8089
12345679 == 37 * 333667
38760793 == 37 * 1047589
158202851 == 151 * 1047701
762312571 == 727 * 1048573
Code:
import numpy as np
from itertools import cycle
def factorize(n):
factors = []
while not n % 2:
factors.append(2)
n //= 2
i = 3
while i**2 <= n:
while not n % i:
factors.append(i)
n //= i
i += 2
return factors if n == 1 else factors + [n]
TRIPLE = ((4, 2), (9, 6), (25, 10))
WHEEL = ( 4, 2, 4, 2, 4, 6, 2, 6 )
def prime_sieve(n):
primes = np.ones(n + 1, dtype=bool)
primes[:2] = False
for square, double in TRIPLE:
primes[square::double] = False
wheel = cycle(WHEEL)
k = 7
while (square := k**2) <= n:
if primes[k]:
primes[square::2*k] = False
k += next(wheel)
return np.flatnonzero(primes)
PRIMES = list(map(int, prime_sieve(1048576)))
TEST_LIMIT = PRIMES[-1] ** 2
def factorize_sieve(n):
if n > TEST_LIMIT:
raise ValueError('Number too large')
factors = []
for p in PRIMES:
if p**2 > n:
break
while not n % p:
factors.append(p)
n //= p
return factors if n == 1 else factors + [n]
Test result:
In [2]: %timeit factorize(13717421)
279 μs ± 4.29 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
In [3]: %timeit factorize(12345679)
39.6 μs ± 749 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [4]: %timeit factorize_sieve(13717421)
64.1 μs ± 688 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [5]: %timeit factorize_sieve(12345679)
12.6 μs ± 146 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
In [6]: %timeit factorize_sieve(13189903)
64.6 μs ± 964 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [7]: %timeit factorize_sieve(56267023)
117 μs ± 3.88 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [8]: %timeit factorize_sieve(65415743)
130 μs ± 1.38 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [9]: %timeit factorize_sieve(38760793)
21.1 μs ± 232 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [10]: %timeit factorize_sieve(158202851)
21.4 μs ± 385 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [11]: %timeit factorize_sieve(762312571)
22.1 μs ± 409 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
As you can clearly see, factorization of two medium primes on average takes much longer than two extremes. Why is it this case?
In don't recognize, that this question is related to programming.
Your algorithm executes trial divisions (starting with the smallest numbers) and terminates at the square root of the input number, so the worst case is actually trying to factorize the square number of a prime (maximum factor possible). Encountering a low factor speeds up the process, since you immediately divide n and there is a chance the test factor reached squared exceeds the reduced n already.