I have seen loads of posts on this topic, but none are exactly what I am looking for.
I want to find all ways a positive integer N that is greater than 1 can be expressed as the sum of at most N integers from 1 up to N, just like the usual.
For example, in the standard notation, these are all the partitions of 6:
[(1, 1, 1, 1, 1, 1),
(1, 1, 1, 1, 2),
(1, 1, 1, 2, 1),
(1, 1, 1, 3),
(1, 1, 2, 1, 1),
(1, 1, 2, 2),
(1, 1, 3, 1),
(1, 1, 4),
(1, 2, 1, 1, 1),
(1, 2, 1, 2),
(1, 2, 2, 1),
(1, 2, 3),
(1, 3, 1, 1),
(1, 3, 2),
(1, 4, 1),
(1, 5),
(2, 1, 1, 1, 1),
(2, 1, 1, 2),
(2, 1, 2, 1),
(2, 1, 3),
(2, 2, 1, 1),
(2, 2, 2),
(2, 3, 1),
(2, 4),
(3, 1, 1, 1),
(3, 1, 2),
(3, 2, 1),
(3, 3),
(4, 1, 1),
(4, 2),
(5, 1),
(6,)]
Now, the notation is very low-entropy, first every occurrence of the number increases the size of a particular partition, this is inefficient and it is hard to count the occurrences of the numbers when they recur many times. I want to replace all the occurrences of a number with a two-element tuple, in which the first element is the number and the second is the count, for example (1, 1, 1, 1, 1, 1)
is equivalent to (1, 6)
, they both contain the same information, but one is clearly much more concise.
And second, there are lots of duplicates in the output, for example, there are five partitions that contain four 1s and one 2, they are counted as five separate elements. This is inefficient, too, since addition is commutative, changing the order of the numbers doesn't change the result, so they are all equivalent, they are all the same element.
However if we replace all five with just one element we lose information.
I want to instead replace it with the following format:
Counter({((1, 2), (2, 2)): 6,
((1, 1), (2, 1), (3, 1)): 6,
((1, 4), (2, 1)): 5,
((1, 3), (3, 1)): 4,
((1, 2), (4, 1)): 3,
((1, 1), (5, 1)): 2,
((2, 1), (4, 1)): 2,
((1, 6),): 1,
((2, 3),): 1,
((3, 2),): 1,
((6, 1),): 1})
So I want the result to be a Counter
in which the keys are the unique partitions and the values are how many ways the numbers can be arranged.
And yes I have written a function for this, using brute-force and memoization. It turns out to be pretty efficient.
First this is the implementation that outputs in the standard format, I post it here for comparison:
def partitions(number: int) -> list[tuple[int, ...]]:
result = []
stack = [(number, ())]
while stack:
remaining, path = stack.pop()
if not remaining:
result.append(path)
else:
stack.extend((remaining - i, path + (i,)) for i in range(remaining, 0, -1))
return result
It takes 582 milliseconds to find all partitions of 20 in CPython and 200 milliseconds in PyPy3:
CPython
In [22]: %timeit partitions(20)
582 ms ± 4.22 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
PyPy3
In [36]: %timeit partitions(20)
199 ms ± 3.17 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Now, the bruteforce approach with memoization that outputs in the intended format:
PARTITION_COUNTERS = {}
def partition_counter(number: int) -> Counter:
if result := PARTITION_COUNTERS.get(number):
return result
result = Counter()
for i in range(1, number):
for run, count in partition_counter(number - i).items():
new_run = []
added = False
for a, b in run:
if a == i:
new_run.append((a, b + 1))
added = True
else:
new_run.append((a, b))
if not added:
new_run.append((i, 1))
result[tuple(sorted(new_run))] += count
result[((number, 1),)] = 1
PARTITION_COUNTERS[number] = result
return result
CPython
In [23]: %timeit PARTITION_COUNTERS.clear(); partition_counter(20)
10.4 ms ± 72.1 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
PyPy3
In [37]: %timeit PARTITION_COUNTERS.clear(); partition_counter(20)
9.75 ms ± 58.3 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
It takes only 10 milliseconds to find all partitions of 20, much, much faster than the first function, and PyPy3 doesn't make it faster.
But how can we do better? After all, I am just using brute-force, I know there are lots of smart algorithms for integer partition, but none of them generate outputs in the intended format.
It will be faster if you only generate the sorted multisets, and then for each calculate how many distinct permutations it has.
For that you can use the Combination function, i.e. 𝑛-choose-𝑘 noted as 𝑛𝐶𝑘. For example if you generate the partitioning 1+1+2+3+3 (for 𝑛=10), you would generate the counter {1:2, 2:1, 3:2}
and determine the number of permutations as:
5𝐶2 * 3𝐶1 * 2𝐶2
5 here is the number of elements in the multiset with counting duplicates. It gets reduced by the elements you take away...etc. The last term is always 1, since there is only one distinct number remaining for it.
Here is an implementation:
from math import comb
def partition_counter(number: int) -> Counter:
counter = [0] * (number + 1)
result = {}
def recur(number, n, take):
if number == 0:
# calculate number of unique permutations of this multiset:
count = 1
items = []
for i, k in enumerate(counter):
if k:
count *= comb(n, k)
n -= k
items.append((i, k))
result[tuple(items)] = count
return
if take > 1: # don't take any more occurrences of `take`
recur(number, n, take - 1)
if take <= number: # do take another occurrence of `take`
counter[take] += 1
recur(number - take, n + 1, take)
counter[take] -= 1
recur(number, 0, number)
return result
This produces the same output format as your brute force implementation, but faster.