Fastest way to find all permutations of 0, 1 of width n without itertools in pure Python?

What is an efficient way to get the same result as list(product((0, 1), repeat=n)) without using itertools and any imports?

For example, given n=3, the output be: [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)] in exactly the same order.

This is a programming challenge, it is an exercise, using standard library is staying in comfort zone and doing so doesn't result in learning something new. This problem is impractical, but in finding novel solutions I can gain new programming techniques and insights that I can apply elsewhere.

And no, using f'{n:b}' and bin(n) is stupid because you are converting a number to a string and then from string to int, doing all those unnecessary operations, I have proven it to be slow.

I tried to solve the problem myself, I wrote an efficient function to do this, then two slow functions to compare with the original function.

from typing import Generator, Tuple

def powerset_indices(n: int) -> Generator[Tuple[int, ...], None, None]:
    if not isinstance(n, int) or n < 1:
        raise ValueError("The argument n must be a positive integer")

    numeral = [0] * n
    maxi = n - 1
    for _ in range(1 << n):
        yield tuple(numeral)
        i = maxi
        while True:
            if not (d := numeral[i]):
                numeral[i] = 1
                break
            else:
                numeral[i] = 0
                i -= 1


def powerset_indices1(n: int) -> Generator[Tuple[int, ...], None, None]:
    if not isinstance(n, int) or n < 1:
        raise ValueError("The argument n must be a positive integer")

    for i in range(1 << n):
        yield tuple(map(int, f"{i:0{n}b}"))


def powerset_indices2(n: int) -> Generator[Tuple[bool, ...], None, None]:
    if not isinstance(n, int) or n < 1:
        raise ValueError("The argument n must be a positive integer")

    places = [1 << i for i in range(n - 1, -1, -1)]
    for i in range(1<<n):
        yield tuple((i & p) > 0 for p in places)

In [133]: from itertools import product

In [134]: %timeit list(powerset_indices(16))
19.8 ms ± 92.2 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [135]: %timeit list(product((0, 1), repeat=16))
6.86 ms ± 32.5 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [136]: %timeit list(powerset_indices1(16))
184 ms ± 485 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [137]: %timeit list(powerset_indices2(16))
136 ms ± 277 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)

My original function is only slower than product and it is faster than the two later functions by a wide margin. Can anyone offer a solution faster than powerset_indices while only using built-in Python?

Solution

I have spent one whole day to solve this problem but I haven't found a satisfying solution. I have written some new solutions and renamed the functions according to their execution time of func(16) (65536 tuples).

My original function is now named powerset_indices9, I have fixed one old function, inspired by this answer, I recognized repeating patterns in binary counting.

Specifically, the last bit (the rightmost bit), the digit just goes cycles through 0 and 1. And then the second last bit cycles through 0, 0, 1, 1. The third last bit cycles through 0, 0, 0, 1, 1, 1 et cetera. Basically the columns just repeat.

I rewrote the function to what is now called powerset_indices8.

I then recognized the recursive nature of counting in binary, if you have one digit, it can only be 0 and 1, if you have two digits you can have 00, 01, 10, and 11, do you see a pattern? The last depth is just repeated twice, on the first repetition a 0 is added to the front, on the second repetition a 1 is added to the front. Now if you have 3 digits, you can have 000, 001, 010, 011, 100, 101, 110, 111, the last depth is again repeated twice and this pattern goes on and on.

I implemented this idea into what is now called powerset_indices10.

I also implemented one recursive approach based on Cartesian product and a few iterative Cartesian product approaches, to test how much of the execution is affected by the specific usage of object types.

I also reimplemented my original approach in two different ways, to again see what is the extent of influence of object type usage on execution time.

And finally I have experimentally confirmed that the approach from the first answer is indeed faster, but I am not satisfied. Because it isn't using a smarter algorithm, it is just using some loop unrolling and a little memoization, those are cheap tricks. I can make it almost instant by just using a dict, but of course this requires a huge amount of memory, and to handle all inputs I would need an infinite amount of memory.

Implementation

import colorsys
import matplotlib.pyplot as plt
import numpy as np
import timeit
from itertools import product
from math import ceil
from scipy.interpolate import make_interp_spline
from typing import Generator, List, Tuple

Bin_Gen = Generator[Tuple[bool, ...], None, None]
Bin_List = List[Tuple[bool, ...]]


def validate(l: int) -> None:
    if not isinstance(l, int) or l < 1:
        raise ValueError("The argument l must be a positive integer")


def powerset_indices_worker(l: int, gen: Bin_Gen) -> Bin_List:
    validate(l)
    return list(gen(l))


def powerset_indices1_gen(l: int) -> Bin_Gen:
    for i in range(1 << l):
        yield tuple(map(int, f"{i:0{l}b}"))


def powerset_indices1(l: int) -> Bin_List:
    return powerset_indices_worker(l, powerset_indices1_gen)


def powerset_indices2_gen(l: int) -> Bin_Gen:
    for i in range(1 << l):
        yield tuple(c == "1" for c in f"{i:0{l}b}")


def powerset_indices2(l: int) -> Bin_List:
    return powerset_indices_worker(l, powerset_indices2_gen)


def powerset_indices3_gen(l: int) -> Bin_Gen:
    places = [1 << i for i in range(l - 1, -1, -1)]
    for i in range(1 << l):
        yield tuple((i & p) > 0 for p in places)


def powerset_indices3(l: int) -> Bin_List:
    return powerset_indices_worker(l, powerset_indices3_gen)


def powerset_indices4_gen(l: int) -> Bin_Gen:
    for i in range(1 << l):
        yield tuple(c == "1" for c in bin(i)[2:].zfill(l))


def powerset_indices4(l: int) -> Bin_List:
    return powerset_indices_worker(l, powerset_indices4_gen)


def binary_product(l: int) -> Bin_Gen:
    if l == 1:
        yield (0,)
        yield (1,)
    else:
        for i in ((0,), (1,)):
            for j in binary_product(l - 1):
                yield i + j


def powerset_indices5(l: int) -> Bin_List:
    validate(l)
    return list(binary_product(l))


def powerset_indices6(l: int) -> Bin_List:
    validate(l)
    result = []
    current = [0] * l
    places = list(range(l - 1, -1, -1))
    for i in range(1 << l):
        k = i
        for j in places:
            current[j] = k & 1
            k >>= 1
        result.append(tuple(current))

    return result


def powerset_indices7_gen(l: int) -> Bin_Gen:
    numeral = [0] * l
    maxi = l - 1
    places = list(range(maxi, -1, -1))
    for _ in range(1 << l):
        yield tuple(numeral)
        for i in places:
            if not numeral[i]:
                numeral[i] = 1
                break
            else:
                numeral[i] = 0


def powerset_indices7(l: int) -> Bin_List:
    return powerset_indices_worker(l, powerset_indices7_gen)


def powerset_indices8(l: int) -> Bin_List:
    validate(l)
    columns = []
    n = 1 << (l - 1)
    c = 1
    while n:
        columns.insert(0, ([0] * c + [1] * c) * n)
        n >>= 1
        c <<= 1

    return list(zip(*columns))


def powerset_indices9_gen(l: int) -> Bin_Gen:
    numeral = [0] * l
    maxi = l - 1
    for _ in range(1 << l):
        yield tuple(numeral)
        i = maxi
        while True:
            if not numeral[i]:
                numeral[i] = 1
                break
            else:
                numeral[i] = 0
                i -= 1


def powerset_indices9(l: int) -> Bin_List:
    return powerset_indices_worker(l, powerset_indices9_gen)


def powerset_indices10(l: int) -> Bin_List:
    validate(l)
    chunks = [(0,), (1,)]
    for _ in range(l - 1):
        chunks = [(0,) + chunk for chunk in chunks] + [(1,) + chunk for chunk in chunks]

    return chunks


OCTUPLE = [
    (0, 0, 0),
    (0, 0, 1),
    (0, 1, 0),
    (0, 1, 1),
    (1, 0, 0),
    (1, 0, 1),
    (1, 1, 0),
    (1, 1, 1),
]

TRIVIAL_TRIPLE = {
    1: [(0,), (1,)],
    2: [(0, 0), (0, 1), (1, 0), (1, 1)],
    3: OCTUPLE,
}


def powerset_indices11_gen(l: int) -> Bin_Gen:
    if triple := TRIVIAL_TRIPLE.get(l):
        yield from triple
    else:
        for t in powerset_indices11_gen(l - 3):
            yield t + (0, 0, 0)
            yield t + (0, 0, 1)
            yield t + (0, 1, 0)
            yield t + (0, 1, 1)
            yield t + (1, 0, 0)
            yield t + (1, 0, 1)
            yield t + (1, 1, 0)
            yield t + (1, 1, 1)


def powerset_indices11(l: int) -> Bin_List:
    return powerset_indices_worker(l, powerset_indices11_gen)


def powerset_indices12_gen(l: int) -> Bin_Gen:
    if triple := TRIVIAL_TRIPLE.get(l):
        yield from triple
    else:
        for i in powerset_indices11_gen(l - 3):
            for j in OCTUPLE:
                yield i + j


def powerset_indices12(l: int) -> Bin_List:
    return powerset_indices_worker(l, powerset_indices12_gen)


HEXADECIPLE = [
    (0, 0, 0, 0),
    (0, 0, 0, 1),
    (0, 0, 1, 0),
    (0, 0, 1, 1),
    (0, 1, 0, 0),
    (0, 1, 0, 1),
    (0, 1, 1, 0),
    (0, 1, 1, 1),
    (1, 0, 0, 0),
    (1, 0, 0, 1),
    (1, 0, 1, 0),
    (1, 0, 1, 1),
    (1, 1, 0, 0),
    (1, 1, 0, 1),
    (1, 1, 1, 0),
    (1, 1, 1, 1),
]

TRIVIAL_QUADRUPLE = TRIVIAL_TRIPLE | {4: HEXADECIPLE}


def powerset_indices13_gen(l: int) -> Bin_Gen:
    if quadruple := TRIVIAL_QUADRUPLE.get(l):
        yield from quadruple
    else:
        for t in powerset_indices13_gen(l - 4):
            yield t + (0, 0, 0, 0)
            yield t + (0, 0, 0, 1)
            yield t + (0, 0, 1, 0)
            yield t + (0, 0, 1, 1)
            yield t + (0, 1, 0, 0)
            yield t + (0, 1, 0, 1)
            yield t + (0, 1, 1, 0)
            yield t + (0, 1, 1, 1)
            yield t + (1, 0, 0, 0)
            yield t + (1, 0, 0, 1)
            yield t + (1, 0, 1, 0)
            yield t + (1, 0, 1, 1)
            yield t + (1, 1, 0, 0)
            yield t + (1, 1, 0, 1)
            yield t + (1, 1, 1, 0)
            yield t + (1, 1, 1, 1)


def powerset_indices13(l: int) -> Bin_List:
    return powerset_indices_worker(l, powerset_indices13_gen)


def powerset_indices14_gen(l: int) -> Bin_Gen:
    if quadruple := TRIVIAL_QUADRUPLE.get(l):
        yield from quadruple
    else:
        for i in powerset_indices14_gen(l - 4):
            for j in HEXADECIPLE:
                yield i + j


def powerset_indices14(l: int) -> Bin_List:
    return powerset_indices_worker(l, powerset_indices14_gen)


def powerset_indices15(l: int) -> Bin_List:
    validate(l)
    return list(product((0, 1), repeat=l))

Benchmarking

def measure_command(func, *args, **kwargs):
    elapsed = timeit.timeit(lambda: func(*args, **kwargs), number=5)
    once = elapsed / 5
    if elapsed >= 1:
        return int(1e9 * once + 0.5)

    times = min(1024, ceil(1 / once))
    return int(
        1e9 * timeit.timeit(lambda: func(*args, **kwargs), number=times) / times + 0.5
    )


def test(func):
    return [measure_command(func, i) for i in range(1, 21)]


correct = powerset_indices15(5)
execution_times = {}
for i in range(1, 16):
    func_name = f"powerset_indices{i}"
    func = globals()[func_name]
    assert func(5) == correct
    execution_times[func_name] = test(func)

cost_per_item = {
    k: [e / (1 << i) for i, e in enumerate(v, start=1)]
    for k, v in execution_times.items()
}

cost_per_digit = {
    k: [e / i for i, e in enumerate(v, start=1)] for k, v in cost_per_item.items()
}

largest_execution = sorted(
    [(k, v[-1]) for k, v in execution_times.items()], key=lambda x: x[1]
)
average_execution = sorted(
    [(k, sum(v) / 20) for k, v in cost_per_item.items()], key=lambda x: x[1]
)
columns = [
    sorted([v[i] for v in execution_times.values()], reverse=True) for i in range(20)
]
overall_performance = [
    (k, [columns[i].index(e) for i, e in enumerate(v)])
    for k, v in execution_times.items()
]
overall_execution = sorted(
    [(a, sum(b) / 28) for a, b in overall_performance], key=lambda x: -x[1]
)

Benchmark results

In [5]: largest_execution
Out[5]:
[('powerset_indices15', 151644514),
 ('powerset_indices13', 205506940),
 ('powerset_indices11', 221764360),
 ('powerset_indices14', 225112600),
 ('powerset_indices12', 236354640),
 ('powerset_indices8', 290304860),
 ('powerset_indices9', 340667560),
 ('powerset_indices7', 369991820),
 ('powerset_indices10', 395936440),
 ('powerset_indices6', 1782400440),
 ('powerset_indices5', 1982757780),
 ('powerset_indices4', 2367651980),
 ('powerset_indices3', 2513876400),
 ('powerset_indices2', 2642760060),
 ('powerset_indices1', 3410389600)]

In [6]: average_execution
Out[6]:
[('powerset_indices15', 107.60829858779907),
 ('powerset_indices13', 147.8689859390259),
 ('powerset_indices11', 163.91065826416016),
 ('powerset_indices14', 165.5812686920166),
 ('powerset_indices12', 178.23384552001954),
 ('powerset_indices8', 226.5921173095703),
 ('powerset_indices10', 232.5905216217041),
 ('powerset_indices9', 299.1427543640137),
 ('powerset_indices7', 332.2262501716614),
 ('powerset_indices6', 944.7164463043213),
 ('powerset_indices5', 1014.3197778701782),
 ('powerset_indices3', 1500.8791118621825),
 ('powerset_indices4', 1571.1585237503052),
 ('powerset_indices2', 1795.7759031295777),
 ('powerset_indices1', 2181.7652240753173)]

In [7]: overall_execution
Out[7]:
[('powerset_indices15', 9.535714285714286),
 ('powerset_indices13', 9.142857142857142),
 ('powerset_indices11', 8.107142857142858),
 ('powerset_indices14', 7.678571428571429),
 ('powerset_indices12', 7.071428571428571),
 ('powerset_indices8', 6.892857142857143),
 ('powerset_indices10', 5.857142857142857),
 ('powerset_indices9', 5.071428571428571),
 ('powerset_indices7', 4.357142857142857),
 ('powerset_indices6', 3.607142857142857),
 ('powerset_indices5', 3.357142857142857),
 ('powerset_indices3', 1.9285714285714286),
 ('powerset_indices4', 1.6428571428571428),
 ('powerset_indices2', 0.5357142857142857),
 ('powerset_indices1', 0.17857142857142858)]

execution_times.json

Plotting

colors = [colorsys.hsv_to_rgb(i / 15, 1, 1) for i in range(15)]

for data in (execution_times, cost_per_item, cost_per_digit):
    ymax = -1e309
    for v in data.values():
        ymax = max(max(v), ymax)

    rymax = ceil(ymax)
    fig, ax = plt.subplots()
    ax.axis([0, 20, 0, ymax])
    ax.set_xticks(range(1, 21))
    ax.set_xticklabels([f"{1<<i}" for i in range(1, 21)])

    
    for c, (k, v) in enumerate(data.items()):
        x = np.linspace(1, 20, 256)
        plt.plot(x, make_interp_spline(range(1, 21), v)(x), label=k, color=colors[c])

    plt.legend()
    plt.show()


fig, ax = plt.subplots()
ax.axis([0, 20, 0, 15])
ax.set_xticks(range(1, 21))
ax.set_yticks(range(16))
ax.set_xticklabels([f"{1<<i}" for i in range(1, 21)])


for c, (k, v) in enumerate(overall_performance):
    plt.plot(range(1, 21), v, label=k, color=colors[c])

plt.legend()
plt.show()

Update

I realize I made a mistake in my implementation, specifically the function powerset_indices8_gen is actually not a generator, it returns a list, so I mistakenly added a list() call overhead.

I have fixed it and rebenchmarked the whole thing, though the fix doesn't change the rankings much.

And I have seen blhsing's answer, it is very smart, I like it, however according to my benchmarks it isn't faster than my smart solutions because of added overhead.

I selected five smart approaches to compare and contrast with blhsing's performance:

def blhsing(n):
    result = [(0,) * n] * (total := 1 << n)
    numerals = [0] * n
    index = 0
    for i in range(1, total):
        index ^= (lsb := i & -i)
        numerals[n - lsb.bit_length()] ^= 1
        result[index] = tuple(numerals)
    return result


assert blhsing(5) == correct
execution_times['blhsing'] = test(blhsing)

smart_functions = {*(f'powerset_indices{i}' for i in (6, 7, 8, 9, 10)), 'blhsing'}
colors1 = [colorsys.hsv_to_rgb(i / 6, 1, 1) for i in range(6)]

execution_times1 = {k: v for k, v in execution_times.items() if k in smart_functions}

In [11]: execution_times['blhsing']
Out[11]:
[912,
 1393,
 2212,
 4404,
 7846,
 15173,
 30498,
 59754,
 127000,
 271207,
 547542,
 1163032,
 2367837,
 4995417,
 10650321,
 22738870,
 51423727,
 96052318,
 203850900,
 392575140]
 
 In [12]: largest_execution1
Out[12]:
[('powerset_indices8', 290304860),
 ('powerset_indices9', 340667560),
 ('powerset_indices7', 369991820),
 ('blhsing', 392575140),
 ('powerset_indices10', 395936440),
 ('powerset_indices6', 1782400440)]

In [13]: average_execution1
Out[13]:
[('powerset_indices8', 226.5921173095703),
 ('powerset_indices10', 232.5905216217041),
 ('powerset_indices9', 299.1427543640137),
 ('blhsing', 308.1007444381714),
 ('powerset_indices7', 332.2262501716614),
 ('powerset_indices6', 944.7164463043213)]

In [14]: overall_performance1
Out[14]:
[('powerset_indices6',
  [2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]),
 ('powerset_indices7',
  [0, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 2, 2, 3, 2, 2, 3, 3]),
 ('powerset_indices8',
  [1, 0, 1, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5]),
 ('powerset_indices9',
  [2, 3, 3, 3, 3, 3, 2, 1, 2, 3, 3, 3, 3, 3, 3, 2, 4, 4, 4, 4]),
 ('powerset_indices10',
  [5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 3, 3, 2, 1]),
 ('blhsing', [4, 4, 4, 2, 2, 2, 3, 3, 3, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 2])]

In [15]: overall_execution1
Out[15]:
[('powerset_indices8', 3.0714285714285716),
 ('powerset_indices10', 2.75),
 ('powerset_indices9', 2.0714285714285716),
 ('blhsing', 1.5),
 ('powerset_indices7', 1.1428571428571428),
 ('powerset_indices6', 0.14285714285714285)]