How many possible sequences can be formed that obey the following rules:
Each sequence is formed from the symbols 0-9a-f
.
Each sequence is exactly 16 symbols long.
0123456789abcdef ok
0123456789abcde XXX
0123456789abcdeff XXX
Symbols may be repeated, but no more than 4 times.
00abcdefabcdef00 ok
00abcde0abcdef00 XXX
A symbol may not appear three times in a row.
00abcdefabcdef12 ok
000bcdefabcdef12 XXX
There can be at most two pairs.
00abcdefabcdef11 ok
00abcde88edcba11 XXX
Also, how long would it take to generate all of them?
In combinatorics, counting is usually pretty straight-forward, and can be accomplished much more rapidly than exhaustive generation of each alternative (or worse, exhaustive generative of a superset of possibilities, in order to filter them). One common technique is to reduce a given problem to combinatons of a small(ish) number of disjoint sub-problems, where it is possible to see how many times each subproblem contributes to the total. This sort of analysis can often result in dynamic programming solutions, or, as below, in a memoised recursive solution.
Because combinatoric results are usually enormous numbers, brute-force generation of every possibility, even if it can be done extremely rapidly for each sequence, is impractical in all but the most trivial of cases. For this particular question, for example, I made a rough back-of-the-envelope estimate in a comment (since deleted):
There are 18446744073709551616 possible 64-bit (16 hex-digit) numbers, which is a very large number, about 18 billion billion. So if I could generate and test one of them per second, it would take me 18 billion seconds, or about 571 years. So with access to a cluster of 1000 96-core servers, I could do it all in about 54 hours, just a bit over two days. Amazon will sell me one 96-core server for just under a dollar an hour (spot prices), so a thousand of them for 54 hours would cost a little under 50,000 US dollars. Perhaps that's within reason. (But that's just to generate.)
Undoubtedly, the original question has is part of an exploration of the possibility of trying every possible sequence by way of cracking a password, and it's not really necessary to produce a precise count of the number of possible passwords to demonstrate the impracticality of that approach (or its practicality for organizations which have a budget sufficient to pay for the necessary computing resources). As the above estimate shows, a password with 64 bits of entropy is not really that secure if what it is protecting is sufficiently valuable. Take that into account when generating a password for things you treasure.
Still, it can be interesting to compute precise combinatoric counts, if for no reason other than the intellectual challenge.
The following is mostly a proof-of-concept; I wrote it in Python because Python offers some facilities which would have been time-consuming to reproduce and debug in C: hash tables with tuple keys and arbitrary precision integer arithmetic. It could be rewritten in C (or, more easily, in C++), and the Python code could most certainly be improved, but given that it only takes 70 milliseconds to compute the count request in the original question, the effort seems unnecessary.
This program carefully groups the possible sequences into different partitions and caches the results in a memoisation table. For the case of sequences of length 16, as in the OP, the cache ends up with 2540 entries, which means that the core computation is only done 2540 times:
# The basis of the categorization are symbol usage vectors, which count the
# number of symbols used (that is, present in a prefix of the sequence)
# `i` times, for `i` ranging from 1 to the maximum number of symbol uses
# (4 in the case of this question). I tried to generalise the code for different
# parameters (length of the sequence, number of distinct symbols, maximum
# use count, maximum number of pairs). Increasing any of these parameters will,
# of course, increase the number of cases that need to be checked and thus slow
# the program down, but it seems to work for some quite large values.
# Because constantly adjusting the index was driving me crazy, I ended up
# using 1-based indexing for the usage vectors; the element with index 0 always
# has the value 0. This creates several inefficiencies but the practical
# consequences are insignificant.
### Functions to manipulate usage vectors
def noprev(used, prevcnt):
"""Decrements the use count of the previous symbol"""
return used[:prevcnt] + (used[prevcnt] - 1,) + used[prevcnt + 1:]
def bump1(used, count):
"""Registers that one symbol (with supplied count) is used once more."""
return ( used[:count]
+ (used[count] - 1, used[count + 1] + 1)
+ used[count + 2:]
)
def bump2(used, count):
"""Registers that one symbol (with supplied count) is used twice more."""
return ( used[:count]
+ (used[count] - 1, used[count + 1], used[count + 2] + 1)
+ used[count + 3:]
)
def add1(used):
"""Registers a new symbol used once."""
return (0, used[1] + 1) + used[2:]
def add2(used):
"""Registers a new symbol used twice."""
return (0, used[1], used[2] + 1) + used[3:]
def count(NSlots, NSyms, MaxUses, MaxPairs):
"""Counts the number of sequences of length NSlots over an alphabet
of NSyms symbols where no symbol is used more than MaxUses times,
no symbol appears three times in a row, and there are no more than
MaxPairs pairs of symbols.
"""
cache = {}
# Canonical description of the problem, used as a cache key
# pairs: the number of pairs in the prefix
# prevcnt: the use count of the last symbol in the prefix
# used: for i in [1, NSyms], the number of symbols used i times
# Note: used[0] is always 0. This problem is naturally 1-based
def helper(pairs, prevcnt, used):
key = (pairs, prevcnt, used)
if key not in cache:
# avail_slots: Number of remaining slots.
avail_slots = NSlots - sum(i * count for i, count in enumerate(used))
if avail_slots == 0:
total = 1
else:
# avail_syms: Number of unused symbols.
avail_syms = NSyms - sum(used)
# We can't use the previous symbol (which means we need
# to decrease the number of symbols with prevcnt uses).
adjusted = noprev(used, prevcnt)[:-1]
# First, add single repeats of already used symbols
total = sum(count * helper(pairs, i + 1, bump1(used, i))
for i, count in enumerate(adjusted)
if count)
# Then, a single instance of a new symbol
if avail_syms:
total += avail_syms * helper(pairs, 1, add1(used))
# If we can add pairs, add the already not-too-used symbols
if pairs and avail_slots > 1:
total += sum(count * helper(pairs - 1, i + 2, bump2(used, i))
for i, count in enumerate(adjusted[:-1])
if count)
# And a pair of a new symbol
if avail_syms:
total += avail_syms * helper(pairs - 1, 2, add2(used))
cache[key] = total
return cache[key]
rv = helper(MaxPairs, MaxUses, (0,)*(MaxUses + 1))
# print("Cache size: ", len(cache))
return rv
# From the command line, run this with the command:
# python3 SLOTS SYMBOLS USE_MAX PAIR_MAX
# There are defaults for all four argument.
if __name__ == "__main__":
from sys import argv
NSlots, NSyms, MaxUses, MaxPairs = 16, 16, 4, 2
if len(argv) > 1: NSlots = int(argv[1])
if len(argv) > 2: NSyms = int(argv[2])
if len(argv) > 3: MaxUses = int(argv[3])
if len(argv) > 4: MaxPairs = int(argv[4])
print (NSlots, NSyms, MaxUses, MaxPairs,
count(NSlots, NSyms, MaxUses, MaxPairs))
Here's the result of using this program to compute the count of all valid sequences (since a sequence longer than 64 is impossible given the constraints), taking less than 11 seconds in total:
$ time for i in $(seq 1 65); do python3 -m count $i 16 4; done
1 16 4 2 16
2 16 4 2 256
3 16 4 2 4080
4 16 4 2 65040
5 16 4 2 1036800
6 16 4 2 16524000
7 16 4 2 263239200
8 16 4 2 4190907600
9 16 4 2 66663777600
10 16 4 2 1059231378240
11 16 4 2 16807277588640
12 16 4 2 266248909553760
13 16 4 2 4209520662285120
14 16 4 2 66404063202640800
15 16 4 2 1044790948722393600
16 16 4 2 16390235567479693920
17 16 4 2 256273126082439298560
18 16 4 2 3992239682632407024000
19 16 4 2 61937222586063601795200
20 16 4 2 956591119531904748877440
21 16 4 2 14701107045788393912922240
22 16 4 2 224710650516510785696509440
23 16 4 2 3414592455661342007436384000
24 16 4 2 51555824538229409502827923200
25 16 4 2 773058043102197617863741843200
26 16 4 2 11505435580713064249590793862400
27 16 4 2 169863574496121086821681298457600
28 16 4 2 2486228772352331019060452730124800
29 16 4 2 36053699633157440642183732148192000
30 16 4 2 517650511567565591598163978874476800
31 16 4 2 7353538304042081751756339918288153600
32 16 4 2 103277843408210067510518893242552998400
33 16 4 2 1432943471827935940003777587852746035200
34 16 4 2 19624658467616639408457675812975159808000
35 16 4 2 265060115658802288611235565334010714521600
36 16 4 2 3527358829586230228770473319879741669580800
37 16 4 2 46204536626522631728453996238126656113459200
38 16 4 2 595094456544732751483475986076977832633088000
39 16 4 2 7527596027223722410480884495557694054538752000
40 16 4 2 93402951052248340658328049006200193398898022400
41 16 4 2 1135325942092947647158944525526875233118233702400
42 16 4 2 13499233156243746249781875272736634831519281254400
43 16 4 2 156762894800798673690487714464110515978059412992000
44 16 4 2 1774908625866508837753023260462716016827409668608000
45 16 4 2 19556269668280714729769444926596793510048970792448000
46 16 4 2 209250137714454234944952304185555699000268936613376000
47 16 4 2 2169234173368534856955926000562793170629056490849280000
48 16 4 2 21730999613085754709596718971411286413365188258316288000
49 16 4 2 209756078324313353775088590268126891517374425535395840000
50 16 4 2 1944321975918071063760157244341119456021429461885104128000
51 16 4 2 17242033559634684233385212588199122289377881249323872256000
52 16 4 2 145634772367323301463634877324516598329621152347129008128000
53 16 4 2 1165639372591494145461717861856832014651221024450263064576000
54 16 4 2 8786993110693628054377356115257445564685015517718871715840000
55 16 4 2 61931677369820445021334706794916410630936084274106426433536000
56 16 4 2 404473662028342481432803610109490421866960104314699801413632000
57 16 4 2 2420518371006088374060249179329765722052271121139667645435904000
58 16 4 2 13083579933158945327317577444119759305888865127012932088217600000
59 16 4 2 62671365871027968962625027691561817997506140958876900738150400000
60 16 4 2 259105543035583039429766038662433668998456660566416258886520832000
61 16 4 2 889428267668414961089138119575550372014240808053275769482575872000
62 16 4 2 2382172342138755521077314116848435721862984634708789861244239872000
63 16 4 2 4437213293644311557816587990199342976125765663655136187709235200000
64 16 4 2 4325017367677880742663367673632369189388101830634256108595793920000
65 16 4 2 0
real 0m10.924s
user 0m10.538s
sys 0m0.388s