[SOLVED] Creating wordlist with no more than 2 repeating characters

I'm creating a wordlist from the uppercase letters A-Z and the digits 0-9, with every word exactly 8 characters long. Using crunch, which comes preinstalled in Kali, I was able to generate a wordlist that doesn't contain any consecutive repeated characters. For example, 'ABCDEF12' would be generated but 'AABBCC11' wouldn't, because it contains the same character twice in a row.

The command used: crunch 8 8 ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 -d 1

I still need to filter this wordlist down by excluding any word with more than 2 occurrences of the same character. For example, ABCA12AB would be excluded because the letter 'A' occurs 3 times, and I only want each character to occur 2 times at most.

There isn't any option within crunch to do this, and I've tried looking up regex to filter the results but I'm very new to regex and couldn't figure it out.

Solution

I'm sure there is a clever way to do this using regular expressions. But, here's a quick and dirty way to do it in Python:

filename = '/usr/share/dict/american-english'

def has_at_most_n_occurrences(s, n):
    """Return True if no character appears more than n times in s."""
    counts = {}
    for c in s:
        counts[c] = counts.get(c, 0) + 1
        # Stop as soon as any character exceeds the limit.
        if counts[c] > n:
            return False
    return True

with open(filename) as file:
    for line in file:
        line = line.strip()
        # Print only the words that pass the check.
        if has_at_most_n_occurrences(line, 2):
            print(line)

Just change the first line of the script to point to the file containing your wordlist (I used /usr/share/dict/american-english to test), save the Python script on your system, and run it from the command line like so:

python3 /path/to/script.py

It should output only those words in your source file that contain no more than two occurrences of the same character.
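
As for the regex route the question asked about, here is a minimal sketch of how it might look. It assumes Python's re module and a backreference pattern that matches any word in which some character appears three or more times; words that match are dropped. The names THREE_OR_MORE and no_more_than_two are just illustrative, and the sample list is made up for the demonstration.

```python
import re

# Matches any string in which some character appears at least three times:
# capture one character, then require it again twice later in the string.
THREE_OR_MORE = re.compile(r'(.).*\1.*\1')

def no_more_than_two(word):
    """Return True if no character occurs more than twice in word."""
    return THREE_OR_MORE.search(word) is None

# Sample words (the first two come from the question; the third is made up).
samples = ['ABCDEF12', 'ABCA12AB', 'AB12AB12']
kept = [w for w in samples if no_more_than_two(w)]
print(kept)
```

Swapping this check into the file-reading loop in place of the counting function should give the same output. Something like grep -vE with the same pattern may also work, but backreferences in extended regular expressions are a GNU extension, so I'd test that on your system first.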