
Algorithmic problem of wordlist generation


I want to generate x variants of an address by replacing its letters with symbols taken from predefined lists.

address = "RUE JEAN ARGENTIN"
a_variants = ["À","Á","Â","Ä","Ã","Å"]
e_variants = ["É","È","Ê","Ë"]
i_variants = ["Ì","Í","Î","Ï"]
u_variants = ["Ù","Ú","Û","Ü"]
r_variants = ["Ŕ","Ŗ"]
o_variants = ["Ò","Ó","Ô","Ö","Õ"]
s_variants = ["Ś","Š","Ş"]
n_variants = ["Ń","Ň"]
d_variants = ["D","Đ"]

Example of output

# RUE JEAN ARGENTINS
# ŔUE JEAN ARGENTINS
# ŖUE JEAN ARGENTINS
# RÙE JEAN ARGENTINS
# RÚE JEAN ARGENTINS
# RÛE JEAN ARGENTINS
# RÜE JEAN ARGENTINS
# ....
# ŖUE JEAN ARGENTÌNS
# ....
# ....
# ŖÛÉ JÉÀŇ ÀŖGÈNTÏŇŞ
# ....

I'm not necessarily looking for Python code; I'm mostly after an algorithmic idea for achieving this, or even a mathematical formula for the number of possible variations.

Thank you very much


Solution

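  • Each character that has variants can either stay as it is or be replaced by one of its variants, so it contributes (number of variants + 1) choices.

    The total is the product of these values over every such character in the text: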

    (number of items in r_variants + 1) * (number of items in u_variants + 1) * .... 
    

    If you count repeated characters in the text, you can group them and use powers instead:

    (number of items in a_variants + 1) ** (number of chars A in text) * (number of items in e_variants + 1) ** (number of chars E in text) * .... 
    

    In code I will use the first method:

    address = "RUE JEAN ARGENTIN"
    
    variation = {
    'A': ["À","Á","Â","Ä","Ã","Å"],
    'E': ["É","È","Ê","Ë"],
    'I': ["Ì","Í","Î","Ï"],
    'U': ["Ù","Ú","Û","Ü"],
    'R': ["Ŕ","Ŗ"],
    'O': ["Ò","Ó","Ô","Ö","Õ"],
    'S': ["Ś","Š","Ş"],
    'N': ["Ń","Ň"],
    'D': ["D","Đ"],
    }
    
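    # Each replaceable character multiplies the total by
    # (its number of variants + 1); the + 1 is the unchanged character.
    result = 1
    
    for char in address:
        if char in variation:
            result *= (len(variation[char]) + 1)
    
    print(result)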
    

    Result:

    37209375   # 37 209 375
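
As a cross-check, the same total can be computed by grouping repeated characters: each letter contributes (number of variants + 1) raised to the number of times it occurs in the text. The following is only a sketch of that idea; the name `grouped_result` is made up for illustration, and the dictionary is cut down to the letters that actually occur in this address.

```python
from collections import Counter

address = "RUE JEAN ARGENTIN"

# Same mapping as above, reduced to the letters present in this address
# (an assumption made to keep the sketch short).
variation = {
    'A': ["À", "Á", "Â", "Ä", "Ã", "Å"],
    'E': ["É", "È", "Ê", "Ë"],
    'I': ["Ì", "Í", "Î", "Ï"],
    'U': ["Ù", "Ú", "Û", "Ü"],
    'R': ["Ŕ", "Ŗ"],
    'N': ["Ń", "Ň"],
}

# How many times each character occurs in the address.
counts = Counter(address)

grouped_result = 1
for char, variants in variation.items():
    # (variants + 1) choices per occurrence, so raise to the occurrence count.
    grouped_result *= (len(variants) + 1) ** counts[char]

print(grouped_result)  # 37209375
```

Here that is 3**2 * 5 * 5**3 * 7**2 * 3**3 * 5 for R, U, E, A, N and I respectively, which matches the loop above.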
    

    EDIT:

    You can subtract 1 to exclude the original version "RUE JEAN ARGENTIN":

    37209374   # 37 209 374
    

    EDIT:

    If you use a shorter text and fewer variants, you can easily generate all versions on paper (or display them on screen), count them, and check whether the calculation is correct.

    For example, R and N each have 2 variants, so RN gives (2 + 1) * (2 + 1) = 9 versions, i.e. 8 variations once the original is excluded.

    You can use itertools.product() to generate all versions:

    import itertools
    
    for number, chars in enumerate(itertools.product(['R', "Ŕ","Ŗ"], ['N', "Ń","Ň"]), 1):
        print(f'{number} | {"".join(chars)}')
    

    Result:

    1 | RN
    2 | RŃ
    3 | RŇ
    4 | ŔN
    5 | ŔŃ
    6 | ŔŇ
    7 | ŖN
    8 | ŖŃ
    9 | ŖŇ
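
The same idea might be extended to the full address and to "x variants" roughly as follows. This is a sketch under assumptions rather than a definitive implementation: the helper `iter_variants` and the value of `x` are hypothetical names chosen for illustration. Because the full address has over 37 million versions, the sketch yields them lazily and uses `itertools.islice` to stop after the first `x`.

```python
import itertools

address = "RUE JEAN ARGENTIN"

variation = {
    'A': ["À", "Á", "Â", "Ä", "Ã", "Å"],
    'E': ["É", "È", "Ê", "Ë"],
    'I': ["Ì", "Í", "Î", "Ï"],
    'U': ["Ù", "Ú", "Û", "Ü"],
    'R': ["Ŕ", "Ŗ"],
    'O': ["Ò", "Ó", "Ô", "Ö", "Õ"],
    'S': ["Ś", "Š", "Ş"],
    'N': ["Ń", "Ň"],
    'D': ["D", "Đ"],
}


def iter_variants(text, variation):
    """Lazily yield every version of text, starting with the original."""
    # One list of choices per position: the original character first,
    # then its variants. A variant equal to the original (such as the
    # plain "D" listed under 'D') is skipped to avoid duplicate output.
    choices = [
        [char] + [v for v in variation.get(char, []) if v != char]
        for char in text
    ]
    for chars in itertools.product(*choices):
        yield "".join(chars)


x = 5  # hypothetical number of versions wanted

# Take only the first x versions instead of building all of them.
first_versions = list(itertools.islice(iter_variants(address, variation), x))

for number, version in enumerate(first_versions, 1):
    print(f'{number} | {version}')
```

The first version yielded is the unchanged address, and the rightmost characters vary fastest. To leave out the original, `itertools.islice(iter_variants(address, variation), 1, x + 1)` skips the first item.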