
Python - importing 127,000+ words to a list, but function only returning partial results


This function is meant to compare all 127,000+ words imported from a dictionary file against a length entered by the user. It should then return the number of words of that length. It does this, but only to an extent.

If I enter "15" it returns "0". If I enter "4" it returns "3078".

I am positive that there are words that are 15 characters long, but it returns "0" anyway. I should also mention that if I enter anything greater than 15, the result is still 0, even though there are words longer than 15 characters.

try:
    dictionary = open("dictionary.txt")
except:
    print("Dictionary not found")
    exit()


def reduceDict():
    first_list = []

    for line in dictionary:
        line = line.rstrip()
        if len(line) == word_length:
            for letter in line:
                if len([ln for ln in line if line.count(ln) > 1]) == 0:
                    if first_list.count(line) < 1:
                        first_list.append(line)
                else:
                    continue
    if showTotal == 'y':
        print('|| The possible words remaining are: ||\n ', len(first_list))

Solution

  • My reading of:

    if len([ln for ln in line if line.count(ln) > 1]) == 0:
    

    is that the words in question can't have any repeated letters, which could explain why no words are being found: once you get up to 15 letters, repeated letters are quite common. Since this requirement wasn't mentioned in the explanation, if we drop it we can write:

    def reduceDict(word_length, showTotal):
        first_list = []
    
        for line in dictionary:
            line = line.rstrip()
    
            if len(line) == word_length:
                if line not in first_list:
                    first_list.append(line)
    
        if showTotal:
            print('The number of words of length {} is {}'.format(word_length, len(first_list)))
            print(first_list)
    
    try:
        dictionary = open("dictionary.txt")
    except FileNotFoundError:
        exit("Dictionary not found")
    
    reduceDict(15, True)
    

    This turns up about 40 words from my Unix words file. If we want to put back the unique-letters requirement:

    import re
    
    def reduceDict(word_length, showTotal):
        first_list = []
    
        for line in dictionary:
            line = line.rstrip()
    
            if len(line) == word_length and not re.search(r"(.).*\1", line):
                if line not in first_list:
                    first_list.append(line)
    
        if showTotal:
            print('The number of words of length {} is {}'.format(word_length, len(first_list)))
            print(first_list)
    

    This starts returning 0 results at around 13 letters, as one might expect.
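
As a possible alternative to the regex, the unique-letters test can also be sketched with a set, since a word has no repeated letters exactly when its set of characters is as large as the word itself. The names `has_unique_letters`, `count_words`, `unique_only`, and the `sample` list below are illustrative and not part of the original code. The sketch also collects matches in a set rather than a list, which removes duplicates without rescanning the list for every word:

```python
def has_unique_letters(word):
    """Return True when no character occurs more than once in word."""
    return len(set(word)) == len(word)


def count_words(words, word_length, unique_only=False):
    """Count distinct words of the given length.

    `words` is any iterable of lines (a list or an open file).
    When `unique_only` is True, words with repeated letters are skipped.
    """
    matches = set()  # a set silently ignores duplicate words
    for raw in words:
        word = raw.rstrip()
        if len(word) != word_length:
            continue
        if unique_only and not has_unique_letters(word):
            continue
        matches.add(word)
    return len(matches)


# Hypothetical sample standing in for lines read from dictionary.txt
sample = ["uncopyrightable\n", "congratulations\n", "word\n", "noon\n", "word\n"]

print(count_words(sample, 15))                    # 2
print(count_words(sample, 15, unique_only=True))  # 1 ("congratulations" repeats letters)
print(count_words(sample, 4, unique_only=True))   # 1 ("noon" repeats, "word" is a duplicate)
```

Because iterating over an open file yields its lines, the same function should work on the real file by passing the file object, for example inside `with open("dictionary.txt") as f:`, which also closes the file afterwards.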