python algorithm optimization depth-first-search

Word Hunt DFS Algorithm Not Finding Optimal Solution


I am trying to code an algorithm for the Word Hunt game in Game Pigeon on iPhone.

How Word Hunt works: each player is given a 4x4 grid of letters and must form words of three letters or more by chaining letters that are horizontally, vertically, or diagonally adjacent. Once a tile is used in a chain it cannot be used again, and longer words are worth more points.

I wrote the following code for the algorithm, but it does not seem to find the longest possible words:

import os
import time

def read_grid():
    grid = []
    for i in range(1, 5):
        row = input(f"Enter row {i} (4 letters): ").strip().lower()
        if len(row) != 4 or not row.isalpha():
            raise ValueError("Each row must contain exactly 4 letters.")
        grid.append(list(row))
    return grid

def read_word_list(file_path):
    file_path = os.path.expanduser("~/Self Projects/Word Hunt/word_list.txt")
    with open(file_path, 'r') as file:
        words = set(word.strip().lower() for word in file if len(word.strip()) >= 3)
    return words

def find_words(grid, word_list):
    def dfs(x, y, path, current_word):
        if not (0 <= x < 4 and 0 <= y < 4):
            return
        if (x, y) in path:
            return
        current_word += grid[x][y]
        if len(current_word) >= 3 and current_word in word_list:
            found_words.add(current_word)
        path.add((x, y))
        for dx, dy in [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]:
            dfs(x + dx, y + dy, path.copy(), current_word)
        path.remove((x, y))

    found_words = set()
    for i in range(4):
        for j in range(4):
            dfs(i, j, set(), "")
    return found_words

def word_hunt_solver(grid, word_list):
    possible_words = find_words(grid, word_list)
    sorted_words = sorted(possible_words, key=lambda word: (-len(word), word))
    return sorted_words[:100]

grid = read_grid()

start_time = time.time()

word_list = read_word_list('word_list.txt')
longest_words = word_hunt_solver(grid, word_list)
longest_words.reverse()
for word in longest_words:
    print(word)
    print("\n")
    
print("\n")
print(f"Program Execution Time: {time.time() - start_time}")

The word file I am using is: https://drive.google.com/file/d/1oGDf1wjWp5RF_X9C7HoedhIWMh5uJs8s/view

For example if I input in the grid:

adul
aret
tion
sing

the longest words found are

adulterising
adulteration
adulterating

whereas "adulterations" is a valid word and is longer. Another example is if I input the grid:

unfo
vigr
ingn
esst

the longest words found are

forgiving
fogginess
unforgiving

when "unforgivingness" is a valid longer word. The algorithm seems to not only skip the longest word but many intermediate words. The full output for the above grid is:

vis
vin
vig
vie
uni
sin
sen
sei
org
orf
nis
nie
ins
ing
igg
gor
gnu
gin
gig
gif
fro
for
fog
fin
fig
ess
ens
eng
vise
vins
vine
vigs
vies
snig
sins
sing
sine
sien
sens
orgs
nine
nies
ness
ings
ingo
info
iggs
grog
gins
ging
gigs
frog
fins
fini
fine
figs
figo
engs
visne
vines
vigor
snigs
snies
sings
siens
sengi
oggin
nines
ivies
gings
finis
fines
vining
unfine
sining
oggins
gneiss
giving
singing
seining
givings
forging
ungiving
forgings
forgiving
fogginess
unforgiving

which does not look optimal at all, given the number of three- and four-letter words that should not appear among the 100 longest words.


Solution

  • In terms of results, your algorithm works correctly. Words such as "adulterations" and "unforgivingness" are missing simply because they cannot be traced on those grids without reusing a tile, and the three- and four-letter words show up only because fewer than 100 words were found in total, so the [:100] slice returns everything. In terms of performance, however, it is quite slow. Let's try to optimize it.
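
    As a quick sanity check (a small standalone snippet, not part of your solver), you can list the letters adjacent to the only "s" in your first grid; none of them is an "n", so no chain can end in "...ns" and "adulterations" is unreachable:

    letters = ("adul", "aret", "tion", "sing")
    grid = [list(row) for row in letters]
    sx, sy = 3, 0  # the only "s" sits in the bottom-left corner
    adjacent = [grid[sx + dx][sy + dy]
                for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                if (dx, dy) != (0, 0) and 0 <= sx + dx < 4 and 0 <= sy + dy < 4]
    print(adjacent)  # ['t', 'i', 'i'] -> no "n" next to the "s"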

    If you stop copying the path object, performance roughly doubles. There is no point in copying it: each coordinate is removed again right after the loop finishes, so sharing one set is safe.

    Just replace this line of code:

    dfs(x + dx, y + dy, path.copy(), current_word)
    

    with this one:

    dfs(x + dx, y + dy, path, current_word)
    

    The result of the function will not change, but this greatly improves performance.
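
    In context, the whole helper then reads as follows (your function, unchanged apart from that one line):

    def dfs(x, y, path, current_word):
        if not (0 <= x < 4 and 0 <= y < 4):
            return
        if (x, y) in path:
            return
        current_word += grid[x][y]
        if len(current_word) >= 3 and current_word in word_list:
            found_words.add(current_word)
        path.add((x, y))
        for dx, dy in [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]:
            # share the same set; the removal below undoes the addition when backtracking
            dfs(x + dx, y + dy, path, current_word)
        path.remove((x, y))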

    That said, I will offer you my own solution. It returns the same results, but the search itself runs almost instantly.

    The first thing that changes is the file processing. We collect not only the possible words but also every prefix ("chunk") that can lead to a word. These are needed during the search so that we can quickly weed out unpromising paths.

    def read_word_list(filepath):
        # add additional logic here if necessary
        possible_chunks = set()
        possible_words = set()

        with open(filepath) as file:
            for line in file.readlines():
                word = line.strip().lower()
                if len(word) >= 3:
                    possible_words.add(word)
                # record every prefix ("chunk") of the word; stop as soon as one
                # is already known, because all shorter prefixes are then known too
                for index in range(len(word), -1, -1):
                    chunk = word[:index]
                    if chunk in possible_chunks:
                        break
                    possible_chunks.add(chunk)

        return possible_chunks, possible_words
    

    This function accounts for most of the script's execution time because the word file is very large. On my computer it takes about 0.3 seconds.
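
    As a small illustration of what ends up in the two sets (using a hypothetical two-word file purely for the example):

    chunks, words = read_word_list("tiny_list.txt")  # hypothetical file containing just "frog" and "frogs"
    print(sorted(c for c in chunks if c))            # ['f', 'fr', 'fro', 'frog', 'frogs']
    print(sorted(words))                             # ['frog', 'frogs']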

    Now let's move on to the main change.

    from functools import cache
    
    MOVES = (
        (-1, -1),
        (-1, 0),
        (-1, 1),
        (0, -1),
        (0, 1),
        (1, -1),
        (1, 0),
        (1, 1),
    )
    
    
    def find_words(grid, possible_chunks, possible_words):
        @cache
        def find_neighbors(x, y):
            neighbors = set()
            for dx, dy in MOVES:
                nx, ny = dx + x, dy + y
                if 0 <= nx < len(grid) and 0 <= ny < len(grid[0]):
                    neighbors.add((nx, ny))
            return neighbors
    
        def iter_search(x, y, word):
            for nx, ny in find_neighbors(x, y):
                if (nx, ny) in visited:
                    continue
    
                next_letter = grid[nx][ny]
                next_word = word + next_letter
    
                if next_word not in possible_chunks:
                    continue
    
                if next_word in possible_words:
                    yield next_word
    
                visited.add((nx, ny))
                yield from iter_search(nx, ny, next_word)
                visited.remove((nx, ny))
    
        result = set()
        visited = set()
        initial = (
            (i, j, l) for i, line in enumerate(grid) for j, l in enumerate(line)
        )
        for i, j, letter in initial:
            visited = {(i, j)}
            result.update(iter_search(i, j, letter))
        return result
    

    Here I added caching of the neighboring coordinates, which makes it a bit faster. The main change, however, is if next_word not in possible_chunks: continue, which lets us abandon a path as soon as its prefix can no longer lead to any word in the list. The search runs in 0.003–0.005 seconds on my computer.
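
    To plug this into your existing script, the driver could look roughly like this (assuming the same imports as in your original script and keeping your read_grid, your word-list path, and your sorting logic):

    grid = read_grid()

    start_time = time.time()

    possible_chunks, possible_words = read_word_list('word_list.txt')
    found_words = find_words(grid, possible_chunks, possible_words)

    # same ordering as before: longest first, then alphabetical, capped at 100
    longest_words = sorted(found_words, key=lambda word: (-len(word), word))[:100]
    for word in longest_words:
        print(word)

    print(f"Program Execution Time: {time.time() - start_time}")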

    If you have any questions, write to me and I will try to expand the answer.