javatrieanagramdawg

Seeking suggestions on a Word Game implementation using DAWG


I am trying to implement a little game with the following rules: Given a set of random letters (e.g. 10), I want to find all possible words one can form out of these letters. I am using a standard dictionary for this.

Letters are allowed to be used multiple times, and not all letters have to be used, as long it results in words with 4 characters or more. I think this is similar to solving anagrams, except that letters are allowed to be used multiple times.

E.g. Letters given: q r b d t e s Possible words: bedders, dessert, etc.

In search of a data structure supporting O(1) for checking if a proposed word is in the dictionary, I found this paper and subsequently also found a working Java DAWG implementation here.

Here's where I am stuck: When trying to generate a list of possible words one can create from these letters, I am wondering if I am missing something using the DAWG implementation. I have seen other threads on here that are suggestion using a Trie and recursively go through the nodes, but I was hoping I could do something similar with the DAWG.

The implementation I currently have working is by going through all the words of the dictionary, then skip any word that contains a letter that is NOT part of the letters earlier assigned. This works, but is slow, going through all the words of the dictionary * ~20 letters worst-case.

for(String word : dawg.getAllStrings()) {
     boolean blacklist = false;
     for(int i = 0; i<nonLetters.length(); i++) {
         if(word.indexOf(nonLetters.charAt(i)) >= 0) {
             blacklist = true;
             break;
         }
     }

     if(!blacklist)
         possibleWordList.add(word);
}

I tried implementing a recursive word match but struggled as the implementation doesn't expose access to its Node implementation, I could locally change this though.

Without access to the node, I can use dawg.contains(letter), but with the requirement to use letters multiple times I am wondering if this would really help. E.g. I would start with 'A', then 'AA', ... stopping at e.g. 20 A's.

Would all this just be easier with a Trie? Still fairly fast to find matching words, but easier to generate a list of possible words?

Are there other DAWG libraries that expose Node traversing or have a ref implementation to generate all possible words?

Thanks for any help or pointers!


Solution

  • I think this is a good way. I added a recursive method to ModifiableDAWGSet of the DAWG implementation referenced in the question.

    getWords is called with a String prefix, adding up the characters. I first implemented this to generate all words in the DAWG and compared to the existing method of the ModifiableDAWGSet.getAllStrings(). Then I added to skip words that didn't contain words, not including Characters from the list of possible characters.

    private ArrayList<String> allMatchingWords = new ArrayList<String>();
    private ArrayList<Character> possibleCharacters;
    
    private void getWords(ModifiableDAWGNode originNode, String prefix) {
        NavigableMap<Character, ModifiableDAWGNode> transitionTreeMap = originNode.getOutgoingTransitions();
    
        for(Map.Entry<Character, ModifiableDAWGNode> transition : transitionTreeMap.entrySet())
        {
            Character c = transition.getKey();
            if(!possibleCharacters.contains(c))
                continue;
    
            ModifiableDAWGNode n = transition.getValue();
            if(n.isAcceptNode()) //this is a word
            {
                allMatchingWords.add(prefix + c);
            }
            getWords(n, prefix + c);
        }
    }
    
    public ArrayList<String> getAllMatchingWords(Character mustContain, ArrayList<Character> possibleCharacters)
    {
        allMatchingWords.clear();
        this.mustContain = mustContain;
        this.possibleCharacters = possibleCharacters;
        getWords(sourceNode, "");
        return allMatchingWords;
    }