pythonpython-2.7trie

How to implement the remove function of a trie in python?


I've read the following implementation of the trie in python: https://stackoverflow.com/a/11016430/2225221

and tried to make the remove fnction for it. Basically, I had problems even with the start: If you want to remove a word from a trie, it can has sub-"words", or it can be "subword" of another word.

If you remove with "del dict[key]", you are removing these above mentioned two kinds of words also. Could anyone help me in this, how to remove properly the chosen word (let us presume it's in the trie)


Solution

  • Basically, to remove a word from the trie (as it is implemented in the answer you linked to), you'd just have to remove its _end marker, for example like this:

    def remove_word(trie, word):
        current_dict = trie
        for letter in word:
            current_dict = current_dict.get(letter, None)
            if current_dict is None:
                # the trie doesn't contain this word.
                break
        else:
            del current_dict[_end]
    

    Note however that this doesn't ensure that the trie has its minimal size. After deleting the word, there may be branches in the trie left that are no longer used by any words. This doesn't affect the correctness of the data structure, it just means that the trie may consume more memory than absolutely necessary. You could improve this by iterating backwards from the leaf node and delete branches until you find one that has more than one child.

    EDIT: Here's an idea how you could implement a remove function that also culls any unnecessary branches. There's probably a more efficient way to do it, but this might get you started:

    def remove_word2(trie, word):
        current_dict = trie
        path = [current_dict]
        for letter in word:
            current_dict = current_dict.get(letter, None)
            path.append(current_dict)
            if current_dict is None:
                # the trie doesn't contain this word.
                break
        else:
            if not path[-1].get(_end, None):
                # the trie doesn't contain this word (but a prefix of it).
                return
            deleted_branches = []
            for current_dict, letter in zip(reversed(path[:-1]), reversed(word)):
                if len(current_dict[letter]) <= 1:
                    deleted_branches.append((current_dict, letter))
                else:
                    break
            if len(deleted_branches) > 0:
                del deleted_branches[-1][0][deleted_branches[-1][1]]
            del path[-1][_end]
    

    Essentially, it first finds the "path" to the word that is about to be deleted and then iterates through that backwards to find nodes that can be removed. It then removes the root of the path that can be deleted (which also implicitly deletes the _end node).