python recursion trie nltk-book

Python - printing out a trie in alphabetically sorted order with a recursive function


I'm working my way through the NLTK book by Bird, Klein, and Loper for my own enrichment, not for a class, and I'm stuck on a problem.

The problem I'm stuck on is 4.29:

Write a recursive function that pretty prints a trie in alphabetically sorted order, e.g.:

chair: 'flesh'
---t: 'cat'
--ic: 'stylish'
---en: 'dog'

I'm using this code from the book to create the trie:

def insert(trie, key, value):
    if key:
        first, rest = key[0], key[1:]
        if first not in trie:
            trie[first] = {}
        insert(trie[first], rest, value)
    else:
        trie['value'] = value

trie = {}
insert(trie, 'chat', 'cat')
insert(trie, 'chien', 'dog')
insert(trie, 'chair', 'flesh')
insert(trie, 'chic', 'stylish')
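For reference, these four calls build a nested dictionary with one level per character, and the payload stored under the 'value' key where each word ends. A quick check of that structure (insert is repeated so the snippet runs by itself; the `expected` literal is written out by hand for illustration):

```python
def insert(trie, key, value):
    if key:
        first, rest = key[0], key[1:]
        if first not in trie:
            trie[first] = {}
        insert(trie[first], rest, value)
    else:
        trie['value'] = value

trie = {}
for word, meaning in [('chat', 'cat'), ('chien', 'dog'),
                      ('chair', 'flesh'), ('chic', 'stylish')]:
    insert(trie, word, meaning)

# Hand-written picture of the nested dict; shared prefixes share a branch.
expected = {
    'c': {'h': {
        'a': {'t': {'value': 'cat'},
              'i': {'r': {'value': 'flesh'}}},
        'i': {'e': {'n': {'value': 'dog'}},
              'c': {'value': 'stylish'}},
    }}
}
print(trie == expected)  # True
```

Because the shared prefixes "ch", "cha" and "chi" are each stored only once, a recursive walk already knows how much of the current word was printed on an earlier line.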

I modified an answer from this discussion for a function that recursively goes through the trie and pulls out the complete keys and values:

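def trawl_trie(trie):
    """Return every 'word: value' string stored in the trie, in no particular order."""
    unsorted = []
    for key, value in trie.items():
        if key != 'value':
            # Interior node: prepend this character to everything below it.
            for item in trawl_trie(value):
                unsorted.append(key + item)
        else:
            # End of a word: start the ': value' suffix.
            unsorted.append(': ' + value)

    return unsorted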

But I'm not able to use recursion to make an alphabetical list, nor can I figure out how to use recursion to replace the duplicate parts of the keys. The best that I can do is to create a helper function that goes through the results of the function above:

import re

def print_trie(trie):
    # sort the 'word: value' strings alphabetically
    alphabetized = sorted(trawl_trie(trie))

    print(alphabetized[0])

    # compare each item, from the 2nd on, to the previous one in the list
    for k in range(1, len(alphabetized)):
        # separate words from values
        prev_w = re.findall(r'(\w+):', alphabetized[k - 1])[0]
        curr_w = re.findall(r'(\w+):', alphabetized[k])[0]
        curr_d = re.findall(r': (\w+)', alphabetized[k])[0]

        # count the leading characters shared with the previous word
        shared = 0
        for a, b in zip(prev_w, curr_w):
            if a != b:
                break
            shared += 1

        # replace the shared prefix with dashes
        print('-' * shared + curr_w[shared:] + ": " + curr_d)

This would be the output:

print_trie(trie)

chair: flesh
---t: cat
--ic: stylish
---en: dog

Does anyone know if it would be possible to get the same result with one recursive function? Or am I stuck using a recursive function to go through the trie, and a second helper function to make everything look nice?

Cheers,


Solution

    def insert(trie, key, value):
        """Insert into Trie"""
        if key:
            first, rest = key[0], key[1:]
            if first not in trie:
                trie[first] = {}
            insert(trie[first], rest, value)
        else:
            trie['value'] = value
    
    def display(trie, s=""):
        """Recursively display trie entries in alphabetical order"""
        first = True
        # Visit the entries of this node sorted by key
        for k, v in sorted(trie.items(), key=lambda x: x[0]):
            if isinstance(v, dict):
                if first:
                    prefix = s + k             # first child keeps the prefix text
                    first = False
                else:
                    prefix = '-' * len(s) + k  # later children show dashes for the common prefix

                display(v, prefix)  # recurse with the display string extended by the current key k
            else:
                print(s, ":", v)  # not a dictionary, so print current string s and value
    
    # Create Trie
    trie = {}
    insert(trie, 'chat', 'cat')
    insert(trie, 'chien', 'dog')
    insert(trie, 'chair', 'flesh')
    insert(trie, 'chic', 'stylish')
    
    # Display using the recursive function (the second argument defaults to "")
    display(trie)
    

    Output

    chair : flesh
    ---t : cat
    --ic : stylish
    ---en : dog
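A possible variant, offered as a sketch rather than as part of the answer above: display prints "chair : flesh" with extra spaces because print separates its arguments with a space, and it handles the 'value' entry in sorted order alongside the single-character keys. The version below (trie_lines is a name made up for this sketch) returns the lines instead of printing them, emits a word stored at the current node before any longer word that extends it, and uses !r to get the quoted values shown in the exercise. It assumes, as insert does, that 'value' is reserved for the payload and that all other keys are single characters.

```python
def insert(trie, key, value):
    if key:
        first, rest = key[0], key[1:]
        if first not in trie:
            trie[first] = {}
        insert(trie[first], rest, value)
    else:
        trie['value'] = value

def trie_lines(trie, shown=""):
    """Return the pretty-printed lines for trie, alphabetically sorted.

    shown is the text to display for the path leading to this node.
    """
    lines = []
    # A word ending at this node sorts before every longer word below it.
    if 'value' in trie:
        lines.append(f"{shown}: {trie['value']!r}")
        shown = '-' * len(shown)   # the prefix has now been printed once
    for ch in sorted(k for k in trie if k != 'value'):
        lines.extend(trie_lines(trie[ch], shown + ch))
        shown = '-' * len(shown)   # later siblings see dashes instead
    return lines

trie = {}
insert(trie, 'chat', 'cat')
insert(trie, 'chien', 'dog')
insert(trie, 'chair', 'flesh')
insert(trie, 'chic', 'stylish')

lines = trie_lines(trie)
print("\n".join(lines))
# chair: 'flesh'
# ---t: 'cat'
# --ic: 'stylish'
# ---en: 'dog'
```

Returning a list keeps the recursion free of side effects, so the result can be compared or tested directly; swapping lines.append for print gives a printing version with the same traversal.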