I have a 'collections.defaultdict' (see x below) that is a multi-valued dictionary. All values associated with each unique key are stored in a list.
>>> x
defaultdict(<type 'list'>, {'a': ['aa', 'ab', 'ac'], 'b': ['ba', 'bc'], 'c': ['ca', 'cb', 'cc', 'cd']})
I want to use the Python fuzzywuzzy package in order to search a target string against all the values nested in the multi-valued dictionary and return the top 5 matches based on fuzzywuzzy's built-in edit distance formula.
from fuzzywuzzy import fuzz
from fuzzywuzzy import process
query = 'bc'
choices = x
result = process.extract(query, choices, limit=5)
And then I will run a process that takes the closest match (value with highest fuzz ratio score) and identifies which key that closest matched value is associated with. In this example, the closest matched value is of course 'bc' and the associated key is 'b'.
My question is: How do I run the fuzzywuzzy query against all of the values within the nested lists of the dictionary? When I run the fuzzywuzzy process above, I get a TypeError: expected string or buffer.
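process.extract expects each choice to be a string, but every value in x is a list, hence the TypeError. To get all the values from the lists in your dictionary as one flat sequence, use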
from itertools import chain
and change the line
choices = x
to
choices = chain.from_iterable(x.values())
Consider turning that into a set if your real data has overlapping values, so that each candidate string is scored only once.
Result:
[('bc', 100), ('ba', 50), ('ca', 50), ('cb', 50), ('cc', 50)]
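For the second step in the question (finding which key the best match belongs to), one option is to invert the dictionary once and look the matched value up afterwards. The following is only a sketch using the sample data from the question: the names value_to_keys, best_value and best_keys are made up for illustration, and best_value is hard-coded as a stand-in for result[0][0] so that the snippet runs without fuzzywuzzy installed.

```python
from collections import defaultdict
from itertools import chain

# Sample data copied from the question.
x = defaultdict(list, {
    'a': ['aa', 'ab', 'ac'],
    'b': ['ba', 'bc'],
    'c': ['ca', 'cb', 'cc', 'cd'],
})

# Flatten and de-duplicate the values; sorted() is only there to make
# the order predictable, it is not required for matching.
choices = sorted(set(chain.from_iterable(x.values())))

# Invert the dictionary: each value maps to every key whose list contains it.
value_to_keys = defaultdict(list)
for key, values in x.items():
    for value in values:
        value_to_keys[value].append(key)

# Assumption: stand-in for result[0][0], the best-scoring value
# returned by process.extract(query, choices, limit=5).
best_value = 'bc'
best_keys = value_to_keys[best_value]
print(best_keys)  # ['b']
```

The lookup returns a list because, with overlapping values, the same string can sit under more than one key.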