python, list, python-3.x, loops, dictionary

List To Dictionary - Improving Efficiency


I am attempting to create a function that takes a 2-dimensional list and returns a dictionary. I am wondering if there is a more efficient way than what I have written (e.g. a list comprehension or itertools). I am relatively new to Python and have read some examples on list comprehensions and the itertools docs (Iterating over a 2 dimensional python list), but I can't seem to apply them to this chunk of code.

Any help would be appreciated. Thank you!

def listToDict(self, lstInputs):        
    dictOutput = dict()
    rows = len(lstInputs)
    cols = len(lstInputs[0])
    if rows == 2:
        for x in range(cols):  # with two rows, iterate over the columns
            if lstInputs[0][x] is not None:
                if lstInputs[1][x] is not None:
                    dictOutput[lstInputs[0][x].strip()] = lstInputs[1][x].strip()
                else:
                    dictOutput[lstInputs[0][x].strip()] = lstInputs[1][x]
    elif cols == 2:
        for x in range(rows):
            if lstInputs[x][0] is not None:
                if lstInputs[x][1] is not None:
                    dictOutput[lstInputs[x][0].strip()] = lstInputs[x][1].strip()
                else:
                    dictOutput[lstInputs[x][0].strip()] = lstInputs[x][1]
    else:
        pass
    
    return dictOutput

Solution

  • Your function is doing way too many things:

    1. Trying to find out whether its input is a sequence of key=>value pairs or a pair of (keys, values) sequences. This is unreliable (a 2x2 input, for instance, fits both shapes). Don't try to guess: it's the caller's duty to pass the right structure, because only the caller knows what data they want to turn into a dict.

    2. Cleaning (currently stripping) keys and values. Here again, it only makes sense if both are strings, which is not guaranteed (at least not by the function's name or documentation). You could of course test whether your keys and/or values are indeed strings, but this adds quite some overhead. Here again, it's the caller's duty to do any cleaning that might be needed.

    To make a long story short, your function should expect a single data structure (either a sequence of key=>value pairs or a pair of (keys, values) sequences) and not apply any cleanup, leaving the caller responsible for providing what's expected.
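
To see why guessing the layout is unreliable, consider a 2x2 input: it passes both the "two rows" test and the "two columns" test, so its shape alone cannot tell which reading the caller intended. A minimal sketch, using the hypothetical helper names `dict_from_pairs` and `dict_from_columns` (they are not part of the original code):

```python
def dict_from_pairs(pairs):
    """Build a dict from an iterable of (key, value) pairs."""
    return dict(pairs)


def dict_from_columns(keys, values):
    """Build a dict from one sequence of keys and one sequence of values."""
    return dict(zip(keys, values))


# A 2x2 list has two rows AND two columns, so its shape is ambiguous.
data = [["a", "b"], ["c", "d"]]

as_pairs = dict_from_pairs(data)       # each row is a (key, value) pair
as_columns = dict_from_columns(*data)  # first row = keys, second row = values

print(as_pairs)    # {'a': 'b', 'c': 'd'}
print(as_columns)  # {'a': 'c', 'b': 'd'}
```

With two explicit entry points, the caller states which layout it has and the ambiguity disappears.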

    In fact, building a dict from a sequence (or any iterable) of pairs is so trivial that you don't need a special function; it's just a matter of passing the sequence to the dict constructor:

    >>> lst_of_pairs = [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')]
    >>> dict(lst_of_pairs) 
    {0: 'a', 1: 'b', 2: 'c', 3: 'd'}
    

    Or, on more recent Python versions (2.7 and later), using a dict comprehension, which can be faster in some cases and becomes handy as soon as keys or values need transforming:

    >>> lst_of_pairs = [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')]
    >>> {k:v for k, v in lst_of_pairs} 
    {0: 'a', 1: 'b', 2: 'c', 3: 'd'}
    

    So your first building block is built in and doesn't need any special function.

    Note that this works with any iterable as long as 1. it yields only pairs and 2. the keys (first items of the pairs) are unique (if a key appears more than once, the last value silently wins). So if you want to apply some cleaning before building the dict, you can do it with a generator function, a generator expression, or directly in a dict comprehension. For example, if the caller knows that all the keys are strings that might need stripping and that all the values are either strings needing stripping or None, the cleaning can be done while iterating over the source list:

    >>> lst_of_pairs = [(" a ", "1 "), ("b ", None), ("c", " fooo ")]
    >>> {k.strip(): v if v is None else v.strip() for k, v in lst_of_pairs}
    {'a': '1', 'c': 'fooo', 'b': None}
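
The comprehension does the cleaning inline. The same cleaning can also live in a generator function whose output is handed to dict(). This is only a sketch: it assumes keys are always strings and values are strings or None, and `clean_pairs` is a hypothetical name:

```python
def clean_pairs(pairs):
    """Lazily yield (key, value) pairs with surrounding whitespace stripped.

    Assumes every key is a string and every value is a string or None.
    """
    for key, value in pairs:
        yield key.strip(), (value if value is None else value.strip())


lst_of_pairs = [(" a ", "1 "), ("b ", None), ("c", " fooo ")]

# dict() consumes the generator one pair at a time.
cleaned = dict(clean_pairs(lst_of_pairs))
print(cleaned)  # {'a': '1', 'b': None, 'c': 'fooo'} on Python 3.7+

# If two raw keys collapse to the same cleaned key, the later pair wins.
collided = dict(clean_pairs([(" a", "1"), ("a ", "2")]))
print(collided)  # {'a': '2'}
```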
    

    Finally, transposing a pair of keys, values sequences into a sequence of key=>value pairs is what the builtin zip() and its lazy Python 2 counterpart itertools.izip() are for (Python 2 session):

    >>> import itertools
    >>> keys = [' a ', 'b ', 'c']
    >>> values = ['1 ', None, ' fooo ']
    >>> zip(keys, values)
    [(' a ', '1 '), ('b ', None), ('c', ' fooo ')]
    >>> list(itertools.izip(keys, values))
    [(' a ', '1 '), ('b ', None), ('c', ' fooo ')]
    

    Putting it together, the most convoluted case (building a dict from a sequence of keys and a sequence of values, stripping the keys and conditionally stripping the values) can be expressed as:

    >>> {k.strip(): v if v is None else v.strip() for k, v in itertools.izip(keys, values)}
    {'a': '1', 'c': 'fooo', 'b': None}
    

    If it's for one-shot use, that's actually all you need.

    Now, if you know you will have to apply this from different places in your code, always with the same cleaning but sometimes with lists of pairs and sometimes with pairs of lists, you of course want to factor it out as much as possible, but no more:

    def to_dict(pairs):
        # Strip keys; strip values unless they are None.
        return {
            k.strip(): v if v is None else v.strip()
            for k, v in pairs
            }
    

    and then leave it to the caller to apply zip() first if needed:

    import itertools


    def func1():
        keys = get_the_keys_from_somewhere()
        values = get_the_values_too()
        # Python 2: itertools.izip(); on Python 3 use the builtin zip()
        data = to_dict(itertools.izip(keys, values))
        do_something_with(data)


    def func2():
        pairs = get_some_sequence_of_pairs()
        data = to_dict(pairs)
        do_something_with(data)
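
For reference, here is how the two call styles might look as a runnable Python 3 script. It is a sketch under the same assumptions as before (keys are strings, values are strings or None); the sample data is made up, and the builtin zip() stands in for itertools.izip():

```python
def to_dict(pairs):
    """Build a dict from (key, value) pairs, stripping whitespace.

    Assumes keys are strings and values are strings or None.
    """
    return {
        k.strip(): v if v is None else v.strip()
        for k, v in pairs
    }


# Case 1: the caller holds two parallel sequences and zips them first.
keys = [" a ", "b ", "c"]
values = ["1 ", None, " fooo "]
from_columns = to_dict(zip(keys, values))

# Case 2: the caller already holds a sequence of pairs.
pairs = [(" a ", "1 "), ("b ", None), ("c", " fooo ")]
from_pairs = to_dict(pairs)

print(from_columns)                # {'a': '1', 'b': None, 'c': 'fooo'}
print(from_columns == from_pairs)  # True
```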
       
    

    As to whether you want to use zip() or itertools.izip(), it mostly depends on your Python version and your inputs.

    If you're using Python 2.x, zip() will build a new list in memory while itertools.izip() will yield the pairs lazily, so there's a slight performance overhead from using itertools.izip(), but it will save a lot of memory if you're working with large datasets.

    If you're using Python 3.x, zip() has been turned into an iterator and itertools.izip() has been removed, so the question becomes irrelevant ;)
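
The lazy behaviour of zip() on Python 3 can be checked with a small experiment. This is a rough illustration rather than a benchmark; the sizes reported by sys.getsizeof cover only the container object itself, not the tuples it refers to:

```python
import sys

keys = list(range(100_000))
values = list(range(100_000))

lazy = zip(keys, values)         # Python 3: an iterator, no pairs built yet
eager = list(zip(keys, values))  # forces every pair into a list

# The zip object stays small no matter how long the inputs are.
print(sys.getsizeof(lazy) < sys.getsizeof(eager))  # True

# An iterator can be consumed only once: after taking the first pair,
# dict() sees only the remaining ones.
first = next(lazy)
remaining = dict(lazy)
print(first)           # (0, 0)
print(len(remaining))  # 99999
```

The one-shot nature of the iterator is the main thing to watch for: if the pairs are needed more than once, build a list or the dict first.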