pythonlistmultidimensional-arrayflatten

How do I make a flat list out of a list of lists?


I have a list of lists like

[
    [1, 2, 3],
    [4, 5, 6],
    [7],
    [8, 9]
]

How can I flatten it to get [1, 2, 3, 4, 5, 6, 7, 8, 9]?


If your list of lists comes from a nested list comprehension, the problem can be solved more simply/directly by fixing the comprehension; please see How can I get a flat result from a list comprehension instead of a nested list?.

The most popular solutions here generally only flatten one "level" of the nested list. See Flatten an irregular (arbitrarily nested) list of lists for solutions that completely flatten a deeply nested structure (recursively, in general).


Solution

  • A list of lists named xss can be flattened using a nested list comprehension:

    flat_list = [
        x
        for xs in xss
        for x in xs
    ]
    

    The above is equivalent to:

    flat_list = []
    
    for xs in xss:
        for x in xs:
            flat_list.append(x)
    

    Here is the corresponding function:

    def flatten(xss):
        return [x for xs in xss for x in xs]
    

    This is the fastest method. As evidence, using the timeit module in the standard library, we see:

    $ python -mtimeit -s'xss=[[1,2,3],[4,5,6],[7],[8,9]]*99' '[x for xs in xss for x in xs]'
    10000 loops, best of 3: 143 usec per loop
    
    $ python -mtimeit -s'xss=[[1,2,3],[4,5,6],[7],[8,9]]*99' 'sum(xss, [])'
    1000 loops, best of 3: 969 usec per loop
    
    $ python -mtimeit -s'xss=[[1,2,3],[4,5,6],[7],[8,9]]*99' 'reduce(lambda xs, ys: xs + ys, xss)'
    1000 loops, best of 3: 1.1 msec per loop
    

    Explanation: the methods based on + (including the implied use in sum) are, of necessity, O(L**2) when there are L sublists -- as the intermediate result list keeps getting longer, at each step a new intermediate result list object gets allocated, and all the items in the previous intermediate result must be copied over (as well as a few new ones added at the end). So, for simplicity and without actual loss of generality, say you have L sublists of M items each: the first M items are copied back and forth L-1 times, the second M items L-2 times, and so on; total number of copies is M times the sum of x for x from 1 to L excluded, i.e., M * (L**2)/2.

    The list comprehension just generates one list, once, and copies each item over (from its original place of residence to the result list) also exactly once.