pythonpython-2.7list-comprehension

List vs generator comprehension speed with join function


So I got these examples from the official documentation.

What exactly makes the first example (generator expression) slower than the second (list comprehension)?

>>> timeit.timeit('"-".join(str(n) for n in range(100))', number=10000)
0.3018611848820001
>>> timeit.timeit('"-".join([str(n) for n in range(100)])', number=10000)
0.2727368790656328

Solution

  • The str.join method converts its iterable parameter to a list if it's not a list or tuple already. This lets the joining logic iterate over the items multiple times (it makes one pass to calculate the size of the result string, then a second pass to actually copy the data).

    You can see this in the CPython source code:

    Py_LOCAL_INLINE(PyObject *)
    STRINGLIB(bytes_join)(PyObject *sep, PyObject *iterable)
    {
        /* lots of variable declarations at the start of the function omitted */
        
        seq = PySequence_Fast(iterable, "can only join an iterable");
        
        /* ... */
    }
    

    The PySequence_Fast function in the C API does just what I described. It converts an arbitrary iterable into a list (essentially by calling list on it), unless it already is a list or tuple.

    The conversion of the generator expression to a list means that the usual benefits of generators (a smaller memory footprint and the potential for short-circuiting) don't apply to str.join, and so the (small) additional overhead that the generator has makes its performance worse.