pythonlist-comprehensiondrycode-readability

Python list comprehension - want to avoid repeated evaluation


I have a list comprehension which approximates to:

[f(x) for x in l if f(x)]

Where l is a list and f(x) is an expensive function which returns a list.

I want to avoid evaluating f(x) twice for every non-empty occurence of f(x). Is there some way to save its output within the list comprehension?

I could remove the final condition, generate the whole list and then prune it, but that seems wasteful.

Two basic approaches have been suggested:

An inner generator comprehension:

[y for y in (f(x) for x in l) if y]

or memoization.

I think the inner generator comprehension is elegant for the problem as stated. In fact I simplified the question to make it clear, I really want:

[g(x, f(x)) for x in l if f(x)]

For this more complicated situation, I think memoization produces a cleaner end result.


Solution

  • A solution (the best if you have repeated value of x) would be to memoize the function f, i.e. to create a wrapper function that saves the argument by which the function is called and save it, than return it if the same value is asked.

    a really simple implementation is the following:

    storage = {}
    def memoized(value):
        if value not in storage:
            storage[value] = f(value)
        return storage[value]
    
    [memoized(x) for x in l if memoized(x)]
    

    and then use this function in the list comprehension. This approach is valid under two condition, one theoretical and one practical. The first one is that the function f should be deterministic, i.e. returns the same results given the same input, and the other is that the object x can be used as a dictionary keys. If the first one is not valid than you should recompute f each timeby definition, while if the second one fails it is possible to use some slightly more robust approaches.

    You can find a lot of implementation of memoization around the net, and I think that the new versions of python have something included in them too.

    On a side note, never use the small L as a variable name, is a bad habit as it can be confused with an i or a 1 on some terminals.

    EDIT:

    as commented, a possible solution using generators comprehension (to avoid creating useless duplicate temporaries) would be this expression:

    [g(x, fx) for x, fx in ((x,f(x)) for x in l) if fx]
    

    You need to weight your choice given the computational cost of f, the number of duplication in the original list and memory at you disposition. Memoization make a space-speed tradeoff, meaning that it keep tracks of each result saving it, so if you have huge lists it can became costly on the memory occupation front.