
Summarize a double for loop into a list comprehension


I've been trying to translate these two for loops into a list comprehension:

with open(sourceFile, 'r+t') as file:
    for line in file:
        for key, value in patterns.items():
            line = re.compile(value, flags=re.IGNORECASE).sub(key, line)
        targetList.append(line)

I would like to end up with something like this:

return [[re.compile(value, flags=re.IGNORECASE).sub(key, line) for key, value in patterns.items()] for line in file]

Solution

  • The most straightforward way is to put the work done in the inner loop inside a function:

    def transform(line):
        for key, value in patterns.items():
            line = re.compile(value, flags=re.IGNORECASE).sub(key, line)
        return line
    
    result = [transform(line) for line in file]
    

    If you want everything in one self-contained nested list comprehension ... well, you really shouldn't. The inner loop exists for its side effect: it repeatedly rebinds line. List comprehensions are for expressing mapping/filtering operations, not side effects. Since Python 3.8, assignment expressions (:=) make it possible to carry an accumulator variable through a list comprehension, which is what your inner loop does. But they are purposefully limited in where they can occur (precisely to nip these sorts of shenanigans in the bud), which makes turning this into a pure list comprehension really ugly (it would be ugly anyway). Here is one way:

    result = [
        ((acc:=line) and [(acc:=re.compile(value, flags=re.IGNORECASE).sub(key, acc)) for key, value in patterns.items()])[-1]
        for line in lines
    ]
    

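    It's horrible, and that's even without handling the case where line is an empty string: an empty string is falsy, so the and short-circuits and [-1] ends up indexing the empty string, which raises an IndexError.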

    And note, as always when using a list comprehension for side effects, this needlessly builds a list, in this case, only to extract the last item from it.

    You might think you could do a more straightforward transliteration, something like:

    result = [
        [(line:=re.compile(value, flags=re.IGNORECASE).sub(key, line)) for key, value in patterns.items()][-1]
        for line in lines
    ]
    

    But the above is a syntax error:

    SyntaxError: assignment expression cannot rebind comprehension iteration variable 'line'
    

    So you have to jump through even more hoops, because assignment expressions are prohibited from rebinding a comprehension's iteration variable. The motivation for that restriction is discussed in PEP 572, which introduced assignment expressions.
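
For completeness, here is a minimal runnable sketch of the helper-function approach recommended above. The sample patterns and lines are made up for illustration, and compiling the regexes once up front (the `compiled` list) is an added tweak, not something from the question. `transform_reduce` is a hypothetical name for the same fold written as a single expression with `functools.reduce`, in case you really want to avoid the explicit inner loop.

```python
import re
from functools import reduce

# Hypothetical sample data: replacement text -> regex it should replace.
patterns = {
    "colour": r"color",
    "centre": r"center",
}

# Compile each pattern once instead of once per line.
compiled = [
    (key, re.compile(value, flags=re.IGNORECASE))
    for key, value in patterns.items()
]


def transform(line):
    """Apply every substitution in turn, feeding each result into the next."""
    for key, regex in compiled:
        line = regex.sub(key, line)
    return line


def transform_reduce(line):
    """Same fold as transform(), expressed with functools.reduce."""
    return reduce(lambda acc, pair: pair[1].sub(pair[0], acc), compiled, line)


# Stand-in for iterating over an open file; includes an empty string on purpose.
lines = ["The COLOR of the Center\n", "no match here\n", ""]

result = [transform(line) for line in lines]
result_reduce = [transform(line) for line in lines] and [
    transform_reduce(line) for line in lines
]
```

Both versions handle an empty line without any special casing, which the walrus-based comprehension does not.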