Python - Iterate once over string to find all substrings and their positions


I have the following Python code that uses regex to find the substrings "¬[", "[", "¬(", "(", ")" and "]" and get their positions (I record "¬[" and "¬(" as "[" and "("):

import re

expression = "¬[P∧¬(¬T∧R)]∧(T→¬P)"
# [[0 "¬["], [4 "¬("], [13 "("], [10 ")"], [18 ")"], [11 "]"]]

# Raw strings keep the backslashes intact for the regex engine; "¬" needs no escaping.
lsqb = [[match.start(), "["] for match in re.finditer(r"¬\[|\[", expression)]
lpar = [[match.start(), "("] for match in re.finditer(r"¬\(|\(", expression)]
rpar = [[match.start(), ")"] for match in re.finditer(r"\)", expression)]
rsqb = [[match.start(), "]"] for match in re.finditer(r"\]", expression)]
all = lsqb + lpar + rpar + rsqb

print(lsqb) # [[0, '[']]
print(lpar) # [[4, '('], [13, '(']]
print(rpar) # [[10, ')'], [18, ')']]
print(rsqb) # [[11, ']']]

print(all) # [[0, '['], [4, '('], [13, '('], [10, ')'], [18, ')'], [11, ']']]

The issue is that I'm iterating over the string four times (once for each type of bracket whose position I want). I'd like to get rid of the per-bracket variables and keep only "all", iterating over the string just once and still getting [[0, '['], [4, '('], [13, '('], [10, ')'], [18, ')'], [11, ']']] as the result.


Solution

  • Use a single regular expression that matches all the patterns. You can use a capture group to extract the parenthesis after ¬.

    Then loop over all the matches, generating the appropriate string in the result based on what was matched.

    import re

    expression = "¬[P∧¬(¬T∧R)]∧(T→¬P)"
    # Group 1: an opening bracket, optionally preceded by ¬; group 2: a closing bracket.
    pattern = r'¬?([\[(])|([\])])'
    all_matches = [(match.start(), match.group(1) or match.group(2))
                    for match in re.finditer(pattern, expression)]
    print(all_matches)
    # [(0, '['), (4, '('), (10, ')'), (11, ']'), (13, '('), (18, ')')]
    

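    Each match comes from exactly one side of the pipe, so only one of the two groups is set and the other is None; match.group(1) or match.group(2) therefore selects whichever bracket was matched. Note that the results come back in order of position in the string, as tuples, rather than grouped by bracket type.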
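If the exact output from the question is needed (lists grouped as "[", "(", ")", "]"), one possible sketch is to keep the single regex pass and then reorder the small result list with a stable sort. The names BRACKET_ORDER and grouped are illustrative and not part of the original answer.

```python
import re

expression = "¬[P∧¬(¬T∧R)]∧(T→¬P)"
pattern = re.compile(r"¬?([\[(])|([\])])")

# Single pass over the string: collect [position, bracket] pairs in positional order.
matches = [[m.start(), m.group(1) or m.group(2)]
           for m in pattern.finditer(expression)]

# Assumed grouping order, taken from the question's lsqb + lpar + rpar + rsqb.
BRACKET_ORDER = "[()]"

# sorted() is stable, so positions stay ascending within each bracket type.
grouped = sorted(matches, key=lambda item: BRACKET_ORDER.index(item[1]))
print(grouped)
# [[0, '['], [4, '('], [13, '('], [10, ')'], [18, ')'], [11, ']']]
```

The sort works on the list of matches, not on the string, so the string itself is still scanned only once.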