python regex

Regex: match only if an odd number of a character precedes and follows


I want to match the closing quote of one string together with the opening quote of the following string if both are on the same line. Two strings may be separated either by a single blank or by blank-plus-blank ( + ).

Regex engine: Python

For instance, from

this is "some string" "; which should match" 234
"and this" + "should also match\"" "\"and this" 
but not this: " " a " + "

I'd like to see matches for:

So in fact, I think it might be best to match the groups " " and " + " only if there is an odd number of quotes before and after the group. Since lookbehind is fixed-width only in Python's re, I didn't find a good way to do it.

I tried

re.compile(r'(" \+ ")|(" ")(?!;|,)')

but this assumes that no string begins with a semicolon or comma

and also

re.compile(r'"[^"]+"')

but this only finds the strings themselves, not the "inter-string" quotes.
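For reference, running the first pattern against the three sample lines makes its limits concrete. This is only a quick check; `attempt_matches` is a hypothetical helper name introduced for the illustration.

```python
import re

# The first attempted pattern from the question, unchanged.
attempt = re.compile(r'(" \+ ")|(" ")(?!;|,)')

lines = [
    'this is "some string" "; which should match" 234',
    r'"and this" + "should also match\"" "\"and this" ',
    'but not this: " " a " + "',
]

def attempt_matches(line):
    """Return the text of every match of the attempted pattern (hypothetical helper)."""
    return [m.group(0) for m in attempt.finditer(line)]

for line in lines:
    print(attempt_matches(line))
# []                  the wanted match is rejected because the next string starts with ';'
# ['" + "', '" "']
# ['" "', '" + "']    false positives: these quotes enclose strings instead of joining them
```

The pattern has no notion of being inside or outside a string, which is why it misses the first line and over-matches the third.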


Solution

  • Here's the character loop parsing method I mentioned above. I track whether we are inside a quote or not, and I track the characters between quotes.

    
    data = """\
    this is "some string" "; which should match" 234
    "and this" + "should also match\\"" "\\"and this" 
    but not this: " " a " + "
    """
    
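    def check(line):
        """Return the separators (' ' or ' + ') found between adjacent strings."""
        in_quotes = False
        between = "xxxx"  # sentinel so text before the first string never counts as a separator
        found = []
        escape = False
    
        for c in line:
            if escape:
                # The previous character was a backslash, so skip this one.
                escape = False
            elif c == '"':
                # An opening quote right after ' ' or ' + ' joins two strings.
                if not in_quotes and between in (' ', ' + '):
                    found.append( between )
                between = ""
                in_quotes = not in_quotes
            elif c == '\\':
                escape = True
            elif not in_quotes:
                # Collect the text that sits outside any string.
                between += c
        return found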
    
    for line in data.splitlines():
        print(line)
        matches = check(line)
        print(matches)
    

    Output:

    this is "some string" "; which should match" 234
    [' ']
    "and this" + "should also match\"" "\"and this" 
    [' + ', ' ']
    but not this: " " a " + "
    []
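If the positions of the closing-quote/opening-quote runs are needed rather than just the separators, one possible variation is to let a regex tokenize the string literals and then inspect the gap between consecutive literals. This is a sketch, not a replacement for the loop above: `STRING_LITERAL`, `SEPARATORS` and `find_joins` are names made up for the example, and it assumes strings are double-quoted with backslash escapes and that a backslash outside a string has no special meaning.

```python
import re

# Assumption: a literal is an opening quote, then escaped pairs or ordinary
# characters, then a closing quote.
STRING_LITERAL = re.compile(r'"(?:\\.|[^"\\])*"')
SEPARATORS = (" ", " + ")

def find_joins(line):
    """Return (start, end, text) for each closing quote + separator + opening quote."""
    joins = []
    previous = None
    # finditer scans left to right without overlap, so literals stay paired up.
    for literal in STRING_LITERAL.finditer(line):
        if previous is not None:
            gap = line[previous.end():literal.start()]
            if gap in SEPARATORS:
                # Widen the span by one character on each side to include both quotes.
                start, end = previous.end() - 1, literal.start() + 1
                joins.append((start, end, line[start:end]))
        previous = literal
    return joins

print(find_joins('this is "some string" "; which should match" 234'))
# [(20, 23, '" "')]
```

Because each literal is consumed as a whole, a quote can never be mistaken for both the end of one string and the start of the next, which is the same guarantee the `in_quotes` flag gives in the loop version.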