php regex

Pattern without spaces followed by same pattern with spaces


I've got many of the following type of thing: <span class="c">a+2da + 2d</span>

The first part of the span's content is part of a math expression a+2d (without spaces) and the second part has the same text, but with spaces around the operator: a + 2d. I need to be able to capture the a+2d so I can remove it.

Some examples of the expressions I've got: x-y=3x - y = 3 and a+b-c=da + b - c = d. They can also involve underscores and brackets, like a_n=a+(n−1)da_n = a + (n - 1)d

I can find the first half (only) of the simpler examples using the regex:

/<span class=\"c\">((\w*[\+|\-|\=]\w*)*)<\/span>/

and the second half (only) with spaces using

/<span class=\"c\">((\w* [\+|\-|\=] \w*)*)<\/span>/

But I have no idea how to match the second half given the first, or vice versa. None of the lookahead or lookbehind examples I came across fit the situation, and they became complicated very quickly.


Solution

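  • ((\w*)([-+=]).*)(\2 \3.*)
    

    This works only for expressions of that shape: a run of word characters followed by an operator. We first match the operator together with its preceding content, using (\w*)([-+=]). The hyphen is placed first in the character class so that it is read literally; written as [+-=] it would define a range from + to = that also matches digits and several punctuation characters.

    We then use \2 \3 to search further ahead for the same content, this time with a space before the operator.

    The first half (without spaces) is captured in the 1st group, and the second half (with spaces) is captured in the 4th group.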

    https://regex101.com/r/AW9Ba5/1
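
Since the goal is to remove the unspaced copy, here is a minimal PHP sketch of how the backreference idea might be applied with preg_replace. The helper name dedupeSpan() is hypothetical, and the pattern is a variation on the one above rather than the exact regex from the answer: it is anchored to the span tags from the question and uses [^<]* instead of .* so a match cannot run past the closing tag. It assumes the expression starts with word characters (or directly with an operator); anything else is left unchanged.

```php
<?php
// Hypothetical helper: keeps only the spaced copy inside each <span class="c">.
function dedupeSpan(string $html): string
{
    // $1: opening tag
    // $2: leading word characters, $3: first operator (hyphen first = literal)
    // [^<]*?: rest of the unspaced copy, as short as possible
    // $4: the spaced copy, which starts with "$2 $3"
    // $5: closing tag
    $pattern = '/(<span class="c">)(\w*)([-+=])[^<]*?(\2 \3[^<]*)(<\/span>)/';

    // preg_replace returns null on error; fall back to the input in that case.
    return preg_replace($pattern, '$1$4$5', $html) ?? $html;
}

echo dedupeSpan('<span class="c">a+2da + 2d</span>'), "\n";
echo dedupeSpan('<span class="c">a+b-c=da + b - c = d</span>'), "\n";
```

Spans whose content does not fit the pattern, such as an expression that begins with a bracket, pass through untouched, so the replacement is safe to run over mixed input.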