I've got many of the following type of thing: <span class="c">a+2da + 2d</span>
The first part of the span's content is part of a math expression, a+2d (without spaces), and the second part has the same text, but with spaces around the operator: a + 2d. I need to be able to capture the a+2d so I can remove it.
Some examples of the expressions I've got: x-y=3x - y = 3, a+b-c=da + b - c = d, and it could involve underscores and brackets, like a_n=a+(n−1)da_n = a + (n - 1)d.
I can find the first half (only) of the simpler examples using the regex:
/<span class=\"c\">((\w*[\+|\-|\=]\w*)*)<\/span>/
and the second half (only) with spaces using
/<span class=\"c\">((\w* [\+|\-|\=] \w*)*)<\/span>/
But I have no idea how to match the second half given the first, or vice versa. None of the lookahead or lookbehind examples I came across fit the situation, and they became complicated very quickly.
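For that expression only:
((\w*)([-+=]).*)(\2 \3.*)
We first match an operator together with its preceding content, (\w*)([-+=]), and then use \2 \3 to search ahead for the same content, but with a space before the operator.
The operator class is written [-+=] rather than [+-=] because, inside a character class, +-= is a range running from + to =, which would also match digits and characters such as . / : ; and <.
The first half is captured in the 1st group, and the second half is captured in the 4th group.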
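As a rough illustration of how the two groups could be used to drop the unspaced copy, here is a minimal Python sketch. The names SPAN, HALVES and dedupe are made up for this example, and the operator class is written [-+=] so that the hyphen is literal. The pattern is applied only to the text inside each span, not to the whole HTML string, since otherwise the = in class="c" could be picked up as the operator.

```python
import re

# Hypothetical helper names; only the HALVES pattern comes from the answer.
SPAN = re.compile(r'(<span class="c">)(.*?)(</span>)')
# Group 1: unspaced first half. Group 4: spaced second half.
HALVES = re.compile(r'((\w*)([-+=]).*)(\2 \3.*)')

def dedupe(html: str) -> str:
    """Keep only the spaced copy of the expression inside each span."""
    def fix(m: re.Match) -> str:
        inner = HALVES.fullmatch(m.group(2))
        if inner is None:
            # Content does not look like "first half + spaced repeat": leave it.
            return m.group(0)
        return m.group(1) + inner.group(4) + m.group(3)
    return SPAN.sub(fix, html)

print(dedupe('<span class="c">a+2da + 2d</span>'))
# <span class="c">a + 2d</span>
```

Because the pattern relies on the text before the first operator being repeated later with a space after it, it is still a heuristic; spans whose content does not fit that shape are returned unchanged.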