regex filter htmlspecialchars

Find a character with a regular expression, except where it is part of listed items


I'm stuck on a regular expression. I have tons of text which is full of

 `&amp;`, `&nbsp;`, `&lt;`, etc.

I need a regexp that finds every `&` that is not part of one of the listed items.


Solution

  • You didn't mention the language, but you might use a negative lookahead:

    &(?!\#?\w+;)
    

    This should take care of named entities (e.g. `&pi;`) and numbered entities (e.g. `&#34;`).
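
Since the language wasn't specified, here is a minimal sketch in Python, assuming its `re` module (which supports negative lookahead). The names `BARE_AMPERSAND`, `LISTED_ONLY`, `find_bare_ampersands`, and `escape_bare_ampersands` are made up for illustration, and the entity list in `LISTED_ONLY` is only the three examples from the question.

```python
import re

# '&' that is NOT followed by an entity body such as "amp;", "#34;" or "#x22;".
BARE_AMPERSAND = re.compile(r"&(?!#?\w+;)")

# Stricter variant (assumption): only the entities named in the question count
# as "listed items"; extend the alternation as needed.
LISTED_ONLY = re.compile(r"&(?!(?:amp|nbsp|lt);)")


def find_bare_ampersands(text):
    """Return the index of every '&' that does not start an entity."""
    return [m.start() for m in BARE_AMPERSAND.finditer(text)]


def escape_bare_ampersands(text):
    """Replace every bare '&' with '&amp;', leaving existing entities alone."""
    return BARE_AMPERSAND.sub("&amp;", text)


if __name__ == "__main__":
    sample = "Tom & Jerry &amp; friends &#34;quoted&#34; &nbsp; R&D"
    print(find_bare_ampersands(sample))
    print(escape_bare_ampersands(sample))
```

Note that the general pattern treats anything shaped like `&word;` as an entity, so text such as `a&b;c` is left untouched. If only a fixed list should be skipped, the `LISTED_ONLY` variant is closer to what the question asks for.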