OK, I'm writing a regex that should match a certain URL path and all subfolders underneath it, with a few excluded. For context, this is for use inside Verizon EdgeCast, which is a CDN caching system. It supports regex, but unfortunately I don't know which 'flavor' of regex it supports, and the documentation isn't clear about that either. It seems to support all the core regex features, though, and that should be all I need. Reading the documentation requires an account, but you can get the general idea of EdgeCast here: https://www.verizondigitalmedia.com/platform/edgecast-cdn/
So, here is some sample data:
help
help/good
help/better
help/great
help/bad
help/bad/worse
And here is the regex I am using right now:
(^help$|help\/[^bad].*)
Link: https://regex101.com/r/CBWUDE/1
Broken down:
( - start capture group
^ - start of string
help - 1st thing that should match
$ - end of string
| - or
help - another thing that should match
\/ - escaped / so I can match help/
[^bad] - match any single character that isn't b, a, or d
. - any character
* - any number of times
) - end capture group
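To check which sample paths the current regex accepts, here is a minimal sketch using Python's re module. This is an assumption on my part: EdgeCast's regex flavor is unknown, and re.search is used here to mimic regex101's default search on each line; EdgeCast may anchor or apply the pattern differently.

```python
import re

# The pattern from the question; Python's re stands in for EdgeCast's unknown flavor.
PATTERN = re.compile(r"(^help$|help\/[^bad].*)")

paths = [
    "help",
    "help/good",
    "help/better",
    "help/great",
    "help/bad",
    "help/bad/worse",
]

# re.search tries the pattern at every position in each path, like regex101's default.
matched = [p for p in paths if PATTERN.search(p)]
print(matched)  # ['help', 'help/good', 'help/great']
```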
I would like the first four to match but not the last two: help/bad and help/bad/worse should not match, while help/anythingelse should.
This regex is working for me, except that help/better is not a match. I'm pretty sure the reason is that better starts with b, a character that appears in 'bad': [^bad] only checks the single character right after the slash. If I change 'better' to 'getter', it becomes a match, because it no longer starts with b.
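A small sketch of that single-character behavior, again using Python's re as a stand-in for EdgeCast (an assumption). The extra paths help/getter, help/dog and help/xbad are hypothetical inputs chosen only to show that the class looks at exactly one character.

```python
import re

# [^bad] is a character class: it consumes exactly ONE character,
# the one right after "help/", and rejects it only if it is b, a, or d.
PATTERN = re.compile(r"(^help$|help\/[^bad].*)")

results = {
    path: bool(PATTERN.search(path))
    for path in ["help/getter", "help/dog", "help/xbad", "help/better"]
}
print(results)
# {'help/getter': True, 'help/dog': False, 'help/xbad': True, 'help/better': False}
```

help/dog is rejected even though it has nothing to do with 'bad', and help/xbad is accepted even though it contains 'bad', because only the x is checked.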
So what I really want is for my 'bad' to match only the whole word bad, not any single b, a, or d. I tried using a word boundary to do this, but it isn't giving me the results I need; perhaps I just have the syntax wrong. This is what I tried:
(^help$|help\/[^\bbad\b].*)
But it does not seem to work: the 'bad' URLs are no longer excluded, and help/better still doesn't match. I think it's because / is not a word boundary. I'm positive the problem with my original regex is in this part:
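A sketch of how Python's re reads [^\bbad\b], offered as a possible explanation rather than a statement about EdgeCast (whose flavor is unknown). In Python's re, as in several other flavors, \b inside square brackets means the backspace character, not a word boundary, so the class is still a one-character test. Outside a class, \b is a word boundary, and it does match between / and b.

```python
import re

# Inside [...], Python's re reads \b as backspace (\x08), not a word boundary,
# so this class is just "one character that is not backspace, b, a, or d".
with_b = re.compile(r"help\/[^\bbad\b].*")
plain = re.compile(r"help\/[^\x08bad].*")

# Both patterns give the same answer for each path.
results = {
    path: (bool(with_b.search(path)), bool(plain.search(path)))
    for path in ["help/good", "help/better"]
}
print(results)  # {'help/good': (True, True), 'help/better': (False, False)}

# Outside a class, \b is a word boundary; "/" followed by "b" is such a boundary.
boundary = bool(re.search(r"help/\bbetter", "help/better"))
print(boundary)  # True
```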
[^bad] - match any single character that isn't b, a, or d
My question is: how can I turn [^bad] into something that matches anything that doesn't contain the full string 'bad'?
You'll want to use a negative lookahead, (?!bad), instead of negating specific letters with [^bad].
I think (^help$|help\/(?!bad).*) is what you're looking for.
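Here's a quick sketch of that pattern against the sample data, using Python's re module as a stand-in for EdgeCast (I'm assuming EdgeCast supports lookahead and searches the path the way regex101 does). help/badger is an extra, hypothetical path added to show a side effect.

```python
import re

# (?!bad) is a zero-width check that the text right after "help/" does not
# start with "bad"; it consumes nothing, so .* still matches the rest of the path.
PATTERN = re.compile(r"(^help$|help\/(?!bad).*)")

paths = [
    "help",
    "help/good",
    "help/better",
    "help/great",
    "help/bad",
    "help/bad/worse",
    "help/badger",  # hypothetical extra path
]
matched = [p for p in paths if PATTERN.search(p)]
print(matched)  # ['help', 'help/good', 'help/better', 'help/great']
```

Note that help/badger is rejected too, because the lookahead only checks that the segment after help/ does not begin with bad; whether that matters depends on your URLs.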
Edit: if you mean anything containing bad anywhere in the path, not just help/bad, you can make the lookahead (?!.*bad.*). This would prevent you from matching help/matbadtom, for example. Full regex: (^help$|help\/(?!.*bad.*).*)
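A sketch of the broader version, again with Python's re standing in for EdgeCast (an assumption). help/good/bad is a hypothetical path added to show that bad in a deeper segment is also rejected. The trailing .* inside the lookahead is redundant, since .* can match the empty string, so (?!.*bad) rejects exactly the same strings; the code checks that on these samples.

```python
import re

# (?!.*bad.*) looks ahead across the whole rest of the path, so "bad"
# anywhere after "help/" makes the second alternative fail.
FULL = re.compile(r"(^help$|help\/(?!.*bad.*).*)")
# Equivalent shorter form: the trailing .* in the lookahead adds nothing.
SHORT = re.compile(r"(^help$|help\/(?!.*bad).*)")

paths = [
    "help",
    "help/good",
    "help/better",
    "help/bad/worse",
    "help/matbadtom",
    "help/good/bad",  # hypothetical extra path
]
matched = [p for p in paths if FULL.search(p)]
print(matched)  # ['help', 'help/good', 'help/better']

# Both lookahead forms accept and reject the same sample paths.
same = matched == [p for p in paths if SHORT.search(p)]
print(same)  # True
```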