I'm having trouble constructing a regular expression using ColdFusion 10. I need reFind() to return zero if a URL contains "dev" at the end of any subdomain with "mydomain.com" in it.
reFind(THE_REGEX, "subdomaindev.mydomain.com") needs to return 0
reFind(THE_REGEX, "subdomain.mydomain.com") needs to return a positive number
I found the following on Adobe's documentation: (http://help.adobe.com/en_US/ColdFusion/10.0/Developing/WSc3ff6d0ea77859461172e0811cbec0a38f-7ffb.html) and based on that I tried to use the lookahead concept.
Thought this would work, but it doesn't:
reFind("(?!dev)\.mydomain\.com$", "subdomaindev.mydomain.com") = 13
reFind("(?!dev)\.mydomain\.com$", "subdomain.mydomain.com") = 10
Don't understand why this gives zero for both:
reFind("(?=dev)\.mydomain\.com$", "subdomaindev.mydomain.com") = 0
reFind("(?=dev)\.mydomain\.com$", "subdomain.mydomain.com") = 0
These are the results I expected from (?=):
reFind("(?:dev)\.mydomain\.com$", "subdomaindev.mydomain.com") = 10
reFind("(?:dev)\.mydomain\.com$", "subdomain.mydomain.com") = 0
NOTE: this is for use with ColdBox's environment config where I can only pass a single regular expression to a variable called "environments" that then calls a method for the matched environment. I would prefer not to have a second check in that method to find the "dev.", but if I must I will.
Thank you for any help!
(Too long for comments)
Don't understand why this gives zero for both
reFind("(?=dev)\.mydomain\.com$", "subdomaindev.mydomain.com") = 0
Truthfully, neither did I. However, I came across this thread which offers a plausible explanation. To paraphrase (using your values):
Look-aheads look forward from the character at which they are placed, and you've placed it before the ".". So, what you've got is actually saying "anything ending in .mydomain.com as long as the first three characters starting at that position (.my) are not dev", which is always true.
... or in the case of (?=dev), always false, because obviously the characters .my can never be equal to dev.
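To see the effect of where the look-ahead sits, here is a small sketch. It is written in Python rather than ColdFusion, on the assumption that Python's re module treats these look-aheads the same way as the engine behind reFind(); re_find is a hypothetical helper that only imitates reFind()'s 1-based return value.

```python
import re


def re_find(pattern, text):
    """Hypothetical stand-in for reFind(): 1-based match position, or 0."""
    match = re.search(pattern, text)
    return match.start() + 1 if match else 0


host = "subdomaindev.mydomain.com"

# The look-ahead is tested at the same position where "\.mydomain" must
# begin, so it inspects ".my" and never sees the "dev" in front of it.
print(re_find(r"(?!dev)\.mydomain\.com$", host))  # 13: ".my" is not "dev"
print(re_find(r"(?=dev)\.mydomain\.com$", host))  # 0: ".my" is never "dev"
```

The 13 and 0 agree with the values reported in the question for the same input.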
Further searching turned up a detailed blog entry by Adam Cameron about regular expressions and look-arounds. The section on "Negative look-aheads" contains an example of an expression used to confirm a string does not contain the word CAT:
^(?!.*CAT).*$
The blog entry provides a better explanation, but essentially it uses ^ (start), $ (end) and .* (zero or more characters) to search the entire string, whereas your current expression only checks the characters immediately following the look-ahead, i.e. ".mydomain.com".
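A quick way to convince yourself of how the CAT expression behaves is the sketch below. Again this is Python's re module, used on the assumption that it handles this pattern the same way ColdFusion does; the sample strings are made up for illustration.

```python
import re

# Anchored at the start, the negative look-ahead scans the whole string
# (via .*) for "CAT" and rejects the match if it is found anywhere.
no_cat = re.compile(r"^(?!.*CAT).*$")

print(bool(no_cat.search("A DOG SAT HERE")))    # True: no "CAT" anywhere
print(bool(no_cat.search("THE CAT SAT HERE")))  # False: "CAT" is present
```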
If I am understanding correctly, you could use the approach above to confirm the supplied string does not end with "dev.mydomain.com". Just change "CAT" to the substring you are trying to match ... err... not match. Not highly tested, but something along these lines:
reFind("^(?!.*dev\.mydomain\.com$).*$","subdomain.mydomain.com")
reFind("^(?!.*dev\.mydomain\.com$).*$","subdomaindev.mydomain.com")
Results: 1 for the first call (the whole string matches from position 1) and 0 for the second.
Disclaimer: I am not a regex expert, by any stretch of the imagination, so it is entirely possible there are better options. However, hopefully this helps explain why the current expression is not working the way you expected.
Update:
As noted in the comments, @zabuuq's final working expression is:
^(?!.*dev\.mydomain\.com).*\.mydomain\.com$
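For comparison, here is a rough check of that final expression against the earlier suggestion. It is a Python sketch, assuming Python's re module behaves like ColdFusion's engine for these patterns; re_find is a hypothetical helper imitating reFind(), and "subdomain.otherdomain.com" is an invented host used only to show the difference.

```python
import re


def re_find(pattern, text):
    """Hypothetical stand-in for reFind(): 1-based match position, or 0."""
    match = re.search(pattern, text)
    return match.start() + 1 if match else 0


EARLIER = r"^(?!.*dev\.mydomain\.com$).*$"
FINAL = r"^(?!.*dev\.mydomain\.com).*\.mydomain\.com$"

for host in ("subdomain.mydomain.com",
             "subdomaindev.mydomain.com",
             "subdomain.otherdomain.com"):
    # EARLIER accepts anything that does not end in dev.mydomain.com;
    # FINAL additionally requires the host to end in .mydomain.com.
    print(host, re_find(EARLIER, host), re_find(FINAL, host))
```

The two expressions should agree on the two hosts from the question and differ only on the unrelated domain, which the earlier expression accepts and the final one rejects.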