I'm new to TDD, and I find RegExp quite a particular case. Is there any special way to unit test them, or may I just treat them as regular functions?
You should always test your regexen, much like any other chunk of code. They're at the most simple a function that takes a string and returns a bool, or returns an array of values.
Here are some suggestions on what to think about when it comes to designing unit tests for regexen. These are not not hard and fast prescriptions for unit test design, but some guidelines to shape your thinking. As always, weigh the needs of your testing versus cost of failure balanced with the time required to implement them all. (I find that 'implementing' the test is the easy part! :-] )
Points to consider:
For a regex that returns lists, also remember:
(?<name> thing1 ( thing2) )
) - this behavior can be different based on the regex engine you're using.If you use any advanced features, such as non-backtracking groups, make sure you understand completely how the feature works, and using the guidelines above, build example strings that should work for and against each of them.
Depending on your regex library implementation, the way groups are captured may be different as well. Perl 5 has a 'open paren order' ordering, C# has that partially except for named groups and so on. Make sure to experiment with your flavor to know exactly what it does.
Then, integrate them right in with your other unit tests, either in their own module or alongside the module that contains the regex. For particularly nasty regexen, you may find you need lots and lots of tests to verify that the pattern and all the features you use are correct. If the regex makes up a large (or nearly all) of the work that the method is doing, I will use the advice above to fashion inputs to test that function and not the regex directly. That way, if later you decide that the regex is not the way to go, or you want to break it up, you can capture the behavior the regex provided without changing the interface - ie, the method that invokes the regex.
As long as you really know how a regex feature is supposed to work in your flavor of regex, you should be able to develop decent test cases for it. Just make sure you really, really, really do understand how the feature works!