
C++11: Safe practice with a regex that has two possible numbers of matches


With this regex, I would like to match a time with or without a milliseconds (ms) field. For completeness, here is the regex (in regex101 I removed the anchors to enable multi-line matching):

^(0[0-9]|1[0-9]|2[0-3]):([0-5][0-9]):([0-5][0-9])(?:|(?:\.)([0-9]{1,6}))$

I don't quite understand how C++ behaves here. In regex101, the number of capture groups depends on the string: without ms it's 3+1 (counting match[0], which C++ uses for the whole match), and with ms it's 4+1. But in this example:

#include <iostream>
#include <regex>
#include <string>

int main()
{
    std::regex timeRegex(R"(^(0[0-9]|1[0-9]|2[0-3]):([0-5][0-9]):([0-5][0-9])(?:|(?:\.)([0-9]{1,6}))$)");
    std::smatch m;
    std::string strT("12:00:09");
    bool timeMatch = std::regex_match(strT, m, timeRegex);
    std::cout << m.size() << std::endl;  // number of sub-matches reported
    if (timeMatch)
    {
        std::cout << m[0] << std::endl;  // whole match
        std::cout << m[1] << std::endl;  // hours
        std::cout << m[2] << std::endl;  // minutes
        std::cout << m[3] << std::endl;  // seconds
        std::cout << m[4] << std::endl;  // milliseconds (optional group)
    }
}

We see that m.size() is always 5, whether or not there is an ms field, and m[4] is an empty string when there's no ms field. Is this the standard behavior of C++ regex? Or should I use try/catch (or some other safety measure) when I'm unsure of the size? Even the size seems a little misleading here!


Solution

  • After a successful match, m.size() is always the number of marked subexpressions in your expression plus 1 (for the whole match). If the match fails, m.size() is 0.

    In your code you have 4 marked subexpressions (timeRegex.mark_count() returns 4); whether or not they take part in the match has no effect on the size of m.

    If you want to know whether there are milliseconds, you can check:

    m[4].matched