In C++, does the codecvt/locale library provide a facet for testing whether a character "is" something, i.e. whether it is any kind of line-breaking character, a digit, whitespace, etc.?
Or would one have to do this manually or rely on regex?
Yes, using the std::ctype facet and its is method:
std::use_facet<std::ctype<char>>(std::locale()).is(std::ctype_base::digit, '9');
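The available classification masks are the members of std::ctype_base: space, print, cntrl, upper, lower, alpha, digit, punct, xdigit, alnum, graph and (since C++11) blank.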
There isn't a classification category for line-breaking characters; for that, you'll need to use ICU's u_getIntPropertyValue with the UCHAR_LINE_BREAK property and check for U_LB_MANDATORY_BREAK, etc.