I have a QRegExp with the following pattern:
QRegExp byteArray;
byteArray.setPattern("[\\x00-\\xff]*");
This pattern is used to validate QStrings.
Can someone provide an example of what kinds of QStrings cannot pass the test for the pattern above? I have a bug where a QString arrives that doesn't match the pattern.
Can this pattern match any Unicode string?
Example of a QString that fails validation against the pattern: HÈńr
Why?
QString uses UTF-16 internally, not UTF-8, so the pattern is matched against 16-bit code units rather than bytes.
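To see why HÈńr fails, it helps to look at its UTF-16 code units. As far as I can tell from the QRegExp documentation, \xhhhh is read as a code point of up to four hex digits, so \xff means U+00FF and ń (U+0144) falls outside [\x00-\xff]. The sketch below uses plain standard C++ instead of Qt; allUnitsInRange and sample are hypothetical names made up for this illustration, and the check only approximates what the character class does.

```cpp
#include <string>

// Hypothetical helper: true when every UTF-16 code unit of `s` lies in [lo, hi].
// This approximates what a character class such as [\x00-\xff]* requires.
bool allUnitsInRange(const std::u16string& s, char16_t lo, char16_t hi)
{
    for (char16_t unit : s) {
        if (unit < lo || unit > hi)
            return false;
    }
    return true;
}

// "HÈńr" written with escapes: È is U+00C8, ń is U+0144.
const std::u16string sample = u"H\u00C8\u0144r";
```

Calling allUnitsInRange(sample, 0x0000, 0x00FF) returns false because of the ń, while allUnitsInRange(sample, 0x0001, 0xFFFF) returns true.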
For QRegExp, the range also needs to start at \x0001 rather than \x0000.
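#include <QDebug>
#include <QRegExp>
#include <QString>

int main()
{
    // One code point outside the BMP, terminated by 0 for fromUcs4().
    uint data[] = { 0x10c436, 0 };
    QString s = QString::fromUcs4(data);
    QRegExp r("^[\\x0001-\\xffff]+$");
    // size() counts UTF-16 code units, so the surrogate pair gives 2.
    qDebug() << s.size() << s.contains(r);
}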
This results in a match:
2 true
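The size of 2 comes from UTF-16 encoding the code point 0x10c436 as a surrogate pair, and each half is an ordinary 16-bit value inside the \x0001-\xffff range. A minimal sketch of that encoding in standard C++ follows; toUtf16 is a hypothetical helper written for this illustration, not a Qt function.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-16 code units.
// Assumes cp is a valid scalar value (at most 0x10FFFF, not a surrogate).
std::u16string toUtf16(std::uint32_t cp)
{
    std::u16string out;
    if (cp < 0x10000) {
        // Code points in the BMP fit into a single code unit.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        const std::uint32_t v = cp - 0x10000;  // 20 bits remain
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));    // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));  // low surrogate
    }
    return out;
}
```

For 0x10c436 this yields the two code units 0xDBF1 and 0xDC36, which is why a pattern that works on code units sees two matching characters.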
NOTE: If you are using QRegularExpression, the above will no longer match. QRegularExpression uses PCRE in UTF-16 mode, so there must be some additional checking in the PCRE code, although it reports no errors. I haven't looked further into it.
Also, QRegularExpression accepts \x0000, but QRegExp does not.
The moral of the story: don't try to match binary data with regular expressions.