
Why can't a QString pass a QRegExp of the form ("[\\x00-\\xff]*")?


I have a QRegExp with the following pattern:

QRegExp byteArray;
byteArray.setPattern("[\\x00-\\xff]*");

This pattern is used to validate QStrings. Can someone give an example of the kinds of QString that cannot pass this test? I have a bug where a QString arrives that doesn't match the pattern.

Can this pattern match any Unicode string?

Example of a QString that the pattern does not validate: HÈńr

Why?
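The check this pattern performs can be modelled without Qt. The sketch below is plain C++ and rests on stated assumptions: std::u16string stands in for QString's UTF-16 storage, validation is assumed to be a whole-string match (for example QRegExp::exactMatch()), and the helper names firstUnitAboveLatin1 and matchesLatin1Pattern are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper modelling "[\\x00-\\xff]*" matched against the whole
// string. QRegExp compares each 16-bit QChar, so a string passes only if every
// UTF-16 code unit lies in 0x00..0xFF (the Latin-1 range).
// Returns the index of the first code unit outside that range, or
// std::u16string::npos if every unit is inside it.
std::size_t firstUnitAboveLatin1(const std::u16string& s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] > 0xFF) {
            return i;  // this code unit cannot be matched by [\x00-\xff]
        }
    }
    return std::u16string::npos;
}

// Hypothetical helper: true when the whole string would pass the pattern.
bool matchesLatin1Pattern(const std::u16string& s) {
    return firstUnitAboveLatin1(s) == std::u16string::npos;
}
```

For HÈńr (u"H\u00C8\u0144r"), firstUnitAboveLatin1 returns 2: È (U+00C8) is inside the range, but ń (U+0144) is not, so the whole string fails.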


Solution

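  • QString stores text as UTF-16, not UTF-8: each QChar is one 16-bit code unit. The class [\x00-\xff] therefore accepts only U+0000 to U+00FF (the Latin-1 range), so it cannot match arbitrary Unicode. In HÈńr, È (U+00C8) passes, but ń (U+0144) does not.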

    To accept every 16-bit QChar, widen the range to \xffff. With QRegExp, the range must also start at \x0001.

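    #include <QDebug>
    #include <QRegExp>
    #include <QString>

    int main()
    {
            // U+10C436 is outside the BMP, so QString stores it as a surrogate pair.
            uint data[] = { 0x10c436, 0 };
            QString s = QString::fromUcs4(data);
            // QRegExp (Qt 5) compares each 16-bit QChar, so both halves fall in range.
            QRegExp r("^[\\x0001-\\xffff]+$");
            qDebug() << s.size() << s.contains(r);
    }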
    

    This results in a match:

    2 true
    

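    NOTE: With QRegularExpression, the same pattern no longer matches. QRegularExpression runs PCRE in UTF-16 mode, which decodes the surrogate pair into a single code point (U+10C436), and that code point lies above 0xFFFF. PCRE also reads \xhh with at most two hex digits (wider values are written \x{hhhh}), so "\\x0001" does not mean the same thing there. No error is reported; the match simply fails.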

    Also, QRegularExpression accepts \x0000, but QRegExp does not.

    The moral of the story: don't try to match binary data with regular expressions.
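The difference between the QRegExp and QRegularExpression results comes down to counting code units versus code points. The plain C++ sketch below models both views without Qt; the helper names (toUtf16, decodeUtf16, allCodeUnitsInRange, allCodePointsInRange) are hypothetical, and the range checks model only the character class, not the ^ and $ anchors.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: encode one code point as UTF-16, as QString::fromUcs4()
// stores it. Code points above 0xFFFF become a surrogate pair.
std::u16string toUtf16(char32_t cp) {
    if (cp < 0x10000) {
        return std::u16string(1, static_cast<char16_t>(cp));
    }
    char32_t v = cp - 0x10000;  // 20-bit value split across the pair
    char16_t high = static_cast<char16_t>(0xD800 + (v >> 10));
    char16_t low = static_cast<char16_t>(0xDC00 + (v & 0x3FF));
    return std::u16string{high, low};
}

// Hypothetical helper: decode UTF-16 into code points (the view PCRE uses in
// UTF-16 mode). Lone surrogates are passed through unchanged.
std::u32string decodeUtf16(const std::u16string& s) {
    std::u32string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char32_t u = s[i];
        if (u >= 0xD800 && u <= 0xDBFF && i + 1 < s.size() &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            char32_t lowPart = static_cast<char32_t>(s[i + 1]) - 0xDC00;
            out.push_back(0x10000 + ((u - 0xD800) << 10) + lowPart);
            ++i;  // skip the low surrogate we just consumed
        } else {
            out.push_back(u);
        }
    }
    return out;
}

// Per-code-unit check (how QRegExp sees a QString).
bool allCodeUnitsInRange(const std::u16string& s, char32_t lo, char32_t hi) {
    for (char16_t unit : s) {
        char32_t u = unit;
        if (u < lo || u > hi) return false;
    }
    return true;
}

// Per-code-point check (how PCRE in UTF-16 mode sees the same string).
bool allCodePointsInRange(const std::u16string& s, char32_t lo, char32_t hi) {
    for (char32_t cp : decodeUtf16(s)) {
        if (cp < lo || cp > hi) return false;
    }
    return true;
}
```

Encoding U+10C436 gives the pair 0xDBF1 0xDC36. That is why QString::size() prints 2, and why a per-code-unit range of 0x0001 to 0xFFFF accepts the string while a per-code-point check of the same range rejects it.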