Regular expression: “[^”]*“
String: “lips“
Result: match
String: “lips’“
Result: no match
I expect both strings to match.
C++ code:
#include <iostream>
#include <string>
#include <boost/regex.hpp>
using namespace std;
using namespace boost;
int main()
{
    const string s1 = "“lips“";
    const string s2 = "“lips’“";
    // boost::regex on std::string compares individual bytes, not UTF-8 characters.
    if (regex_search(s1, regex("“[^”]*“"))) cout << "s1 matched" << endl;
    if (regex_search(s2, regex("“[^”]*“"))) cout << "s2 matched" << endl;
    return 0;
}
Output: s1 matched
Is the ’ symbol special? Why does the second string not match?
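Boost.Regex does not treat narrow strings as UTF-8 by default; it matches byte by byte. In UTF-8 the quotes “ (E2 80 9C) and ” (E2 80 9D) and the apostrophe ’ (E2 80 99) all begin with the bytes E2 80. The class [^”] therefore excludes the three bytes E2, 80 and 9D individually, so it cannot consume the first byte of ’, and the bytes that follow "lips" (E2 80 99) are not a complete “ either. That is why the second string fails to match. The ICU-aware functions make the regex work on code points instead. Code for UTF-8: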
#include <iostream>
#include <string>
#include <boost/regex.hpp>
#include <boost/regex/icu.hpp>
using namespace std;
using namespace boost;
int main()
{
    const string s1 = "“lips“";
    const string s2 = "“lips’“";
    // make_u32regex / u32regex_search decode UTF-8, so [^”] excludes one code point.
    if (u32regex_search(s1, make_u32regex("“[^”]*“"))) cout << "s1 matched" << endl;
    if (u32regex_search(s2, make_u32regex("“[^”]*“"))) cout << "s2 matched" << endl;
    return 0;
}
Compilation: g++ -std=c++11 ./test.cc -licuuc -lboost_regex
Output:
s1 matched
s2 matched
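To see the byte overlap for yourself, here is a minimal sketch that uses only the standard library. The names hex_bytes, shares_byte, kLeftQuote, kRightQuote and kApostrophe are illustrative helpers made up for this example, not part of Boost, and the characters are written as explicit byte escapes so the result does not depend on the source file encoding.

```cpp
#include <cstdio>
#include <string>

// UTF-8 encodings of the three characters, spelled out byte by byte.
const std::string kLeftQuote  = "\xE2\x80\x9C"; // U+201C, left double quote
const std::string kRightQuote = "\xE2\x80\x9D"; // U+201D, right double quote
const std::string kApostrophe = "\xE2\x80\x99"; // U+2019, right single quote

// Format the bytes of s as space-separated uppercase hex, e.g. "E2 80 9C".
std::string hex_bytes(const std::string& s)
{
    std::string out;
    char buf[4];
    for (unsigned char c : s) {
        if (!out.empty()) out += ' ';
        std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned>(c));
        out += buf;
    }
    return out;
}

// True if any byte of ch also occurs in excluded. This mimics what a
// byte-oriented negated class such as [^”] rejects: every byte of ”.
bool shares_byte(const std::string& ch, const std::string& excluded)
{
    return ch.find_first_of(excluded) != std::string::npos;
}
```

Calling hex_bytes on each constant shows the common E2 80 prefix, and shares_byte(kApostrophe, kRightQuote) returns true, which is the reason a byte-wise [^”] stops in front of ’. Plain ASCII text such as "lips" shares no byte with ”, so it passes through the class unchanged.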