I'm having trouble with this piece of code. It should take a string, split it into words, and check each word against a dictionary. However, when the string contains an umlaut (Ä, ä, Ö, ö, Ü, ü), the word is split at that character.
I'm pretty sure the problem is the character class [A-ZäöüÄÖÜ\'].
It seems I'm including the special characters incorrectly, but how?
$string = "Rechtschreibprüfung";
preg_match_all("/[A-ZäöüÄÖÜ\']{1,16}/i", $string, $words);
for ($i = 0; $i < count($words[0]); ++$i) {
    if (!pspell_check($pspell_link, $words[0][$i])) {
        $array[] = $words[0][$i];
    }
}
Result:
$array[0] = "Rechtschreibprü"
$array[1] = "fung"
To match a chunk of Unicode letters, you can use
'/\p{L}+/u'
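\p{L} matches any Unicode letter, + matches one or more occurrences of the preceding subpattern, and the /u modifier treats both the pattern and the subject string as Unicode (UTF-8) strings. Without /u, the pattern is applied byte by byte, so a multi-byte UTF-8 character such as ü is not handled as a single letter.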
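As a minimal sketch of how this pattern could be dropped into the code from the question (assuming the source file and the input are UTF-8 encoded; the sample sentence is made up for illustration):

```php
<?php
// Sketch only: assumes this file and the input string are UTF-8 encoded.
$string = "Rechtschreibprüfung ist wichtig";

// \p{L}+ : one or more Unicode letters; /u : treat pattern and subject as UTF-8.
preg_match_all('/\p{L}+/u', $string, $words);

// $words[0] holds the full matches, one entry per word.
foreach ($words[0] as $word) {
    echo $word, "\n";
}
// Prints:
// Rechtschreibprüfung
// ist
// wichtig
```

Note that "Rechtschreibprüfung" has 19 letters, so a {1,16} quantifier would still cut it after 16 letters even with /u. Using + removes that limit.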
To only match whole words, use word boundaries:
'/\b\p{L}+\b/u'
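To illustrate what the word boundaries change, here is a small comparison. It is only a sketch: the sample text is invented, and it assumes a PHP build where /u also makes \b Unicode-aware, which is the case in current PHP versions as far as I know.

```php
<?php
// Sketch only: assumes UTF-8 input and a Unicode-aware \b under /u.
$text = "Version2 der Prüfung";

// Without boundaries, the letter run inside "Version2" is matched too.
preg_match_all('/\p{L}+/u', $text, $loose);

// With \b on both sides, a letter run directly followed by a digit
// is not a whole word, so "Version" is skipped.
preg_match_all('/\b\p{L}+\b/u', $text, $strict);

print_r($loose[0]);  // Version, der, Prüfung
print_r($strict[0]); // der, Prüfung
```

Whether a token like "Version2" should be spell-checked at all is a design choice. The bounded pattern simply ignores such mixed tokens.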
If the text may contain combining diacritical marks (for example, a plain u followed by a separate combining diaeresis), also add \p{M}:
'/\b[\p{M}\p{L}]+\b/u'
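The case where \p{M} matters can be sketched as follows. The example assumes PHP 7 or later (for the "\u{...}" escape) and UTF-8 input. The decomposed spelling below is constructed for illustration and is not taken from the question.

```php
<?php
// Sketch only: assumes PHP 7+ and UTF-8 input.
// "ü" written in decomposed form: "u" followed by U+0308 (combining diaeresis).
$decomposed = "Rechtschreibpru\u{0308}fung";
$sentence = "Die $decomposed hilft";

// \p{L} alone stops at the combining mark, so the word is split in two.
preg_match_all('/\p{L}+/u', $sentence, $lettersOnly);

// Adding \p{M} to the class keeps the mark attached to its base letter.
preg_match_all('/\b[\p{M}\p{L}]+\b/u', $sentence, $withMarks);

print_r($lettersOnly[0]); // Die, Rechtschreibpru, fung, hilft
print_r($withMarks[0]);   // Die, Rechtschreibprüfung (decomposed), hilft
```

If the dictionary expects precomposed characters, normalizing the input first (for instance with the intl extension's Normalizer class, where available) may be an alternative to widening the character class.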