I need a bit of help in PHP.
I have two Hebrew words that are lexically identical, but they do not match in a bytewise comparison.
1. version: הִפַּלְנוּ
2. version: הִפַּלְנוּ
31 3a 20 d794 d6b4 d7a4 **d6bc d6b7** d79c d6b0 d7a0 d795 d6bc
32 3a 20 d794 d6b4 d7a4 **d6b7 d6bc** d79c d6b0 d7a0 d795 d6bc
The difference lies in the order of the diacritics, called nekudot.
As far as I know, this is related to concepts such as grapheme cluster boundaries, but my PHP server (and hosting provider) does not offer these newer functions. So I am interested in how this was solved before PHP 6. Or is there a standard that specifies the correct order of nekudot in Hebrew words?
So my main questions:
In PHP 5, how can I match two Hebrew words as lexically identical when they differ only in the order of their nekudot?
What is the right/canonical/lexical order of nekudot in a Hebrew word?
I know it is quite a specific question; I would not ask if I could find the solution myself.
Thank you in advance. :-)
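Apply the Normalizer class (part of the intl extension, available since PHP 5.3). Unicode normalization sorts consecutive combining marks by their canonical combining class; patah (U+05B7) has class 17 and dagesh (U+05BC) has class 21, so the canonical order is patah before dagesh, and both spellings normalize to the same string. There is also a large collection of compatibility precomposed characters for Hebrew involving dagesh and/or other combining marks.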
<?php
// Note: the "\u{...}" escape syntax used below requires PHP 7.0 or later;
// on PHP 5, write the UTF-8 bytes with "\x.." escapes instead.
$dageshpatah = "\u{05D4}\u{05B4}\u{05E4}\u{05BC}\u{05B7}\u{05DC}\u{05B0}\u{05E0}\u{05D5}\u{05BC}"; // 'הִפַּלְנוּ';
$patahdagesh = "\u{05D4}\u{05B4}\u{05E4}\u{05B7}\u{05BC}\u{05DC}\u{05B0}\u{05E0}\u{05D5}\u{05BC}"; // 'הִפַּלְנוּ';
// Report whether each spelling is already in Normalization Form C.
echo implode(':', [
    'dageshpatah',
    $dageshpatah,
    Normalizer::isNormalized($dageshpatah, Normalizer::FORM_C) ? 'normalized' : 'unnormalized'
]) . PHP_EOL;
echo implode(':', [
    'patahdagesh',
    $patahdagesh,
    Normalizer::isNormalized($patahdagesh, Normalizer::FORM_C) ? 'normalized' : 'unnormalized'
]) . PHP_EOL;
// The raw byte strings differ, but their normalized forms compare equal.
echo '== raw values: ' . ($dageshpatah == $patahdagesh ? 'true' : 'false') . PHP_EOL;
echo '== normalized values: ' . (
    Normalizer::normalize($dageshpatah, Normalizer::FORM_C) ==
    Normalizer::normalize($patahdagesh, Normalizer::FORM_C) ? 'true' : 'false'
) . PHP_EOL;
?>
Result: .\SO\79266396.php
dageshpatah:הִפַּלְנוּ:unnormalized
patahdagesh:הִפַּלְנוּ:normalized
== raw values: false
== normalized values: true
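If the intl extension is not available on the server, the reordering can be approximated by hand for plain pointed Hebrew. The following is only a minimal sketch, not a full normalizer. The function names (`hebrewCombiningClass`, `reorderMarkRun`, `reorderHebrewPoints`) are made up for illustration, and the table holds the canonical combining classes that the Unicode Character Database assigns to the Hebrew points. It assumes valid UTF-8 input with no cantillation marks and no precomposed presentation forms (U+FB1D to U+FB4F), and it avoids PHP 7 syntax so that it should also run on PHP 5.

```php
<?php
// Canonical combining classes of the Hebrew points (values from the
// Unicode Character Database). Anything not listed is treated as class 0.
function hebrewCombiningClass($cp)
{
    static $ccc = array(
        0x05B0 => 10, 0x05B1 => 11, 0x05B2 => 12, 0x05B3 => 13,
        0x05B4 => 14, 0x05B5 => 15, 0x05B6 => 16, 0x05B7 => 17,
        0x05B8 => 18, 0x05B9 => 19, 0x05BA => 19, 0x05BB => 20,
        0x05BC => 21, 0x05BD => 22, 0x05BF => 23, 0x05C1 => 24,
        0x05C2 => 25, 0x05C7 => 18,
    );
    return isset($ccc[$cp]) ? $ccc[$cp] : 0;
}

// Decode a two-byte UTF-8 sequence; every Hebrew point is two bytes long.
function utf8CodePoint2($ch)
{
    return ((ord($ch[0]) & 0x1F) << 6) | (ord($ch[1]) & 0x3F);
}

// Callback: sort one run of consecutive points by combining class.
// Insertion sort is used because it is stable, so marks of equal class
// keep their relative order, as canonical ordering requires.
function reorderMarkRun($match)
{
    $marks = preg_split('//u', $match[0], -1, PREG_SPLIT_NO_EMPTY);
    for ($i = 1; $i < count($marks); $i++) {
        $current = $marks[$i];
        $class = hebrewCombiningClass(utf8CodePoint2($current));
        $j = $i - 1;
        while ($j >= 0 && hebrewCombiningClass(utf8CodePoint2($marks[$j])) > $class) {
            $marks[$j + 1] = $marks[$j];
            $j--;
        }
        $marks[$j + 1] = $current;
    }
    return implode('', $marks);
}

// Put every run of two or more Hebrew points into canonical order.
function reorderHebrewPoints($text)
{
    return preg_replace_callback(
        '/[\x{05B0}-\x{05BD}\x{05BF}\x{05C1}\x{05C2}\x{05C7}]{2,}/u',
        'reorderMarkRun',
        $text
    );
}

// The two spellings from the question, written as raw UTF-8 bytes.
$dageshPatah = "\xD7\x94\xD6\xB4\xD7\xA4\xD6\xBC\xD6\xB7\xD7\x9C\xD6\xB0\xD7\xA0\xD7\x95\xD6\xBC";
$patahDagesh = "\xD7\x94\xD6\xB4\xD7\xA4\xD6\xB7\xD6\xBC\xD7\x9C\xD6\xB0\xD7\xA0\xD7\x95\xD6\xBC";

echo var_export($dageshPatah === $patahDagesh, true) . PHP_EOL; // false
echo var_export(
    reorderHebrewPoints($dageshPatah) === reorderHebrewPoints($patahDagesh),
    true
) . PHP_EOL; // true
```

Comparing the reordered strings instead of the raw ones treats the two spellings as the same word. Where the intl extension is installed, `Normalizer::normalize()` remains the better choice, since it covers all scripts and the precomposed forms as well.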