
regular expression to match name initials - PCRE


I have the following regular expression to get the initials of a name:

/\b\p{L}\./gu

It works fine with English and other languages, and some Hindi and Kannada letters are matched as well. However, it fails once graphemes built from combining characters occur. For example,
के in Hindi and
ಕೆ in Kannada
are not matched by this regex.
I am trying to get the initials from names like J.P.Morgan.
Any help would be greatly appreciated.
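To see why the pattern fails, it helps to split each initial into code points. के is the consonant क (U+0915) followed by the vowel sign े (U+0947), and ಕೆ is ಕ (U+0C95) followed by ೆ (U+0CC6); both vowel signs belong to the Unicode category Mn (nonspacing mark), not L. `\p{L}` therefore consumes only the base letter, and the next character is the mark rather than the dot. The minimal sketch below classifies each code point; the `g` flag from the question is dropped because PHP's `preg_*` functions have no such modifier, and the variable names are only illustrative.

```php
<?php
// The question's pattern; PHP has no "g" modifier, so only "u" is kept.
$original = '/\b\p{L}\./u';

foreach (['क.', 'ಕ.', 'के.', 'ಕೆ.'] as $initial) {
    // With the "u" flag, "." matches exactly one code point.
    preg_match_all('/./su', $initial, $m);
    $kinds = [];
    foreach ($m[0] as $cp) {
        if (preg_match('/^\p{L}$/u', $cp)) {
            $kinds[] = 'letter';
        } elseif (preg_match('/^\p{M}$/u', $cp)) {
            $kinds[] = 'mark';
        } else {
            $kinds[] = 'other';
        }
    }
    $matched = preg_match($original, $initial) ? 'matched' : 'not matched';
    echo $initial, ' => ', implode(' + ', $kinds), ' (', $matched, ")\n";
}
// क. => letter + other (matched)
// ಕ. => letter + other (matched)
// के. => letter + mark + other (not matched)
// ಕೆ. => letter + mark + other (not matched)
```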


Solution

  • You need to match the combining marks (diacritics such as vowel signs) that follow the base letter, using \p{M}*:

    '~\b(?<!\p{M})\p{L}\p{M}*\.~u'
    

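    The pattern matches:

      • \b - a word boundary
      • (?<!\p{M}) - a position not immediately preceded by a combining mark, so a letter in the middle of a word that follows a mark is not treated as an initial
      • \p{L} - a letter
      • \p{M}* - zero or more combining marks (the demo below uses the possessive \p{M}*+, which gives the same result here)
      • \. - a literal dot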

    PHP demo:

    <?php
    $s = "क. ಕ. के. ಕೆ. ";
    // Wrap every initial (letter + optional combining marks + dot) in <pre> tags
    echo preg_replace('~\b(?<!\p{M})\p{L}\p{M}*+\.~u', '<pre>$0</pre>', $s);
    // => <pre>क.</pre> <pre>ಕ.</pre> <pre>के.</pre> <pre>ಕೆ.</pre>
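Since the goal in the question is to get the initials themselves rather than to mark them up, the same pattern can be used with `preg_match_all` to collect them into an array. This is a minimal sketch; `extractInitials` is a hypothetical helper name, not part of any library.

```php
<?php
/**
 * Hypothetical helper: returns every "letter + optional combining marks + dot"
 * initial found in $name, using the pattern from the answer.
 */
function extractInitials(string $name): array
{
    preg_match_all('~\b(?<!\p{M})\p{L}\p{M}*+\.~u', $name, $m);
    return $m[0]; // full matches only; empty array when there are none
}

echo implode(' ', extractInitials('J.P.Morgan')), "\n";  // J. P.
echo implode(' ', extractInitials('के. ಕೆ. Rao')), "\n";  // के. ಕೆ.
```

In J.P.Morgan, the M is not returned because it is followed by a letter rather than a dot, and the remaining letters of Morgan have no word boundary before them.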