I'm quite terrible at regexes.
I have a string that may have 1 or more words in it (generally 2 or 3), usually a person name, for example:
$str1 = 'John Smith';
$str2 = 'John Doe';
$str3 = 'David X. Cohen';
$str4 = 'Kim Jong Un';
$str5 = 'Bob';
I'd like to convert each as follows:
$str1 = 'John S.';
$str2 = 'John D.';
$str3 = 'David X. C.';
$str4 = 'Kim J. U.';
$str5 = 'Bob';
My guess is that I should first match the first word, like so:
preg_match( "^([\w\-]+)", $str1, $first_word )
then all the words after the first one... but how do I match those? should I use again preg_match and use offset = 1 in the arguments? but that offset is in characters or bytes right?
Anyway after I matched the words following the first, if the exist, should I do for each of them something like:
$second_word = substr( $following_word, 1 ) . '. ';
Or my approach is completely wrong?
Thanks
ps - it would be a boon if the regex could maintain the whole first two words when the string contain three or more words... (e.g. 'Kim Jong U.').
It can be done in single preg_replace
using a regex.
You can search using this regex:
^\w+(?:$| +)(*SKIP)(*F)|(\w)\w+
And replace by:
$1.
Code:
$name = preg_replace('/^\w+(?:$| +)(*SKIP)(*F)|(\w)\w+/', '$1.', $name);
Explanation:
(*FAIL)
behaves like a failing negative assertion and is a synonym for (?!)
(*SKIP)
defines a point beyond which the regex engine is not allowed to backtrack when the subpattern fails later(*SKIP)(*FAIL)
together provide a nice alternative of restriction that you cannot have a variable length lookbehind in above regex.^\w+(?:$| +)(*SKIP)(*F)
matches first word in a name and skips it (does nothing)(\w)\w+
matches all other words and replaces it with first letter and a dot.