I'm using this PHP function to wrap emojis in arbitrary HTML tags so that I can style them on web pages. As far as I can tell, CSS has no selector that targets individual characters such as emojis, so they need a wrapper element:
function wrap_emojis($s, $str_before, $str_after) {
    $default_encoding = mb_regex_encoding();
    mb_regex_encoding('UTF-8');
    $s = mb_ereg_replace('([^\x{0000}-\x{FFFF}])', $str_before . '\\1' . $str_after, $s);
    mb_regex_encoding($default_encoding);
    return $s;
}
The issue is that it works for emojis outside the Basic Multilingual Plane (above U+FFFF, 4 bytes in UTF-8), such as 😎 (U+1F60E), but not for emojis built from a BMP symbol plus a variation selector, such as ☀️ (U+2600 U+FE0F, 3 bytes each in UTF-8).
Any ideas how to fix the PHP function so that it also handles emojis like ☀️?
For example, wrap_emojis("zzz☀️zzz", "A", "B") should return "zzzA☀️Bzzz" but actually returns "zzz☀️zzz" unchanged, whereas wrap_emojis("zzz😎zzz", "A", "B") correctly returns "zzzA😎Bzzz".
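For reference, the code points behind the two test emojis can be listed with a few lines of PHP. This is only a minimal sketch: codepoints() is a hypothetical helper, not part of the function above, and it assumes PHP 7.4+ for mb_str_split().

```php
<?php
// Hypothetical helper: returns the code points of a UTF-8 string as
// "U+XXXX" labels, one per character.
function codepoints(string $s): array {
    return array_map(
        function (string $ch): string {
            return sprintf('U+%04X', mb_ord($ch, 'UTF-8'));
        },
        mb_str_split($s, 1, 'UTF-8')
    );
}

// Sun with variation selector: two code points, both at or below U+FFFF.
print_r(codepoints("\u{2600}\u{FE0F}"));
// Smiling face with sunglasses: a single code point above U+FFFF.
print_r(codepoints("\u{1F60E}"));
```

Since [^\x{0000}-\x{FFFF}] only matches characters above U+FFFF, the sun (both of whose code points sit inside that range) is never matched.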
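Alright, so it wasn't that hard. I just had to give the regex two alternatives: a character in U+0100–U+FFFF followed by one more BMP character (which covers a symbol plus its variation selector, e.g. U+2600 U+FE0F), or, when that doesn't match, any character outside the BMP. Be aware that the first alternative is very broad: it matches any BMP character at or above U+0100 together with whatever BMP character follows it, so it will also wrap ordinary text in most non-English languages. For plain English text it works great!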
$s = mb_ereg_replace('([\x{0100}-\x{FFFF}][\x{0000}-\x{FFFF}]|[^\x{0000}-\x{FFFF}])', $str_before . '\\1' . $str_after, $s);
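If the broad first alternative turns out to be a problem, a narrower variant might require the second character to be the variation selector U+FE0F itself. The following is a sketch under that assumption, not a complete emoji matcher: wrap_emojis_vs16() is a hypothetical name, it uses preg_replace_callback() with the /u modifier instead of mb_ereg_replace(), and it does not handle keycaps, skin-tone modifiers or ZWJ sequences.

```php
<?php
// Hypothetical variant of wrap_emojis(): wraps
//   - any character above U+FFFF, and
//   - any character in U+0100..U+FFFF that is directly followed by
//     U+FE0F (variation selector-16), together with that selector.
function wrap_emojis_vs16(string $s, string $str_before, string $str_after): string {
    $pattern = '/[\x{10000}-\x{10FFFF}]|[\x{0100}-\x{FFFF}]\x{FE0F}/u';
    // A callback keeps "$1" or backslashes inside $str_before/$str_after
    // from being interpreted as replacement references.
    return preg_replace_callback(
        $pattern,
        function (array $m) use ($str_before, $str_after): string {
            return $str_before . $m[0] . $str_after;
        },
        $s
    );
}
```

With this pattern, wrap_emojis_vs16("zzz\u{2600}\u{FE0F}zzz", "A", "B") wraps the sun and its selector as one unit, while a BMP character such as "–" that is not followed by U+FE0F is left alone.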
Hope this helps other people on here. Cheers 🤣