I'm translating user-submitted strings from UTF-8 to ASCII-Printable:
$str = 'Thê qúïck 😈 brõwn fõx júmps?😈 Óvér thé lázy dõg?😈';
$out = iconv('UTF-8', 'ASCII//TRANSLIT', $str);
var_dump($out);
$out = 'The quick ? brown fox jumps?? Over the lazy dog??';
I want the extra ? question marks removed from $out.
if ($out !== $str && strpos($out, '?') !== false) {
// The input string was modified and contains at least one question mark
//
// Not even really sure where to begin
//
// Do we need to compare the position of every character from the
// original string to every position of the new string and replace
// where the original string did not contain a question mark?
//
// That's all I can think of, but there has to be a better way.
}
I want to keep every character that //TRANSLIT can transliterate, including the few in the example $str above, e.g. áéïõú = aeiou. There is no other nuance to this question; I think it boils down to a string comparison and replacement problem.
I'm not necessarily looking for someone to write the entire code, just a pointer in the right direction of how you'd tackle this.
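For reference, the per-character comparison idea from the code comment above could be sketched roughly as follows. This is only a sketch under assumptions: translit_drop_unknown() is a hypothetical helper name, and it assumes iconv() returns '?' (or fails) for a character it cannot map, which depends on the iconv implementation. How accented letters are transliterated can also depend on the current locale, so a call such as setlocale(LC_CTYPE, 'en_US.UTF-8') may be needed first.

```php
<?php
// Hypothetical helper (not from the original post): transliterate one
// character at a time so each '?' can be traced back to its source character.
function translit_drop_unknown(string $str): string
{
    // Split the UTF-8 string into individual characters.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return ''; // the input was not valid UTF-8
    }

    $out = '';
    foreach ($chars as $ch) {
        if (strlen($ch) === 1) {
            // Single-byte characters are already ASCII, so a literal '?' survives.
            $out .= $ch;
            continue;
        }
        // Assumption: iconv yields '?' (or false) for characters it cannot map.
        $ascii = @iconv('UTF-8', 'ASCII//TRANSLIT', $ch);
        if ($ascii !== false && $ascii !== '' && $ascii !== '?') {
            $out .= $ascii;
        }
    }
    return $out;
}

echo translit_drop_unknown('júmps?😈 dõg?😈'), "\n";
```

Because each multi-byte character is converted on its own, a '?' that comes back from iconv() can only be a substitution, while a '?' typed by the user never goes through iconv() at all.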
Here is a solution based on transliterator_transliterate():
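// Requires the intl extension; Latin-ASCII converts accented Latin letters to plain ASCII.
$str = transliterator_transliterate('Latin-ASCII', 'Thê qúïck 😈 brõwn fõx júmps?😈 Óvér thé lázy dõg?😈');
// Strip every remaining non-ASCII byte (here, the emoji).
$str = preg_replace('/[\x80-\xFF]/', '', $str);
echo $str;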
Output:
The quick  brown fox jumps? Over the lazy dog?
Note that the emoji are kept by transliterator_transliterate(), so I used a regex to remove all the remaining non-ASCII characters.
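If this is needed in more than one place, the two steps could be wrapped in a small function. The sketch below is an illustration, not part of the original solution: to_printable_ascii() is a hypothetical name, and the regex is a stricter variant that keeps only printable ASCII (space through tilde), so it also drops control characters, which the byte-range pattern above would leave in place.

```php
<?php
// Hypothetical wrapper (the name is illustrative, not from the answer).
function to_printable_ascii(string $str): string
{
    // Requires the intl extension; Latin-ASCII maps accented Latin letters to ASCII.
    $ascii = transliterator_transliterate('Latin-ASCII', $str);
    if ($ascii === false) {
        // Transliteration failed (for example, invalid UTF-8); fall back to the raw input.
        $ascii = $str;
    }
    // Keep only printable ASCII; emoji and other leftover bytes are removed.
    return preg_replace('/[^\x20-\x7E]/', '', $ascii) ?? '';
}

echo to_printable_ascii('Thê qúïck 😈 brõwn fõx júmps?😈'), "\n";
```

The spaces on either side of a removed emoji are kept, so the result can contain a double space; collapsing those with a second preg_replace() is a separate choice.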