I have some problems with Zalgo on my imageboard.
Text like the example below messes up my imageboard's layout. Is there a way to block these characters, or to "fix" (clean up) such text?
Example text:
ALL IS LOŚ͖̩͇̗̪̏̈́T ALL IS LOST the pon̷y he comes he c̶̮omes he comes the ichor permeates all MY FACE MY FACE ᵒh god no NO NOO̼OO NΘ stop the an*̶͑̾̾̅ͫ͏̙̤g͇̫͛͆̾ͫ̑͆l͖͉̗̩̳̟̍ͫͥͨe̠̅s ͎a̧͈͖r̽̾̈́͒͑e not rè̑ͧ̌aͨl̘̝̙̃ͤ͂̾̆ ZA̡͊͠͝LGΌ ISͮ̂҉̯͈͕̹̘̱ TO͇̹̺ͅƝ̴ȳ̳ TH̘Ë͖́̉ ͠P̯͍̭O̚N̐Y̡ H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝S̨̥̫͎̭ͯ̿̔̀ͅ
I tried to use this solution:
$cleanMessage = preg_replace("/[^\x20-\xAD\x7F]/", "", $input_lines);
Taken from here: "Remove special characters that mess with formating". But it only works for Latin characters. Can anyone help me?
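A quick way to see why that pattern fails, as a sketch assuming PHP 7 or later (for the \u{...} string escapes): without the u modifier PCRE matches single bytes, so the class keeps only bytes 0x20 to 0xAD and strips everything else. Plain ASCII survives, but every non-ASCII UTF-8 character, Latin or not, loses its lead byte and leaves invalid fragments behind.

```php
<?php
// Sketch: the byte-oriented class from the question applied to UTF-8 input.
// Without the /u modifier PCRE works on bytes: bytes 0x20-0xAD are kept,
// all others (including every UTF-8 lead byte) are removed.
$pattern = "/[^\x20-\xAD\x7F]/";

$ascii = preg_replace($pattern, "", "Hello, world");
$thai  = preg_replace($pattern, "", "\u{0E01}");   // ก = bytes E0 B8 81
$latin = preg_replace($pattern, "", "\u{00E9}");   // é = bytes C3 A9

var_dump($ascii);            // string(12) "Hello, world"
var_dump(bin2hex($thai));    // string(2) "81"
var_dump(bin2hex($latin));   // string(2) "a9"
```

The leftover bytes 0x81 and 0xA9 are UTF-8 continuation bytes with no lead byte, i.e. invalid text rather than a cleaned-up version of the original.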
This regular expression removes every combining mark (Unicode category M, the "superscript" symbols stacked above and below letters) from the $text variable:
$text = preg_replace("~[\p{M}]~uis","", $text);
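If $text contains a character with a combining mark, for example กิ (ก followed by the Thai vowel sign SARA I, U+0E34), this regex removes the mark and the resulting $text contains just ก. As this example shows, it also strips marks that are a legitimate part of the text.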
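The same regex wrapped in a small function, as a sketch assuming PHP 7 or later: strip_marks() is a hypothetical name, and returning an empty string when preg_replace() fails (it returns null, for example when the input is not valid UTF-8 and the u modifier is set) is an assumed policy for an imageboard. The i and s modifiers are dropped because they do not affect \p{M}.

```php
<?php
// Sketch: remove every combining mark (Unicode general category M).
// strip_marks() is a hypothetical helper name, not a PHP built-in.
function strip_marks(string $text): string
{
    // The u modifier makes PCRE read pattern and subject as UTF-8,
    // so \p{M} matches whole code points instead of single bytes.
    $result = preg_replace('~\p{M}~u', '', $text);

    // preg_replace() returns null on failure, e.g. when $text is not
    // valid UTF-8; this sketch treats that as "nothing to keep".
    return $result ?? '';
}

$thai        = strip_marks("\u{0E01}\u{0E34}");             // ก + vowel sign SARA I
$zalgo       = strip_marks("Z\u{036E}\u{034C}A\u{0303}\u{0309}");
$precomposed = strip_marks("\u{00E9}");                     // é as a single code point

var_dump($thai === "\u{0E01}");           // bool(true)
var_dump($zalgo === "ZA");                // bool(true)
var_dump($precomposed === "\u{00E9}");    // bool(true): no separate mark to remove
```

A precomposed character such as é (U+00E9) is one code point with no separate mark, so it passes through unchanged; only decomposed text and Zalgo stacks lose their marks.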
I improved this regex so that it removes stacked marks but leaves a single mark on a letter:
$text = preg_replace("~(?:[\p{M}]{1})([\p{M}])+?~uis","", $text);
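Each match is exactly two adjacent combining marks (the lazy +? stops after one extra mark), and the whole match is removed. A letter with a single mark, such as a decomposed German ä, keeps it, and every run of marks shrinks to at most one mark. A letter that legitimately carries two marks loses both, though. Use it if you want to keep the diacritics of German or other languages that use them. For example, this regex transforms this word: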
͐̈ͩ̎Zͮ͌ͦ͆ͦͤÃ̉͛̄ͭ̈̚LͫG̉̋͂̉Oͨ͌̋͗!
into this: ZÄLͫGO!
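To make the behaviour of the second regex concrete, here is a sketch that runs it next to an alternative. limit_marks() is a hypothetical helper, not part of the original answer: it keeps at most $max marks in each run of consecutive marks and drops the rest, which is closer to "filter only the second level" than pairwise removal. Both functions assume PHP 7 or later and UTF-8 input.

```php
<?php
// Sketch: the answer's second regex next to a hypothetical alternative.

// The answer's regex: every match is exactly two adjacent marks (the lazy
// +? stops after one repetition), and both marks are removed.
function strip_mark_pairs(string $text): string
{
    return preg_replace('~(?:[\p{M}]{1})([\p{M}])+?~uis', '', $text) ?? '';
}

// Hypothetical helper: in each run of consecutive marks keep the first
// $max marks and drop the rest. $max = 0 removes all marks.
function limit_marks(string $text, int $max = 1): string
{
    $max = max(0, $max); // a negative count would not be parsed as a quantifier
    $pattern = '~(\p{M}{' . $max . '})\p{M}+~u';
    return preg_replace($pattern, '$1', $text) ?? '';
}

$stack      = "Z\u{036E}\u{034C}\u{0366}";  // Z with three stacked marks
$umlaut     = "a\u{0308}";                  // decomposed German ä
$vietnamese = "e\u{0323}\u{0302}";          // decomposed ệ (two legitimate marks)

var_dump(strip_mark_pairs($stack) === "Z\u{0366}");      // bool(true): odd run keeps its last mark
var_dump(limit_marks($stack) === "Z\u{036E}");           // bool(true): first mark kept
var_dump(strip_mark_pairs($umlaut) === $umlaut);         // bool(true)
var_dump(strip_mark_pairs($vietnamese) === "e");         // bool(true): both marks lost
var_dump(limit_marks($vietnamese, 2) === $vietnamese);   // bool(true)
```

Capping each run instead of removing marks in pairs means a letter never loses its first mark; raising $max to 2 also preserves letters that legitimately carry two marks, such as a decomposed Vietnamese ệ.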
I hope the second regex helps you.