Tags: javascript, regex, string, regular-language, zalgo

Regex to Detect Zalgo


I'm creating a message filtering system that detects z͎͗ͣḁ̵̑l̉̃ͦg̐̓̒o͓̔ͥ.

My current regex is /([^\u0009-\u02b7\u2000-\u20bf\u2122\u0308]|(?![^aeiouy])\u0308)/gm, but it also captures emojis.

The regex should filter all w̵̢̃ë̸̩́ị̵̽r̴̺̆d̴̘̕ ̴͎́ẗ̷͕́e̷̳̅x̷̮́ṱ̸̏ ̸̜͒ḻ̵̎i̶̧͐k̸̗̈ě̸͖ ̸̥̄t̶̛̤h̸̰̔i̵̿͜ş̴̛ or t̶e̶x̴t̸ ̸l̵i̶k̷e̸ ̵t̷h̵i̷s̴, but should not capture emojis. 🤔


Solution

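  • Here is how to test: percent-encode the text with encodeURIComponent and look for the UTF-8 bytes of the combining marks, which begin with %CC or %CD.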

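    // No g flag: a global regex keeps lastIndex between .test() calls,
    // so repeated calls on different strings can give wrong results.
    // %CC%       matches marks U+0300 to U+033F
    // %CD%[89A]  matches marks U+0340 to U+036F (CD B0 and up is Greek)
    const re = /%CC%|%CD%[89A]/
    const hasZalgo = txt => re.test(encodeURIComponent(txt));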
    
    console.log(hasZalgo("w̵̢̃ë̸̩́ị̵̽r̴̺̆d̴̘̕ ̴͎́ẗ̷͕́e̷̳̅x̷̮́ṱ̸̏ ̸̜͒ḻ̵̎i̶̧͐k̸̗̈ě̸͖ ̸̥̄t̶̛̤h̸̰̔i̵̿͜ş̴̛ 222 🤔"))
    console.log(hasZalgo("Weird text like %CC% this 🤔"))

    Here is how to convert it to plain text by removing the encoded marks

    console.log(
      decodeURIComponent(
        encodeURIComponent("w̵̢̃ë̸̩́ị̵̽r̴̺̆d̴̘̕ ̴͎́ẗ̷͕́e̷̳̅x̷̮́ṱ̸̏ ̸̜͒ḻ̵̎i̶̧͐k̸̗̈ě̸͖ ̸̥̄t̶̛̤h̸̰̔i̵̿͜ş̴̛ 222 🤔")
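        // Drop each percent-encoded combining mark (U+0300 to U+036F):
        // bytes CC 80 to CC BF and CD 80 to CD AF.
        // Base letters, spaces, and emoji bytes are left untouched.
        .replace(/%CC%[89AB][0-9A-F]|%CD%[89A][0-9A-F]/g, "")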
      )
    )
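
As an alternative to the percent-encoding approach above, the same marks can probably be matched directly on the string with a character class. This is only a sketch: it assumes "Zalgo" means characters from the Combining Diacritical Marks block (U+0300 to U+036F), and the names COMBINING, COMBINING_ALL, hasCombiningMarks, and stripCombiningMarks are made up for illustration. Normalizing to NFC first is an extra step not in the original answer; it should keep an ordinary decomposed accent from being flagged.

```javascript
// Assumption: "Zalgo" here means marks from the Combining Diacritical
// Marks block, U+0300 to U+036F. Other combining blocks are not covered.
const COMBINING = /[\u0300-\u036f]/;       // no g flag, safe for .test()
const COMBINING_ALL = /[\u0300-\u036f]/g;  // g flag, needed for .replace()

// Hypothetical helper. NFC runs first, so a plain decomposed accent such
// as "e\u0301" is composed into a single "é" and is not flagged.
const hasCombiningMarks = txt => COMBINING.test(txt.normalize("NFC"));

// Hypothetical helper. Removes whatever marks remain after NFC.
const stripCombiningMarks = txt =>
  txt.normalize("NFC").replace(COMBINING_ALL, "");

// "text" with one overlay mark per letter, written with escapes.
const struck = "t\u0336e\u0336x\u0334t\u0338";

console.log(hasCombiningMarks(struck));                  // true
console.log(hasCombiningMarks("plain text \u{1F914}"));  // false
console.log(stripCombiningMarks(struck + " \u{1F914}")); // text 🤔
```

Emoji are left alone because their code points fall outside the matched range. Working on the string directly also avoids encodeURIComponent, which throws a URIError on a lone surrogate. Languages whose accented letters have no precomposed form would still be flagged, so a real filter may need a threshold or an allow-list.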