[SOLVED] How to find Chinese and English characters in Node.js?

I have a string containing Chinese and English characters and I want to split the string into the individual Chinese and English characters.

Here are some examples:

hello 你好
你好你好 hello

This page explains how to detect Chinese characters, but it didn't work when I tried to use it to split the string.

Thanks in advance

Solution

You could split the string at every occurrence of a space, or at every occurrence of a Chinese character, like so:

  const chiStr = "你好 你好 hello";
  chiStr.split(' '); // split the string at every occurrence of a space
  // result: ["你好", "你好", "hello"]

  // The capturing group ( ... ) makes split() keep the matched Chinese characters;
  // without it they would be discarded as separators.
  const REGEX_CHINESE = /([\u4e00-\u9fff\u3400-\u4dbf\u{20000}-\u{2a6df}\u{2a700}-\u{2b73f}\u{2b740}-\u{2b81f}\u{2b820}-\u{2ceaf}\uf900-\ufaff\u3300-\u33ff\ufe30-\ufe4f\u{2f800}-\u{2fa1f}])/u;
  const hasChinese = (str) => REGEX_CHINESE.test(str);

  // split at every Chinese character, then drop empty and whitespace-only pieces
  chiStr.split(REGEX_CHINESE).filter((piece) => piece.trim() !== '');
  // result: ["你", "好", "你", "好", " hello"]
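
If splitting feels awkward, matching the pieces directly may be simpler. The following is only a sketch, assuming Node.js 10 or later (where Unicode property escapes such as \p{Script=Han} are available) and assuming "English" means the ASCII letters A-Z and a-z. The names TOKEN_REGEX and tokenize are made up for this example. Note that \p{Script=Han} matches Han ideographs in general, so it also matches kanji in Japanese text.

```javascript
// Hypothetical helper: returns every Han character individually and every
// run of ASCII letters as one token, in the order they appear.
const TOKEN_REGEX = /\p{Script=Han}|[A-Za-z]+/gu;

// match() with the g flag returns all matches, or null when there are none.
const tokenize = (str) => str.match(TOKEN_REGEX) || [];

console.log(tokenize("hello 你好"));      // ["hello", "你", "好"]
console.log(tokenize("你好你好 hello"));  // ["你", "好", "你", "好", "hello"]
```

Spaces and punctuation are simply skipped because neither alternative matches them.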

Another good approach is to sort the Chinese words and the English words into separate arrays, like so:

const REGEX_CHINESE = /[\u4e00-\u9fff\u3400-\u4dbf\u{20000}-\u{2a6df}\u{2a700}-\u{2b73f}\u{2b740}-\u{2b81f}\u{2b820}-\u{2ceaf}\uf900-\ufaff\u3300-\u33ff\ufe30-\ufe4f\u{2f800}-\u{2fa1f}]/u;
const hasChinese = (str) => REGEX_CHINESE.test(str);

const separateWords = (str) => {
   const words = str.split(' '); // words are assumed to be separated by single spaces
   const chiWords = words.filter((word) => hasChinese(word)); // words containing at least one Chinese character
   const engWords = words.filter((word) => !hasChinese(word)); // all remaining words
   return [chiWords, engWords];
};
console.log(separateWords("你好 你好 hello")); // [["你好", "你好"], ["hello"]]
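
The approach above relies on spaces between the Chinese and English parts, so a string like "你好hello" ends up entirely in the Chinese array. A possible variation, again only a sketch under the same assumptions (Node.js 10 or later, "English" meaning ASCII letters, and separateRuns being a made-up name), collects runs of each kind directly:

```javascript
// Hypothetical helper: collects runs of Han characters and runs of ASCII
// letters into two arrays, without relying on spaces between them.
const separateRuns = (str) => {
  const chinese = str.match(/\p{Script=Han}+/gu) || []; // consecutive Han characters
  const english = str.match(/[A-Za-z]+/g) || [];        // consecutive ASCII letters
  return [chinese, english];
};

console.log(separateRuns("你好hello你好")); // [["你好", "你好"], ["hello"]]
```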