I have a string containing Chinese and English characters and I want to split the string into the individual Chinese and English characters.
Here are some examples:
This page explains how to detect Chinese characters, but the approach didn't work when I tried to split the string.
Thanks in advance
You could split the string at every occurrence of a space, or at every occurrence of a 'Chinese' character, like so:
let chiStr = "你好 你好 hello"
chiStr.split(' ')//splitting the string at every occurrence of a space
//expected result: ["你好", "你好", "hello"]
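split(' ') produces empty strings when words are separated by more than one space or by a tab. A minimal sketch of a more tolerant variant, assuming any run of whitespace should count as a separator (`splitOnWhitespace` is a hypothetical helper name, not from the original answer):

```javascript
// Hypothetical helper: trim the ends, then split on any run of whitespace.
// In JavaScript, \s also matches the ideographic space U+3000 used in Chinese text.
const splitOnWhitespace = (str) => str.trim().split(/\s+/);

console.log(splitOnWhitespace("  你好  你好\thello ")); // ["你好", "你好", "hello"]
```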
const REGEX_CHINESE = /[\u4e00-\u9fff]|[\u3400-\u4dbf]|[\u{20000}-\u{2a6df}]|[\u{2a700}-\u{2b73f}]|[\u{2b740}-\u{2b81f}]|[\u{2b820}-\u{2ceaf}]|[\uf900-\ufaff]|[\u3300-\u33ff]|[\ufe30-\ufe4f]|[\u{2f800}-\u{2fa1f}]/u;
const hasChinese = (str) => REGEX_CHINESE.test(str); // true if str contains at least one Chinese character
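// split(REGEX_CHINESE) alone would discard the Chinese characters (giving ["", "", " ", "", " hello"]),
// so wrap the pattern in a capturing group to keep each match, then drop the empty strings.
chiStr.split(new RegExp(`(${REGEX_CHINESE.source})`, 'u')).filter(Boolean) // split at every Chinese character
//expected result: ["你", "好", " ", "你", "好", " hello"]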
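If the goal is the one from the question (individual Chinese characters plus English words, with no leftover spaces), matching tokens can be simpler than splitting. This is a different technique from the answer above: a sketch assuming a runtime that supports Unicode property escapes (ES2018 or later), using `\p{Script=Han}` in place of the hand-written ranges. `tokenize` and `HAN_OR_LATIN_WORD` are hypothetical names.

```javascript
// Each Han character is its own token; each run of Latin letters stays one token.
// Characters matching neither (spaces, digits, punctuation) are skipped.
const HAN_OR_LATIN_WORD = /\p{Script=Han}|\p{Script=Latin}+/gu;

// match() with the g flag returns null when nothing matches, so fall back to [].
const tokenize = (str) => str.match(HAN_OR_LATIN_WORD) || [];

console.log(tokenize("你好 你好 hello")); // ["你", "好", "你", "好", "hello"]
```

Dropping the `+` after `\p{Script=Latin}` would split the English part into individual letters as well.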
Another approach is to separate the Chinese words and the English words into two arrays, like so:
const REGEX_CHINESE = /[\u4e00-\u9fff]|[\u3400-\u4dbf]|[\u{20000}-\u{2a6df}]|[\u{2a700}-\u{2b73f}]|[\u{2b740}-\u{2b81f}]|[\u{2b820}-\u{2ceaf}]|[\uf900-\ufaff]|[\u3300-\u33ff]|[\ufe30-\ufe4f]|[\u{2f800}-\u{2fa1f}]/u;
// Split on spaces, then put every word containing at least one Chinese character
// into the first array and every other word into the second.
const separateWords = (str) => {
  const words = str.split(' ');
  const chiWords = words.filter((word) => REGEX_CHINESE.test(word)); // all Chinese words
  const engWords = words.filter((word) => !REGEX_CHINESE.test(word)); // all English words
  return [chiWords, engWords];
};
console.log(separateWords("你好 你好 hello")); // [["你好", "你好"], ["hello"]]
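separateWords relies on spaces between the Chinese and English parts, so "你好hello" ends up as a single "Chinese" word. A minimal sketch for mixed text without spaces, assuming Unicode property escapes are available and that "English" means Latin letters (`separateRuns` is a hypothetical name): consecutive Han characters form one run, consecutive Latin letters form another.

```javascript
// Hypothetical helper: collect runs of Han characters and runs of Latin letters
// separately, regardless of whether spaces separate them.
const separateRuns = (str) => {
  const chinese = str.match(/\p{Script=Han}+/gu) || []; // e.g. "你好", "世界"
  const english = str.match(/\p{Script=Latin}+/gu) || []; // e.g. "hello", "world"
  return [chinese, english];
};

console.log(separateRuns("你好hello世界 world")); // [["你好", "世界"], ["hello", "world"]]
```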