Following the question How to check Unicode input value in JavaScript?, I noticed that Unicode characters whose code points need more than 4 hexadecimal digits (for example, those in the Grantha Unicode block) cannot be detected using the following code:
function checkGrantha(str) {
  return str.split('').some(function(char) {
    var charCode = char.charCodeAt('0')
    return (
      charCode >= 0x11300 && charCode <= 0x1137F
    )
  })
}
console.log('𑌕𑌶');
After some research I found this article, which says that ES6/ES2015 introduced a way to represent Unicode code points in the astral planes (any code point requiring more than 4 hexadecimal digits) by wrapping the code point in curly braces: '\u{XXXXX}', for example '\u{1F436}'.
But I could not get this to work with the code above. Is there a way to fix this issue?
First of all, don't use str.split(''): it splits the string into 16-bit code units, so every character outside the BMP (i.e., in the astral planes) is broken into two surrogate halves. Use Array.from(str) instead, which iterates the string by code point.
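To make the difference concrete, here is a small sketch. The variable names are only illustrative, and the two Grantha letters are written as code point escapes so the snippet does not depend on how the source file is encoded.

```javascript
// Two Grantha letters (U+11315 and U+11336), written with ES2015 code point escapes.
const sample = '\u{11315}\u{11336}';

// split('') cuts between UTF-16 code units, so each letter becomes two surrogates.
const codeUnits = sample.split('');

// Array.from uses the string iterator, which walks the string by code point.
const codePoints = Array.from(sample);

console.log(codeUnits.length);   // 4
console.log(codePoints.length);  // 2
```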
Next, for a similar reason, don't use char.charCodeAt(0); use char.codePointAt(0) instead, which returns the full code point rather than only the first 16-bit code unit.
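A quick illustration of why this matters, again as a sketch with an illustrative variable name: for an astral character, charCodeAt(0) only sees the high surrogate, which can never fall in the 0x11300 to 0x1137F range.

```javascript
const ka = '\u{11315}'; // GRANTHA LETTER KA

// charCodeAt returns a single UTF-16 code unit: here, the high surrogate.
const firstUnit = ka.charCodeAt(0);

// codePointAt combines the surrogate pair into the real code point.
const codePoint = ka.codePointAt(0);

console.log(firstUnit.toString(16));  // d804
console.log(codePoint.toString(16));  // 11315
```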
function checkGrantha(str)
{
  return Array.from(str).some(function(char) {
    var codePoint = char.codePointAt(0)
    return (
      codePoint >= 0x11300 && codePoint <= 0x1137F
    )
  })
}
or, with a regular expression using the u (Unicode) flag:
function checkGrantha(str)
{
  return /[\u{11300}-\u{1137F}]/u.test(str);
}
or:
function checkGrantha(str)
{
  // Warning: this will miss U+1133B COMBINING BINDU BELOW whose Unicode 'Script' property is 'Inherited', not 'Grantha'...
  return /\p{Script=Grantha}/u.test(str);
}
console.log (checkGrantha('𑌕𑌶')); // -> true
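The difference between the two regular expressions can be checked directly on U+1133B. The sketch below also shows one possible way to combine them. The combined pattern and the name checkGranthaInclusive are only my illustration and should be treated as an assumption, not as a recommended API.

```javascript
const binduBelow = '\u{1133B}'; // COMBINING BINDU BELOW

// The block range matches it, because U+1133B lies inside 0x11300..0x1137F.
const inBlockRange = /[\u{11300}-\u{1137F}]/u.test(binduBelow);

// The script property does not, because its Script value is not Grantha.
const inGranthaScript = /\p{Script=Grantha}/u.test(binduBelow);

// Hypothetical helper: the script property plus that one extra code point.
function checkGranthaInclusive(str) {
  return /[\p{Script=Grantha}\u{1133B}]/u.test(str);
}

console.log(inBlockRange);                       // true
console.log(inGranthaScript);                    // false
console.log(checkGranthaInclusive(binduBelow));  // true
```

Which variant fits depends on whether you want "any code point in the Grantha block" (the range, which also covers unassigned code points) or "any character the Unicode database assigns to the Grantha script" (the property).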