As far as I understand, a surrogate pair is two 16-bit codepoints for one character. Surrogate pairs are used for "big" codepoints, which can't be written in 16 bits.
So, my question is... Can this be considered a surrogate pair, or is it just a combination of different characters in one string?
let str = '\u0057\u0303';
console.log(str);
Basically it's one character that consists of two codepoints. But we can also combine more than two codepoints into one character the same way. For example:
console.log('\u0053\u0307\u0323');
So, is that a surrogate pair? If not, what does a surrogate pair look like?
A surrogate pair in UTF-16 consists of two 16-bit CODEUNITS, not CODEPOINTS. UTF-16 uses 1..2 16-bit CODEUNITS per CODEPOINT, just as UTF-8 uses 1..4 8-bit CODEUNITS per CODEPOINT, UTF-32 uses 1 32-bit CODEUNIT per CODEPOINT, etc.
CODEPOINTS and CODEUNITS are not the same thing, so don't get them confused. CODEPOINTS are the actual numbers Unicode assigns to each symbol. CODEUNITS are used to represent CODEPOINTS in specific UTF encodings.
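To make the CODEUNIT vs CODEPOINT distinction concrete, here is a minimal sketch in JavaScript. The helper name `describe` is made up for illustration. It relies on `.length` counting UTF-16 CODEUNITS and on string iteration (via `Array.from`) walking CODEPOINTS.

```javascript
// Hypothetical helper: count UTF-16 code units and Unicode code points.
function describe(str) {
  return {
    codeUnits: str.length,              // .length counts 16-bit code units
    codePoints: Array.from(str).length  // the string iterator yields code points
  };
}

// W followed by a combining tilde: two code points, one code unit each.
const combining = describe('\u0057\u0303');

// U+1F600 is above U+FFFF: one code point, stored as two code units.
const emoji = describe('\u{1F600}');

console.log(combining); // { codeUnits: 2, codePoints: 2 }
console.log(emoji);     // { codeUnits: 2, codePoints: 1 }
```

Both strings have a `.length` of 2, but only the second one is a single CODEPOINT split across two CODEUNITS.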
And no, your example is not a surrogate pair. The high char of a surrogate pair is always in the range \uD800..\uDBFF, and the low char is always in the range \uDC00..\uDFFF. Neither char in your example falls in those ranges. For why this requirement exists, I suggest you read up on what UTF-16 actually is and how it works.
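As a rough illustration of what a real surrogate pair looks like, the sketch below splits a CODEPOINT above U+FFFF into its high and low CODEUNITS using the standard UTF-16 arithmetic, and checks whether a two-unit string is a surrogate pair. The function names `toSurrogatePair` and `isSurrogatePair` are hypothetical, not built-in APIs.

```javascript
// Hypothetical helper: split a code point in U+10000..U+10FFFF
// into its UTF-16 high and low surrogate code units.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;     // leaves a 20-bit value
  const high = 0xD800 + (offset >> 10);   // top 10 bits -> D800..DBFF
  const low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits -> DC00..DFFF
  return [high, low];
}

// Hypothetical helper: true only for exactly one high + one low surrogate.
function isSurrogatePair(str) {
  if (str.length !== 2) return false;
  const hi = str.charCodeAt(0);
  const lo = str.charCodeAt(1);
  return hi >= 0xD800 && hi <= 0xDBFF && lo >= 0xDC00 && lo <= 0xDFFF;
}

const [high, low] = toSurrogatePair(0x1F600);
console.log(high.toString(16), low.toString(16)); // d83d de00

console.log(isSurrogatePair('\uD83D\uDE00')); // true
console.log(isSurrogatePair('\u0057\u0303')); // false
```

In practice you rarely need to do this by hand: `'\u{1F600}'` and `'\uD83D\uDE00'` are the same string, and `codePointAt(0)` returns the combined CODEPOINT.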
Multiple CODEPOINTS can be combined to create GRAPHEMES (what you consider CHARACTERS), but not all CODEPOINTS can be combined. Only certain CODEPOINTS that have COMBINING characteristics can be combined with other CODEPOINTS, and only in certain ways that Unicode defines. So you can't just combine CODEPOINTS any which way you want. Your example is a base letter followed by COMBINING marks, which is why it displays as one CHARACTER.
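A small sketch of how COMBINING CODEPOINTS behave in JavaScript, assuming an engine that supports Unicode property escapes (`\p{M}` with the `u` flag). The helper `isCombiningMark` is a made-up name, and the `e` + U+0301 string is an extra example that is not from the question.

```javascript
// Hypothetical helper: true if the string starts with a code point in the
// Unicode "Mark" general category, which covers combining marks.
function isCombiningMark(ch) {
  return /^\p{M}/u.test(ch);
}

console.log(isCombiningMark('\u0303')); // true  (COMBINING TILDE)
console.log(isCombiningMark('W'));      // false (ordinary base letter)

// Some base + mark sequences have a precomposed code point,
// which NFC normalization will substitute.
const composed = 'e\u0301'.normalize('NFC');
console.log(composed === '\u00E9'); // true (one code point: é)

// W + combining tilde has no precomposed form, so it stays two code points.
const wTilde = '\u0057\u0303'.normalize('NFC');
console.log(wTilde.length); // 2
```

Either way, no surrogates are involved: every CODEUNIT here is below \uD800.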