The current version of UTF-16 is only capable of encoding 1,112,064 different code points: the range U+0000–U+10FFFF (0x110000 values), minus the 2,048 surrogate values that UTF-16 cannot represent.
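For reference, that count falls out of simple arithmetic (a worked sketch, nothing more):

```c
#include <stdio.h>

int main(void) {
    int range      = 0x10FFFF + 1;        /* 1,114,112 values in U+0000..U+10FFFF */
    int surrogates = 0xDFFF - 0xD800 + 1; /* 2,048 surrogates UTF-16 cannot encode */
    printf("%d\n", range - surrogates);   /* prints 1112064 */
    return 0;
}
```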
Does the Unicode Consortium intend to make UTF-16 run out of characters?
That is, does it plan to assign code points above 0x10FFFF?
If not, why would anyone write a UTF-8 parser that accepts 5- or 6-byte sequences? It would only add unnecessary instructions to the function.
Isn't 1,112,064 enough? Do we actually need MORE characters? I mean: how quickly are we running out?
As of 2011 we have consumed 109,449 characters, and another 137,468 code points (6,400 + 131,068) are set aside for private (application) use.
That leaves 865,147 unused code points (the subtraction is worked below): plenty for CJK Extension E (~10,000 characters) and 85 more sets just like it, so that in the event of contact with the Ferengi culture, we should be ready.
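The headroom figure is the same arithmetic carried one step further (the 6,400 and 131,068 are the BMP private-use area and the two private-use planes 15–16):

```c
#include <stdio.h>

int main(void) {
    int encodable = 1112064;        /* UTF-16-encodable code points        */
    int assigned  = 109449;         /* characters assigned as of 2011      */
    int reserved  = 6400 + 131068;  /* private use: BMP PUA + planes 15-16 */
    printf("%d\n", encodable - assigned - reserved); /* prints 865147 */
    return 0;
}
```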
In November 2003 the IETF restricted UTF-8 to end at U+10FFFF with RFC 3629, in order to match the constraints of the UTF-16 encoding: a UTF-8 parser should not accept the old 5- and 6-byte sequences, nor any 4-byte sequence that decodes to a value greater than 0x10FFFF, since those would overflow the UTF-16 range.
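Here is a minimal sketch of that check in C (the function name `utf8_decode` and its signature are mine for illustration, not from any particular library). It rejects lead bytes 0xF8 and above, which is where the old 5- and 6-byte forms lived, and any 4-byte sequence decoding past 0x10FFFF; it does not validate every overlong form or the surrogate range, which a full decoder would also have to reject:

```c
#include <stdint.h>
#include <stddef.h>

/* Decode one UTF-8 sequence from s (n bytes available). Returns the
   code point and writes the bytes consumed to *len, or returns -1 for
   any sequence RFC 3629 forbids. */
int32_t utf8_decode(const uint8_t *s, size_t n, size_t *len) {
    if (n == 0) return -1;
    uint8_t b = s[0];
    size_t need;
    int32_t cp;
    if      (b < 0x80) { *len = 1; return b; }  /* ASCII, single byte   */
    else if (b < 0xC0) return -1;               /* stray continuation   */
    else if (b < 0xE0) { need = 2; cp = b & 0x1F; }
    else if (b < 0xF0) { need = 3; cp = b & 0x0F; }
    else if (b < 0xF8) { need = 4; cp = b & 0x07; }
    else return -1;  /* 0xF8-0xFD were the 5/6-byte leads; 0xFE/0xFF never valid */
    if (n < need) return -1;                    /* truncated sequence   */
    for (size_t i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80) return -1;   /* bad continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    if (cp > 0x10FFFF) return -1;  /* 4-byte sequence past the UTF-16 limit */
    *len = need;
    return cp;
}
```

The extra rejection costs one comparison on the lead byte and one on the decoded value, which is about as cheap as the check can get.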
Please add edits here listing character sets that threaten the Unicode code point limit, if they are over 1/3 the size of CJK Extension E (~10,000 characters):