I'm learning about Unicode basics and I came across this passage:
"The Unicode standard describes how characters are represented by code points. A code point is an integer value, usually denoted in base 16. In the standard, a code point is written using the notation U+12ca to mean the character with value 0x12ca (4810 decimal)."
I have three questions about it.
It's my first post here and I'd appreciate any help! Have a nice day, y'all!!
- What does the `ca` stand for?
It stands for the hexadecimal digits `c` and `a`.
In some places I've seen it written as just `U+12`. What's the difference?
Either that is a mistake, or `U+12` is another (IMO sloppy / ambiguous) way of writing `U+0012` ... which is a different Unicode code point from `U+12ca`.
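To make the difference concrete, here is a minimal Python sketch (Python is just an illustrative choice; the character names come from the standard `unicodedata` module, not from the passage above):

```python
import unicodedata

# U+12CA and U+0012 are different code points, so different characters.
print(unicodedata.name(chr(0x12CA)))             # ETHIOPIC SYLLABLE WI
print(unicodedata.name(chr(0x12), "<unnamed>"))  # control chars have no name
print(chr(0x12CA) == chr(0x12))                  # False
```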
- Where did the `0` in `0x12ca` come from? What does it mean?
That is a different notation: hexadecimal (integer) literal notation, as used in various programming languages such as C, C++, Java and so on. It represents a number ... not necessarily a Unicode code point.
The `0x` prefix is just part of the notation. (It "comes from" the respective language specifications ...)
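For example, a quick Python sketch (any of the languages above would do; Python keeps it short) showing that `0x12ca` is nothing more than an integer written in base 16:

```python
# A hexadecimal integer literal: the 0x prefix tells the parser that
# the digits which follow are base 16.
n = 0x12ca

print(n)          # 4810 -- the same number, printed in base 10
print(n == 4810)  # True -- 0x12ca and 4810 are the same integer
print(hex(4810))  # 0x12ca -- converting back to hex notation
```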
- How does the value `0x12ca` become `4810` decimal?
The `0x` means that the remaining digits are hexadecimal (aka base 16), where:

- `a` or `A` represents 10,
- `b` or `B` represents 11,
- `c` or `C` represents 12,
- `d` or `D` represents 13,
- `e` or `E` represents 14,
- `f` or `F` represents 15.

So `0x12ca` is 1 × 16³ + 2 × 16² + 12 × 16¹ + 10 × 16⁰ ... which is 4810.
(Do the arithmetic yourself to check. Converting between base 10 and base 16 is simple high-school mathematics.)
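If you'd rather check with code than by hand, here is the same positional arithmetic as a small Python sketch (again, purely illustrative):

```python
# Positional expansion of 0x12ca, digit by digit:
value = 1 * 16**3 + 2 * 16**2 + 12 * 16**1 + 10 * 16**0
print(value)  # 4810

# int() with an explicit base performs the same conversion:
print(int("12ca", 16))  # 4810
```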