unicode, symbols, punctuation, character-properties

What's the difference between GC=Mark and GC=Punctuation in Unicode general categories?


I'm having trouble understanding some concepts. In the Unicode spec, there's a property called general category.

I understand what letters (ordinary characters; GC=L), numbers (digits 0–9 and other characters that have numeric values; GC=N) and separators (dividers; GC=Z) are. But I find it really hard to distinguish between symbols (GC=S), punctuation (GC=P), and marks (GC=M).

I looked up a list of them, but I couldn't find a conceptual difference, and the document doesn't help me much. What's the difference between all of these?


Solution

  • Marks aren't standalone characters; they are applied to another (base) character. Non-spacing marks (Mn) take up no width of their own and are typically displayed above or below the base character, spacing marks (Mc) take up their own width and are displayed attached to the base character, and enclosing marks (Me) are displayed surrounding the base character. For example, here's an a in a box (the character "a" followed by the combining enclosing square character, U+20DE): a⃞

    Regarding punctuation versus symbols: as the text you linked explains, some edge cases are classified rather arbitrarily, but in principle the difference is that punctuation is used "to organize and delimit textual units" (i.e. to mark the end of a sentence, to separate different parts of a sentence, to separate the elements of an enumeration, etc.) and symbols are used "to represent concepts" (such as units or mathematical notation).
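
To see the three classes side by side, it can help to ask a Unicode database for the general category of a few characters. The following is a minimal sketch using Python's standard `unicodedata` module. The sample characters and the helper names `general_category` and `categories` are my own choices for illustration, not something from the spec.

```python
import unicodedata

# Illustrative samples picked for this sketch (not taken from the spec text).
SAMPLES = [
    ("\u0301", "combining acute accent"),      # non-spacing mark
    ("\u0903", "Devanagari sign visarga"),     # spacing mark
    ("\u20de", "combining enclosing square"),  # enclosing mark
    (".", "full stop"),                        # punctuation
    ("(", "left parenthesis"),                 # punctuation
    ("+", "plus sign"),                        # symbol
    ("$", "dollar sign"),                      # symbol
    ("\u00b0", "degree sign"),                 # symbol
    ("%", "percent sign"),                     # edge case: classified as punctuation
]


def general_category(ch: str) -> str:
    """Return the two-letter general category of a single code point."""
    return unicodedata.category(ch)


def categories(text: str) -> list[str]:
    """Return the general category of every code point in text."""
    return [unicodedata.category(ch) for ch in text]


if __name__ == "__main__":
    for ch, label in SAMPLES:
        print(f"U+{ord(ch):04X} {label}: {general_category(ch)}")
    # The boxed "a" from above is two code points: a letter plus a mark.
    print(categories("a\u20de"))
```

The first letter of each result is the major class (M, P or S) and the second letter is the subclass. The percent sign is one of the arbitrary-looking edge cases: it reads like a symbol but is classified as punctuation (Po).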