Let's take COMBINING ACUTE ACCENT, for example. Its browser test page does include it alone in the page, but it reacts in a strange way: I can't select it with my mouse, and if I try to interact with it in the DOM inspector, it feels like it's not part of the text at all (there's no before and after this character):
Is a combining character, used alone, still a valid Unicode string?
Or does it have to follow another character?
Yes, a combining character alone is a valid Unicode string (even though its behaviour may be weird without a base character). Section 2.11 of the Unicode Standard emphasises this:
In the Unicode Standard, all sequences of character codes are permitted.
The presentation of such strings is described in D52:
There may be no such base character, such as when a combining character is at the start of text or follows a control or format character [...] In such cases, the combining characters are called isolated combining characters.
With isolated combining characters or when a process is unable to perform graphical combination, a process may present a combining character without graphical combination; that is, it may present it as if it were a base character.
However, if you want to display a combining character by itself, it is recommended that you attach it to a no-break space base character:
Nonspacing combining marks used by the Unicode Standard may be exhibited in apparent isolation by applying them to
U+00A0 NO-BREAK SPACE
. This convention might be employed, for example, when talking about the combining mark itself as a mark, rather than using it in its normal way in text (that is, applied as an accent to a base letter or in other combinations).In charts and illustrations in this standard, the combining nature of these marks is illustrated by applying them to a dotted circle, as shown in the examples throughout this standard.
Many common European diacritical marks are also encoded by the Unicode standard separately, as independent characters for compatibility with existing character sets (e.g. U+02D8 BREVE
for U+0306 COMBINING BREVE
). These characters will usually decompose as a U+0020 SPACE
plus the combining diacritical mark (see Section 7.9).