Looking at this official entities.json file, some of the entities are defined without an ending semicolon.
For example:
"Â": { "codepoints": [194], "characters": "\u00C2" },
"Â": { "codepoints": [194], "characters": "\u00C2" },
Where is that documented in HTML5? Or is that a browser thing¹?
¹ thing as in extension for backward compatibility.
Named HTML entities without a semicolon are not valid, per the HTML spec, but browsers are required to support some of them anyway. (This spec pattern - where something is officially illegal for you to do as a HTML author, but still has a single unambiguously specified behaviour that browsers must implement - is used a lot in the HTML spec.)
There are a few pertinent sections in the spec:
Pertinent quote:
Named character references
The ampersand must be followed by one of the names given in the named character references section, using the same case. The name must be one that is terminated by a U+003B SEMICOLON character (;).
§13.2 Parsing HTML Documents, especially 13.2.5.73 Named character reference state (if you really want to pick through the horrible hard-to-read implementation details of the parsing algorithm).
The non-normative §1.11.2 Syntax errors, which contains some explanation on why the spec makes references without semicolons errors (though I don't personally find it hugely compelling):
Errors involving fragile syntax constructs
There are syntax constructs that, for historical reasons, are relatively fragile. To help reduce the number of users who accidentally run into such problems, they are made non-conforming.
Example
For example, the parsing of certain named character references in attributes happens even with the closing semicolon being omitted. It is safe to include an ampersand followed by letters that do not form a named character reference, but if the letters are changed to a string that does form a named character reference, they will be interpreted as that character instead.
In this fragment, the attribute's value is
"?bill&ted"
:<a href="?bill&ted">Bill and Ted</a>
In the following fragment, however, the attribute's value is actually
"?art©"
, not the intended"?art©"
, because even without the final semicolon,"©"
is handled the same as"©"
and thus gets interpreted as"©"
:<a href="?art©">Art and Copy</a>
To avoid this problem, all named character references are required to end with a semicolon, and uses of named character references without a semicolon are flagged as errors.
Thus, the correct way to express the above cases is as follows:
<a href="?bill&ted">Bill and Ted</a> <!-- &ted is ok, since it's not a named character reference --> <a href="?art&copy">Art and Copy</a> <!-- the & has to be escaped, since © is a named character reference -->
As a final bit of corroboration that entities like Â
are invalid but work anyway, we can use this test document:
<!DOCTYPE html>
<html lang="en">
<title>Test page</title>
<div>Â</div>
</html>
Open it in Chrome, and it works and shows us an A with a circumflex accent:
But paste it into the Nu Html Checker (endorsed by WhatWG), and we get an error stating "Named character reference was not terminated by a semicolon.":
i.e. it works, but it's invalid.