I am learning about character encoding but I am slightly confused now.
I observe the following behavior both in Chrome and Firefox.
The following HTML snippet shows an alert saying it works
as soon as the page is loaded:
<button onclick="myFunction("hello")"></button><script>alert('it works')</script><!-- ")">click2</button>
However, this one shows nothing on page load, and clicking the button raises the following error: SyntaxError: Unexpected end of input
:
<button onclick="myFunction("hello")"></button><script>alert('it does not work')</script><!-- ")">click1</button>
I do not understand why the two snippets behave differently when they differ only in the string passed to myFunction
.
Specifically, in the first snippet, the initial quote is not encoded ("hello&quot;
), while in the second snippet, it is encoded (&quot;hello&quot;
).
Could you explain it, please? Thank you.
HTML entities (e.g. &quot;
) have two purposes:
They let you express arbitrary Unicode characters using a small subset of characters compatible with any encoding. This means you can insert any character in an HTML document, no matter its encoding, and it can make it easier to type characters that are not available on your keyboard (for example, you can type &copy;
to get ©).
They let you insert literal instances of characters that have a special meaning in HTML, such as <
(for example, <p>you can use &lt;strong&gt; to emphasise text</p>
).
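Both purposes can be sketched in a few lines of JavaScript. This is only an illustration: toNumericEntities and escapeText are hypothetical helper names, not browser APIs, and the second one covers only the characters that are special in text content.

```javascript
// Hypothetical helper: replace every non-ASCII character with a numeric
// character reference, so the markup survives any ASCII-compatible encoding.
function toNumericEntities(text) {
  // Array.from iterates by code point, so characters outside the BMP stay whole.
  return Array.from(text, (ch) => {
    const codePoint = ch.codePointAt(0);
    return codePoint > 127 ? `&#${codePoint};` : ch;
  }).join("");
}

// Hypothetical helper: escape the characters that are special in text content.
// The ampersand is replaced first so the entities added later are not re-escaped.
function escapeText(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

console.log(toNumericEntities("© 2024")); // &#169; 2024
console.log(escapeText("you can use <strong> to emphasise text"));
// you can use &lt;strong&gt; to emphasise text
```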
In your examples, double quotes "
have a special meaning: they signal the start and end of attribute values. If you want to type literal double quotes in a place where there's ambiguity, you need to use the &quot;
entity.
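If the markup is generated from a script, the same rule can be applied mechanically. The sketch below assumes a hypothetical escapeAttribute helper (not a built-in) and only handles values that will be wrapped in double quotes.

```javascript
// Hypothetical helper: make a string safe inside a double-quoted attribute.
// Only & and " need escaping there; & goes first to avoid double escaping.
function escapeAttribute(value) {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

// The handler code we want the browser to run when the button is clicked.
const handler = 'myFunction("hello")';

// The quotes that delimit the attribute stay literal; the ones inside are encoded.
const buttonHtml = `<button onclick="${escapeAttribute(handler)}">click</button>`;

console.log(buttonHtml);
// <button onclick="myFunction(&quot;hello&quot;)">click</button>
```

The browser decodes the entities after it has found the end of the attribute, so the handler it finally runs is myFunction("hello").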
I'll add a couple of examples to further clarify:
em {
color: green;
}
<section>
<p>This paragraph content contains literal less than characters, so you will see those characters upon rendering: &lt;em&gt;Hello, World&lt;/em&gt;.
</section>
<section>
<p>This paragraph content contains less than characters that are part of the code, so you will get an HTML tag upon rendering: <em>Hello, World</em>.
</section>
<section>
This input field value attribute is surrounded by double quote characters (located before <em>One</em> and after <em>Three</em>) and contains literal double quote characters, so you'll see them when rendered:<br>
<input type="text" value="One &quot;Two&quot; Three">
</section>
<section>
This input field value attribute is also surrounded by double quote characters (before <em>One</em> and after <em>One </em>), so you only see the first word when rendered. Everything else inside the tag is ignored by the browser:<br>
<input type="text" value="One "Two" Three">
</section>
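To make the difference between the two input fields concrete, here is a deliberately simplified model of how a double-quoted attribute value is read. It is not the real HTML parsing algorithm: readDoubleQuotedValue is a hypothetical function that only looks for the first literal double quote and decodes two entities, which is enough to show that an encoded quote never ends the value.

```javascript
// Simplified model (an assumption, not the HTML spec algorithm): the raw
// value runs from the opening quote to the first literal " character, and
// entities are decoded only after that boundary has been found.
function readDoubleQuotedValue(markup, attributeName) {
  const opener = `${attributeName}="`;
  const start = markup.indexOf(opener);
  if (start === -1) return null;

  const valueStart = start + opener.length;
  const valueEnd = markup.indexOf('"', valueStart);
  if (valueEnd === -1) return null;

  const raw = markup.slice(valueStart, valueEnd);
  // Decode only the two entities this sketch cares about; &amp; goes last.
  return raw.replace(/&quot;/g, '"').replace(/&amp;/g, "&");
}

const encoded = '<input type="text" value="One &quot;Two&quot; Three">';
const unencoded = '<input type="text" value="One "Two" Three">';

console.log(readDoubleQuotedValue(encoded, "value"));   // One "Two" Three
console.log(readDoubleQuotedValue(unencoded, "value")); // One (followed by a space)
```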