
Browser encoding of double quotes


I am learning about character encoding but I am slightly confused now.

I observe the following behavior both in Chrome and Firefox.

The following HTML snippet shows an alert saying "it works" as soon as the page is loaded:

<button onclick="myFunction("hello&quot;)&quot;></button><script>alert('it works')</script><!-- &quot;)">click2</button>

However, this one shows nothing on page load, and clicking the button raises the following error: SyntaxError: Unexpected end of input:

<button onclick="myFunction(&quot;hello&quot;)&quot;></button><script>alert('it does not work')</script><!-- &quot;)">click1</button>

I do not understand why the two snippets behave differently when they differ only in the string passed to myFunction.

Specifically, in the first snippet the initial quote is not encoded ("hello&quot;), while in the second snippet it is encoded (&quot;hello&quot;).

Could you explain it, please? Thank you.


Solution

  • HTML entities (e.g. &quot;) have two purposes:

    In your examples, double quotes " have a special meaning: they signal the start and end of attribute values. If you want to type literal double quotes in a place where there's ambiguity, you need to use the &quot; entity.

    I'll add a couple of examples to further clarify:

    em {
        color: green;
    }
    <section>
      <p>This paragraph content contains literal less than characters, so you will see those characters upon rendering: &lt;em>Hello, World&lt;/em>.
    </section>
    
    <section>
      <p>This paragraph content contains less than characters that are part of the code, so you will get an HTML tag upon rendering: <em>Hello, World</em>.
    </section>

    <section>
      This input field value attribute is surrounded by double quote characters (before <em>One</em> and after <em>Three</em>) and contains literal double quote characters written as entities, so you'll see them when rendered:<br>
      <input type="text" value="One &quot;Two&quot; Three">
    </section>
    
    <section>
      This input field value attribute is also surrounded by double quote characters, but they are the ones located before and after <em>One </em>, so you only see that first word when rendered. Everything after the closing quote is not part of the value:<br>
      <input type="text" value="One "Two" Three">
    </section>
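
To connect this back to the two snippets in the question, here is a minimal JavaScript sketch of the rule described above: a double-quoted attribute value ends at the next literal " character, and entities such as &quot; are decoded only after that boundary has been found. Note that readQuotedAttribute is a hypothetical helper written for this illustration, not a browser API, and it ignores almost everything else a real HTML parser does.

```javascript
// Hypothetical helper, not a real HTML parser: it only models the rule that a
// double-quoted attribute value ends at the next literal " character.
function readQuotedAttribute(html, name) {
  const opener = name + '="';
  const start = html.indexOf(opener);
  if (start === -1) return null;

  const valueStart = start + opener.length;
  const valueEnd = html.indexOf('"', valueStart);
  if (valueEnd === -1) return null;

  const raw = html.slice(valueStart, valueEnd);
  // Entities are decoded only after the boundaries are known. Only the two
  // entities relevant here are handled; &amp; is decoded last so that
  // "&amp;quot;" becomes the text "&quot;" rather than a quote.
  const decoded = raw.replace(/&quot;/g, '"').replace(/&amp;/g, '&');

  return { raw, decoded, rest: html.slice(valueEnd + 1) };
}

// The two snippets from the question, copied verbatim.
const snippet1 = `<button onclick="myFunction("hello&quot;)&quot;></button><script>alert('it works')</script><!-- &quot;)">click2</button>`;
const snippet2 = `<button onclick="myFunction(&quot;hello&quot;)&quot;></button><script>alert('it does not work')</script><!-- &quot;)">click1</button>`;

const first = readQuotedAttribute(snippet1, 'onclick');
const second = readQuotedAttribute(snippet2, 'onclick');

console.log(first.decoded);  // myFunction(
console.log(second.decoded); // myFunction("hello")"></button><script>alert('it does not work')</script><!-- ")
console.log(second.rest);    // >click1</button>
```

In the first snippet the literal quote before hello ends the attribute early, so the handler is just myFunction( and the > that follows closes the button tag; the script element is then parsed as ordinary markup and runs on page load. In the second snippet no literal quote appears until just before >click1, so the whole script element is swallowed into the onclick value as plain text. It never runs on load, and the decoded handler is not valid JavaScript, which is why clicking the button throws a SyntaxError.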